
(Useful) Stupid Regex Tricks? (516 comments)

Posted by ScuttleMonkey on Monday November 10 2008, @09:17AM
from the hope-you-like-reading-lots-of-random-characters dept.
programming
it
technology
careysb writes to mention that in the same vein as '*nix tricks' and 'VIM tricks', it would be nice to see one on regular expressions and the programs that use them. What amazingly cool tricks have people discovered with respect to regular expressions in everyday life as a developer or power user?

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by rallymatte (707679) * on Monday November 10 2008, @09:20AM (#25703249)
    To check that a string is a valid IP address, this regexp is quite useful:
    /^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/

    And this one for MAC addresses:
    /^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}$/
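A minimal sketch of how the two patterns above might be used from Python. The helper names `is_ipv4` and `is_mac` are made up for illustration, and `fullmatch` is used instead of the `^...$` anchors because Python's `$` also matches just before a trailing newline.

```python
import re

# One dotted-quad octet, 0-255 with no leading zeros (same alternation as above).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])"
IPV4_RE = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")

# Six colon-separated pairs of hex digits, written with a repeat count.
MAC_RE = re.compile(r"[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5}")


def is_ipv4(text: str) -> bool:
    """True if the whole string is a dotted-quad IPv4 address."""
    return IPV4_RE.fullmatch(text) is not None


def is_mac(text: str) -> bool:
    """True if the whole string is a colon-separated MAC address."""
    return MAC_RE.fullmatch(text) is not None
```

Note that the octet pattern rejects leading zeros, so `01.2.3.4` does not validate.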
  • by Anonymous Coward on Monday November 10 2008, @09:22AM (#25703303)
    (Useful) Stupid * Tricks

    Yes sir, that will guarantee a front page story. You better head back to the drawing board if it doesn't fit that pattern. Next week: (Useful) Stupid Starcraft Tricks.
    • by Malevolent Tester (1201209) on Monday November 10 2008, @09:52AM (#25703785) Journal

      Next week: (Useful) Stupid Starcraft Tricks.

      You can assign a building, building add-on, or a group of up to 12 units to a single key. To do this, select what you want to assign, then hold down Control and select a number on the keyboard between 0-9. Then, when you want to select what you assigned, simply press the number of the group that you want. Pressing a group number twice will center the screen on the group.

    • by McWilde (643703) on Monday November 10 2008, @09:54AM (#25703813) Homepage

      That doesn't look right...
      Try:

      /^\(Useful\) Stupid \w+ Tricks$/

      Also, I noticed that the previous stupid tricks stories ended with a question mark, but this one doesn't. So:

      /^\(Useful\) Stupid \w+ Tricks\??$/

  • New Slashdot Section (Score:5, Interesting)

    by Frankie70 (803801) on Monday November 10 2008, @09:24AM (#25703329)

    Maybe we should have a new section for "Useful Stupid Tricks" on Slashdot.

  • How about (Score:4, Funny)

    by cbiltcliffe (186293) on Monday November 10 2008, @09:30AM (#25703409) Homepage Journal

    Stupid (Useful) Ask Slashdot tricks?

    I'm not sure whether these are legitimate, or just a "I don't know what the hell I'm doing, so let's see if I can get someone else to show me how to do my job, under the guise of sharing information."

    I'd like to say the former, but my cynicism is making me lean to the latter.....

  • by mutende (13564) <klaus@seistrup.dk> on Monday November 10 2008, @09:32AM (#25703443) Homepage Journal
    Beautiful regexp that validates RFC 822 addresses: Mail-RFC822-Address.html [ex-parrot.com]
  • Windows (Score:4, Informative)

    by jgtg32a (1173373) on Monday November 10 2008, @09:32AM (#25703451)
    MS Office does support regexps. While they are not as good as Perl's, they are very helpful.

    Here is a link to an Excel .bas add-on for regexps, which helped me a lot.
    Don't forget to add the library reference {Tools->References->Microsoft VBScript Regular Expressions 5.5}

    http://www.tmehta.com/regexp/using_functions.htm [tmehta.com]
  • Regex Bill (Score:5, Funny)

    by Anonymous Coward on Monday November 10 2008, @09:36AM (#25703517)
    Why couldn't Bill try out his regular expressions?

    His mom wouldn't let him play with matches.
  • by Gulthek (12570) on Monday November 10 2008, @09:41AM (#25703593) Homepage Journal

    Here's a chunk of perl script I wrote (years ago) that determines if $text matches any of the styles of library call number that I've ever encountered.

    Slashcode is interestingly interpreting my formatting, but you should get the gist.


    # True if $text looks like any of the call number styles below.
    my $is_call_number = (
        $text =~ /
            ^[A-Z]+ # starts with at least one capital letter
            \s?     # followed by an optional space
            \d+     # followed by one or more digits
        /x
        or $text =~ /
            ^\d+ # starts with one or more digits
            \.   # followed by a single decimal point
        /x
        or $text =~ /
            \d+ # starts with one or more digits
            \s  # and a space
        /x
        or $text =~ /
            Thesis # starts with "Thesis"
            .+     # with one or more characters of any kind
            \d{4}  # then four numbers - year
            \s+    # separated by at least one space
            [A-Z]+ # from one or more capital letters
            \d+    # followed by one or more numbers
        /xi # case ignored here in case we run into THESIS or thesis
        or $text =~ /
            \d+ # starts with one or more digits
            \-  # connected with a dash
            \d+ # to one or more following digits
        /x
        or $text =~ /
            \d+    # starts with one or more digits
            \s     # followed by a space
            [A-Z]* # followed by zero or more capital letters
            \d+    # followed by one or more digits
        /x
    );

  • by darkvizier (703808) on Monday November 10 2008, @09:44AM (#25703649)
    I've never found regexes to be useful at all. I prefer to write my own parsers from scratch in assembly language, or Conway's Game of Life [wikipedia.org], if I'm feeling m/(ambitious|artistic|autistic|masochistic)/.

    But even an artist gets lazy sometimes.
  • by gzipped_tar (1151931) on Monday November 10 2008, @09:53AM (#25703803) Journal
    This regex matches a number: integer or float, scientific notation or plain, plus or minus...

    [-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?
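A quick sketch of the number pattern above in Python, assuming the goal is to pull numeric literals out of free text. The name `find_numbers` is hypothetical.

```python
import re

# The pattern from the comment above, unchanged. All groups are non-capturing,
# so findall() returns the full text of each match.
NUMBER_RE = re.compile(
    r"[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?"
)


def find_numbers(text: str) -> list[str]:
    """Return every substring that looks like an integer or float."""
    return NUMBER_RE.findall(text)


if __name__ == "__main__":
    print(find_numbers("x = -3.14, y = 6.02e23, z = .5 and 42"))
```

Each string returned this way should be accepted by Python's `float()`, which makes for an easy sanity check.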
  • use Regexp::Common; (Score:5, Insightful)

    by oneiros27 (46144) on Monday November 10 2008, @09:54AM (#25703815) Homepage
    use Regexp::Common qw(URI net);
    $text_with_urls =~ m/$RE{URI}/;
    $text_with_ips =~ m/$RE{net}{IPv4}/;
  • To remove trailing whitespace from a textfile (vim regex, don't know if the \s will work in other regex dialects):

    :%s/\s\+$//e
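On the question of other dialects: a rough Python equivalent, as a sketch. The function name is made up. One difference worth knowing is that Python's `\s` also matches newlines, so in multiline mode `\s+$` would swallow blank lines; the sketch uses `[ \t]+$` instead.

```python
import re

# Spaces and tabs at the end of each line. MULTILINE makes $ match at every
# line end, not only at the end of the string.
TRAILING_WS = re.compile(r"[ \t]+$", re.MULTILINE)


def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces and tabs from every line, keeping blank lines."""
    return TRAILING_WS.sub("", text)
```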

  • by [HooL] (155038) on Monday November 10 2008, @10:00AM (#25703925)

    I wonder why such FAQs are still posted on a site like Slashdot. We now have a great repository for exactly this kind of questions:
    http://stackoverflow.com/questions/tagged?tagnames=regex&sort=votes&pagesize=15 [stackoverflow.com]

  • Be lazy! (Score:5, Interesting)

    by subreality (157447) on Monday November 10 2008, @10:03AM (#25703973)

    OK, you asked for stupid tricks, but this one's just plain lazy.

    Between bash and grep, there are quite a lot of special characters that you have to escape... Or just ignore with dots!

    /I.do.this.frequently..(even.with.parenthases).,.because.sometimes.my....backslash..key.is.tired/

    A couple neat things happened: The extra dot after frequently is matching an inline paren. The paren in the PATTERN right next to it starts the mark of an atom, closed by its brother. The comma is because I put one outside the paren (here represented as the dot to the left of the comma) as is my style. Also note the literal backslash, just before you see the word backslash in hidden parenthesis.

    Why not add quotes to match the spaces easily? I get a word or two in, and I find I naturally switch to using dots. These are throwaways for single tries through grep. For production code, I hone in carefully on the parts that I'm dead sure I can anchor to, escaped by any means needed, before carefully choosing my atom to match as tightly as possible, so it'll error out if my data has gone wrong.

    Even in a simple case like this, half the fun is in explaining it. :)
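The trade-off described above can be sketched in Python. Both helper names are hypothetical: `lazy_pattern` swaps every non-alphanumeric character for a dot, as in the throwaway grep case, and `strict_pattern` uses `re.escape` for the "production" case where a mismatch should be noticed.

```python
import re


def lazy_pattern(literal: str) -> str:
    """Replace every character that is not a letter or digit with a dot."""
    return re.sub(r"[^A-Za-z0-9]", ".", literal)


def strict_pattern(literal: str) -> str:
    """Escape the literal so every character matches only itself."""
    return re.escape(literal)


if __name__ == "__main__":
    literal = "price (USD): $5.00"
    print(lazy_pattern(literal))    # dots wherever escaping would be needed
    print(strict_pattern(literal))  # backslash-escaped version
```

The lazy version also matches lines that merely have the same shape, which is fine for a one-off search and risky anywhere else.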

  • by doti (966971) on Monday November 10 2008, @10:07AM (#25704037) Homepage


          my $re = '';
            $re = qr/
                    \{ (?:
                            (?> [^{}]+ ) # non-brace characters
                    |
                            (??{ $re }) # nested brace block
                    )* \} /xs;
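Python's standard `re` module has no equivalent of Perl's `(??{ ... })` recursion, so the sketch below swaps in a different technique: a plain depth counter. The function name is hypothetical, and it only reports outermost blocks, so it behaves differently from the Perl pattern on unbalanced input.

```python
def find_balanced_braces(text: str) -> list[str]:
    """Return every outermost balanced {...} block found in text."""
    blocks = []
    depth = 0
    start = 0
    for i, ch in enumerate(text):
        if ch == "{":
            if depth == 0:
                start = i  # remember where the outermost block opens
            depth += 1
        elif ch == "}" and depth > 0:
            depth -= 1
            if depth == 0:
                blocks.append(text[start:i + 1])
    return blocks


if __name__ == "__main__":
    print(find_balanced_braces("a {b {c} d} e {f}"))
```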

  • by ianare (1132971) on Monday November 10 2008, @10:11AM (#25704111)
    SSN
    ^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$

    US phone with or without parentheses
    ^\([0-9]{3}\)\s?[0-9]{3}(-|\s)?[0-9]{4}$|^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$

    ISO Date (19th to 21st century only)
    ^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$
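A small Python sketch exercising the three patterns above as written. The constant and function names are made up. The date pattern accepts years 1800 through 2099 and only checks the shape, so `2008-02-31` passes it; the hypothetical `is_real_date` helper adds a calendar check with `datetime.date.fromisoformat`.

```python
import re
from datetime import date

SSN_RE = re.compile(
    r"^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$"
)
US_PHONE_RE = re.compile(
    r"^\([0-9]{3}\)\s?[0-9]{3}(-|\s)?[0-9]{4}$|^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$"
)
ISO_DATE_RE = re.compile(
    r"^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$"
)


def is_real_date(text: str) -> bool:
    """True if text has the YYYY-MM-DD shape and names a real calendar day."""
    if ISO_DATE_RE.fullmatch(text) is None:
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True
```

The `\3` backreference in the SSN pattern forces both separators to be the same, so `123 45-6789` is rejected while `123-45-6789` and `123456789` are accepted.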
  • This regex goes through my enormous list of girlfriends' telephone numbers, and makes a selection based on the area code I'm currently in!

    #$%^&*(&^%{{}}{/\/\||```

    (No, that's not a regex at all. And no, I don't even have a single girlfriend.)

  • by ShatteredArm (1123533) on Monday November 10 2008, @12:42PM (#25707155)
    I came up with a Regex that can be used to match literally anything (yes, anything!). It is, therefore, the most flexible regex ever concocted. Here it is:

    .*
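One caveat, sketched with Python's `re` module as an assumed test bed: by default `.` does not match a newline, so "literally anything" needs the DOTALL flag (`/s` in Perl).

```python
import re

ANYTHING = re.compile(r".*")
REALLY_ANYTHING = re.compile(r".*", re.DOTALL)

# The empty string matches, since * allows zero repetitions.
matches_empty = ANYTHING.fullmatch("") is not None

# Without DOTALL the match cannot cross a line break.
matches_two_lines = ANYTHING.fullmatch("a\nb") is not None

# With DOTALL, . also matches the newline.
matches_two_lines_dotall = REALLY_ANYTHING.fullmatch("a\nb") is not None
```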
  • by IorDMUX (870522) <mark.zimmerman3@NosPAm.gmail.com> on Monday November 10 2008, @04:37PM (#25711601) Homepage

    Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

    -- Jamie Zawinski

    • by Timmmm (636430) on Monday November 10 2008, @09:49AM (#25703719)

      Mmmmm readable.

      • by Ken D (100098) on Monday November 10 2008, @11:26AM (#25705621)

        The problem is that email addresses are not suitable for regex based validation.
        There are too many legacy formats, too many variations, that are legal addresses.

        Why, back in the old days, you could send mail to things like "bob%example.com@example.org" which would shoot the email off to example.org, whose mail server would then shoot the email off to example.com. A way to hand route your email around a broken network link in the old days. Throw in a few UUCP hops, maybe getting final delivery to a BITNET connected system. Ah, those were the days!

    • by natebarney (987940) on Monday November 10 2008, @10:30AM (#25704453)

      Magic stuff like this is not working: /\([FB][ot]o\).*\1/ although that seems to be the closest description of what we wanted.

      In perl, I did /([FB][ot][o]).*\1/ and it seemed to work as you wanted. Also, if you're using a regex engine that supports lazy (non-greedy) quantifiers like perl does, I would use them in this case. It reduces backtracking. In perl, put a ? after the *.
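The same idea in Python, as a sketch: unescaped parentheses capture, `\1` refers back to the captured text, and `*?` is the lazy form of `*`. The constant names are made up.

```python
import re

# Greedy: .* grabs as much as possible before the repeated word.
GREEDY = re.compile(r"([FB][ot]o).*\1")

# Lazy: .*? stops at the first repetition of the captured word.
LAZY = re.compile(r"([FB][ot]o).*?\1")


def first_repeat(pattern: re.Pattern, text: str) -> str:
    """Return the matched span, or '' when no word is repeated."""
    found = pattern.search(text)
    return found.group(0) if found else ""
```

On `"Boo Boo Boo"` the greedy pattern spans all three words, while the lazy one stops after the second.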

    • by Culture20 (968837) on Monday November 10 2008, @11:07AM (#25705235)

      (I would quote the final result but /. won't allow that many "junk" characters.. let's hope that doesn't cripple this entire discussion.)

      Interesting that a site for nerds doesn't allow a lot of characters commonly used in source code.

    • by jeremyp (130771) on Monday November 10 2008, @01:30PM (#25708027) Homepage Journal

      Your regex doesn't allow + signs in the name part.

      Nor, I would suspect, would it handle quoted strings, e.g. "Jeremy P"@example.com is technically a valid RFC 822 address.

      And having just looked up the RFC 5322 spec which you quote, I see there are more cases you fail to take account of, e.g.

      Jeremy P <jeremyp@example.com>

      Also, what makes you think upper case in domain names is invalid? jeremyp@example.COM fails validation.
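For the display-name and upper-case cases mentioned above, one alternative to a hand-written regex is to lean on a parser. This is a sketch using `email.utils.parseaddr` from Python's standard library; the function name `normalize_address` is hypothetical, and the sketch does not claim to cover every form the RFCs allow.

```python
from email.utils import parseaddr


def normalize_address(raw: str) -> str:
    """Return the bare address with a lower-cased domain, or '' if none is found."""
    _display_name, addr = parseaddr(raw)
    local, sep, domain = addr.rpartition("@")
    if not sep or not local or not domain:
        return ""
    # Domain names are case-insensitive; the local part is left untouched.
    return f"{local}@{domain.lower()}"
```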
