(Useful) Stupid Regex Tricks?

Posted by ScuttleMonkey on Monday November 10, 2008 @10:17AM from the hope-you-like-reading-lots-of-random-characters dept.

careysb writes to mention that, in the same vein as '*nix tricks' and 'VIM tricks', it would be nice to see one on regular expressions and the programs that use them. "What amazingly cool tricks have people discovered with respect to regular expressions in everyday life as a developer or power user?"

This discussion has been archived. No new comments can be posted.

IP and Hardware addresses (Score:5, Insightful)

by rallymatte ( 707679 ) * writes: on Monday November 10, 2008 @10:20AM (#25703249)

To filter a string to make sure it's a valid ip address this regexp is quite useful.
/^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/

And this one for mac addresses
/^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}$/
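For anyone who wants to try the IPv4 pattern outside Perl, here is a minimal Python sketch. The names `OCTET`, `IPV4_RE` and `is_dotted_quad` are made up for illustration, and `fullmatch()` stands in for the `^...$` anchors.

```python
import re

# One octet, 0-255, with no leading zeros (same alternatives as the regex above).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])"

# Three "octet." groups followed by a final octet.
IPV4_RE = re.compile(r"(?:" + OCTET + r"\.){3}" + OCTET)


def is_dotted_quad(text):
    """Return True if text is a dot-decimal IPv4 address without leading zeros."""
    return IPV4_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("192.168.1.1", "0.0.0.0", "256.1.1.1", "01.2.3.4"):
        print(candidate, is_dotted_quad(candidate))
```

Note that this only checks the dot-decimal spelling; it says nothing about whether the address is routable or sensible.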

- Re: (Score:3)
  
  by fbjon ( 692006 ) writes:
  
  Are IP addresses with leading zeroes usually considered invalid?
  - Re:IP and Hardware addresses (Score:4, Informative)
    
    by akozakie ( 633875 ) writes: on Monday November 10, 2008 @11:55AM (#25705005)
    
    According to the RFC leading zeros specify octal and 0x is hexadecimal. Both are standard, but rarely used and not all programs support them. There are even more ways to write an IP address, including dword and different mixes, but they are usually only used for obfuscation in malware.
    
  - - Re:IP and Hardware addresses (Score:5, Informative)
      
      by fbjon ( 692006 ) writes: on Monday November 10, 2008 @11:36AM (#25704617) Homepage Journal
      
      It seems both Opera and ping in Windows interpret individual parts with leading zeros as octal. More interestingly, Opera also accepts hexadecimal. That makes constructing a regexp that validates any arbitrary IP address, and not just a valid dot-decimal, a bit more cumbersome.
      
      - Re: (Score:3, Interesting)
        
        by LordKronos ( 470910 ) writes:
        
        Oh, wow, you are right. Using 0177.0.0.1 in firefox gets you to localhost, as does 0x7f.0.0.1
        Nice catch.
    - - Re:IP and Hardware addresses (Score:4, Funny)
        
        by Kymermosst ( 33885 ) writes: on Monday November 10, 2008 @05:58PM (#25711971) Journal
        
        So, would anyone like to buy my new T-shirt, it says "There is no place like 2130706433."
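The number on the T-shirt is just 127.0.0.1 read as one 32-bit integer. A small Python sketch of the conversions mentioned in this subthread, using the standard `ipaddress` module (the two helper names are invented for the example):

```python
import ipaddress


def dword_to_dotted(value):
    """Render a 32-bit integer as a dot-decimal IPv4 string."""
    return str(ipaddress.IPv4Address(value))


def dotted_to_dword(text):
    """Render a dot-decimal IPv4 string as a 32-bit integer."""
    return int(ipaddress.IPv4Address(text))


if __name__ == "__main__":
    print(dword_to_dotted(2130706433))       # the T-shirt address
    print(dotted_to_dword("127.0.0.1"))
    # The octal and hex spellings of the first octet from the posts above:
    print(int("0177", 8), int("0x7f", 16))
```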
        
- Re:IP and Hardware addresses (Score:4, Insightful)
  
  by Poltras ( 680608 ) writes: on Monday November 10, 2008 @10:31AM (#25703435) Homepage
  
  So if I get this right, 0.0.0.0 is a valid ip address? I know the real regex would take a full post, but yes, it is possible to check with a single regex if it is valid, if it makes sense (127.0.0.1, 10.*, 169.254.*, etc etc) and if it's not a broadcast or a network address (not taking netmask into account).
  
  - Re:IP and Hardware addresses (Score:4, Insightful)
    
    by plumby ( 179557 ) writes: on Monday November 10, 2008 @11:18AM (#25704257)
    
    So if I get this right, 0.0.0.0 is a valid ip address?
    If you mean "Is it an address that you can send IP traffic to?", then the answer is no. If you mean "Is it a valid value that can end up in an IP address field (e.g., in the response to the ipconfig command)?" then the answer is yes - it means that you've not got a connection.
    
    - Re: (Score:3, Informative)
      
      by Arkaic ( 784460 ) writes:
      
      Also, when configuring ACLs 0.0.0.0 usually means all ip addresses.
    - Re: (Score:3, Informative)
      
      by squallbsr ( 826163 ) writes:
      
      Also, typically binding a service to ip 0.0.0.0 connects it to all available interfaces on the machine.
      
      e.g. starting a development server for a django [djangopowered.com] app on all interfaces (instead of the default 127.0.0.1):
      python manage.py runserver 0.0.0.0:8000
  - Re:IP and Hardware addresses (Score:4, Insightful)
    
    by kimba ( 12893 ) writes: on Monday November 10, 2008 @01:07PM (#25706411)
    
    Why isn't 0.0.0.0 or 10.* a valid IP address? Since when is the definition of IP address to be unicast and globally routable?
    I'd rather take issue with the fact it completely fails on IPv6 addresses.
    
- Re:IP and Hardware addresses (Score:5, Informative)
  
  by Richard_J_N ( 631241 ) writes: on Monday November 10, 2008 @10:44AM (#25703653)
  
  Of course, you can do better still. For mac addresses, try:
  ^([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}$
  [:xdigit:] is short for hexadecimal digits, i.e. a-fA-F0-9
  We can also loop 5 times over the 'XX:' sections.
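Python's standard `re` module does not implement POSIX bracket classes such as `[:xdigit:]`, so a rough Python equivalent has to spell the class out. A minimal sketch (`MAC_RE` and `is_mac` are illustrative names only):

```python
import re

# Five "XX:" groups followed by a final "XX", where X is a hex digit.
MAC_RE = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")


def is_mac(text):
    """Return True if text looks like a colon-separated MAC address."""
    return MAC_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("00:1a:2B:3c:4D:5e", "00:1a:2B:3c:4D", "00:1a:2B:3c:4D:5g"):
        print(candidate, is_mac(candidate))
```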
  
  - Re:IP and Hardware addresses (Score:5, Funny)
    
    by rallymatte ( 707679 ) * writes: on Monday November 10, 2008 @10:51AM (#25703761)
    
    Not only are you showing off with a lower member id than me, do you also have to come up with a cooler regexp than me?
    
    - Re:IP and Hardware addresses (Score:5, Funny)
      
      by alta ( 1263 ) writes: on Monday November 10, 2008 @11:03AM (#25703963) Homepage Journal
      
      I can easily beat you on the UID, but I couldn't regex the a out of an apple.
      
      - Re:IP and Hardware addresses (Score:5, Interesting)
        
        by nschubach ( 922175 ) writes: on Monday November 10, 2008 @11:25AM (#25704367) Journal
        
        There's a really cool little "real time" regex analyzer written in Flex: (if you're not one of them scared to death by Flash content)
        http://gskinner.com/RegExr/ [gskinner.com]
        Maybe you can monkey your way into "regexing" the a out of apple :p
        
        
        Re:IP and Hardware addresses (Score:5, Informative)
        
        by tamyrlin ( 51 ) writes: on Monday November 10, 2008 @03:30PM (#25709285) Homepage
        
        I personally like the regex-builder mode in Emacs as well. This one allows you to build a regexp while highlighting all matches in the current buffer.
        Of course, this should probably have been posted in the emacs thread earlier, but I think it is probably a good match for this thread as well :)
        To start it, just use M-x regexp-builder
        
      - Re: (Score:3, Funny)
        
        by wertigon ( 1204486 ) writes:
        
        Write a Regex for them! :D
        
        Re: (Score:3, Funny)
        
        by glavenoid ( 636808 ) writes:
        
        No point, really. You old timers always seem to come out of the woodwork whenever low-UIDs come up in conversation :-)
        
        Re: (Score:3, Interesting)
        
        by josecanuc ( 91 ) * writes:
        
        Folks who think a low ID means an old person: get real. Slashdot hasn't been around forever. It started in 1997. Accounts were added later.
        Folks with a low ID just happened to register within the few months following the addition of accounts. Must have been 1998 or 1999. I was in college at the time. I'm currently not yet 30 years old. Is that old to you?
        
        Re: (Score:3, Funny)
        
        by jez9999 ( 618189 ) writes:
        
        You think THAT'S cool? I've seen CMDRTACO posting! Now that was a sight to behold. Actually I think I even saw a -1 once, which was his mom.
        
        Re:IP and Hardware addresses (Score:4, Funny)
        
        by Vadim Grinshpun ( 31 ) writes: on Monday November 10, 2008 @02:32PM (#25708089) Homepage
        
        Hmmm... until recently I didn't even realize that low ID's were in vogue :)
        
        
        Re:IP and Hardware addresses (Score:4, Funny)
        
        by mikiN ( 75494 ) writes: on Monday November 10, 2008 @05:50PM (#25711841)
        
        You must be new h... (looks at PP's ID, gasps)
        Nevermind.
        
    - Re:IP and Hardware addresses (Score:5, Funny)
      
      by sqldr ( 838964 ) writes: on Monday November 10, 2008 @11:43AM (#25704751)
      
      Not only are you showing off with a lower member id than me
      
      Low ID = old fart. He may be a regexp wizard, but he probably looks like gandalf too :-D
      
- Re:IP and Hardware addresses (Score:5, Informative)
  
  by Speare ( 84249 ) writes: on Monday November 10, 2008 @10:50AM (#25703735) Homepage Journal
  
  For pretty much any useful stock problem solved by regular expressions, see Perl's Regexp::Common [cpan.org] module. A lot of these patterns are fiendishly complicated in order to deal with edge cases properly.
  
  - - Re: (Score:3, Interesting)
      
      by LordKronos ( 470910 ) writes:
      
      Yes:
      http://search.cpan.org/~abigail/Regexp-Common-2.122/lib/Regexp/Common/profanity.pm [cpan.org]
- Re: (Score:2)
  
  by david.given ( 6740 ) writes:
  
  ^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$
  I'm not sure this is valid --- it doesn't accept non-dotted IP addresses, does it? i.e. expressing 127.0.0.1 as 2130706433. (Or 127.1, which is equally, and surprisingly, valid.)
- Re:IP and Hardware addresses (Score:4, Informative)
  
  by Bazzargh ( 39195 ) writes: on Monday November 10, 2008 @11:15AM (#25704197)
  
  /^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/
  Try this: /^((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])(\.|$)){4}/
  And similarly: /^(([0-9a-fA-F]{2})(:|$)){6}$/
  (term(delimiter|$)){n} is the generic stupid regex trick here. Works in perl, ymmv elsewhere.
  -Baz
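One thing worth checking before relying on the (term(delimiter|$)){n} form: as written it has no anchor after the last repetition, so a trailing delimiter (or further text after it) still matches. A Python sketch comparing it with the longer term(delimiter term){n-1} form; the names `OCTET`, `TRICK_RE` and `EXPLICIT_RE` are invented here:

```python
import re

# One octet, written the compact way from the post above.
OCTET = r"(?:25[0-5]|(?:2[0-4]|1[0-9]|[1-9]|)[0-9])"

# The (term(delimiter|$)){n} trick.
TRICK_RE = re.compile(r"^(" + OCTET + r"(\.|$)){4}")

# The longer form: one term, then exactly three "delimiter term" pairs.
EXPLICIT_RE = re.compile(OCTET + r"(?:\." + OCTET + r"){3}")

if __name__ == "__main__":
    for candidate in ("10.0.0.1", "10.0.0.1.", "10.0.0.1.5", "10.0.0"):
        print(
            candidate,
            bool(TRICK_RE.match(candidate)),
            bool(EXPLICIT_RE.fullmatch(candidate)),
        )
```

The trick is shorter, so which form to prefer depends on whether inputs like "10.0.0.1." need to be rejected.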
  
  - - Re:IP and Hardware addresses (Score:5, Funny)
      
      by L4t3r4lu5 ( 1216702 ) writes: on Monday November 10, 2008 @12:01PM (#25705133)
      
      That last bit is the perlre for a zero-width negative look-behind assertion
      It certainly looks like English, but I have no idea what that means. Whatever it is, it sure seems to help cure insomnia.
      
- Re:IP and Hardware addresses (Score:4, Funny)
  
  by Dagger2 ( 1177377 ) writes: on Monday November 10, 2008 @11:53AM (#25704959)
  
  That also fails beautifully with an address like "2001:db8:3c4d:48:a00:20ff:feb9:4c54", which is perfectly valid.
  
  Unless you know you're going to be dealing with numeric IPv4 addresses in a specific format, it would be best to pass them to getaddrinfo() (with AI_NUMERICHOST if you want to avoid DNS) and let somebody else worry about validating them properly.
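A minimal Python sketch of the getaddrinfo() suggestion, assuming the platform's resolver handles both IPv4 and IPv6 literals. `is_numeric_address` is a made-up helper name; `socket.getaddrinfo` and `socket.AI_NUMERICHOST` are the standard library's wrappers around the C call and flag.

```python
import socket


def is_numeric_address(text):
    """Return True if the system resolver accepts text as a numeric address.

    AI_NUMERICHOST tells getaddrinfo() not to attempt a DNS lookup, so host
    names are rejected instead of being resolved.
    """
    try:
        socket.getaddrinfo(text, None, flags=socket.AI_NUMERICHOST)
    except socket.gaierror:
        return False
    return True


if __name__ == "__main__":
    for candidate in (
        "127.0.0.1",
        "2001:db8:3c4d:48:a00:20ff:feb9:4c54",
        "999.1.1.1",
        "example.com",
    ):
        print(candidate, is_numeric_address(candidate))
```

What counts as valid here is whatever the local C library accepts, which may include the shorthand and octal spellings discussed earlier in the thread.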
  
- (Useful) Stupid useless articles (Score:3, Insightful)
  
  by Kent Recal ( 714863 ) writes:
  
  Dear slashdot editors,
  slashdot.org is not stackoverflow.com [stackoverflow.com].
  The articles and discussions here are not searchable in a sane way. Your recent attempts to mimic stackoverflow are just a waste of everybody's time because all those little tidbits that people post get lost in the internet noise immediately.
  We know you're a bit desperate [alexa.com] for traffic these days. But this is not the way to go.
  - Opposite (Score:4, Insightful)
    
    by Christopher_Olah ( 1317943 ) writes: on Monday November 10, 2008 @07:32PM (#25713279)
    
    IMHO, this is exactly the way that Slashdot should be going. Threads like this are interesting, add to the reservoirs of internet knowledge, and have the highest quality to noise ratios.
    I (and I suspect many others) read Slashdot not for the latest +5 funny comment (though those can be fun to read) but to read the opinions of brilliant minds. And when those minds start trading secrets... Everyone wins.
    
Here's One for Slashdot Stories! (Score:4, Funny)

by Anonymous Coward writes: on Monday November 10, 2008 @10:22AM (#25703303)

(Useful) Stupid * Tricks

Yes sir, that will guarantee a front page story. You better head back to the drawing board if it doesn't fit that pattern. Next week: (Useful) Stupid Starcraft Tricks.

- Re:Here's One for Slashdot Stories! (Score:5, Funny)
  
  by Malevolent Tester ( 1201209 ) writes: on Monday November 10, 2008 @10:52AM (#25703785) Journal
  
  Next week: (Useful) Stupid Starcraft Tricks.
  You can assign a building, building add-on, or a group of up to 12 units to a single key. To do this, select what you want to assign, then hold down Control and select a number on the keyboard between 0-9. Then, when you want to select what you assigned, simply press the number of the group that you want. Pressing a group number twice will center the screen on the group.
  
  - Re: (Score:3, Funny)
    
    by Anpheus ( 908711 ) writes:
    
    Did you get that from the tips that pop up every game when you first install StarCraft?
  - Re: (Score:3, Interesting)
    
    by Eponymous Bastard ( 1143615 ) writes:
    
    On fastest:
    Stasis your opponents' fleet with your arbiter and start a 15 second countdown. On zero, your teammate nukes the stasis. Wait 30 seconds for the nuke to come down right on your open-mouthed opponents' fleet.
    To add insult to injury, if you manage to stasis both their and your ships, you can recall them out right before the nuke hits.
- Re:Here's One for Slashdot Stories! (Score:5, Funny)
  
  by McWilde ( 643703 ) writes: on Monday November 10, 2008 @10:54AM (#25703813) Homepage
  
  That doesn't look right...
  Try:
  /^\(Useful\) Stupid \w+ Tricks$/
  Also, I noticed that the previous stupid tricks stories ended with a question mark, but this one doesn't. So:
  /^\(Useful\) Stupid \w+ Tricks\??$/
  
- Blasphemy (Score:3, Informative)
  
  by Shohat ( 959481 ) writes:
  
  There are no Stupid Starcraft Tricks.
- Re: (Score:3, Interesting)
  
  by Talderas ( 1212466 ) writes:
  
  You can permanently cloak zerg units that can burrow if you control an arbiter. By burrowing the zerg unit just as it enters the arbiter's cloaking field radius, the zerg will become permanently cloaked.
New Slashdot Section (Score:5, Interesting)

by Frankie70 ( 803801 ) writes: on Monday November 10, 2008 @10:24AM (#25703329)

Maybe we should have a new section for "Useful Stupid Tricks" on Slashdot.

- Re: (Score:2)
  
  by VitaminB52 ( 550802 ) writes:
  Next installment: "Useful Stupid Manager Tricks"
  
  If it's understandable by a manager, then it's stupid...
  and if it gives you your much deserved salary hike, then it's useful
End it all... (Score:2, Funny)

by Notquitecajun ( 1073646 ) writes:

format c:*.*
- Re: (Score:2)
  
  by SatanicPuppy ( 611928 ) * writes:
  
  That didn't do anything...All I got was:
  tcl>format C:*.* C:*.*
ARGH!!!! (Score:2, Funny)

by soapdog ( 773638 ) writes:

You see yourself in digg.com. You are likely to be eaten by a grue.
- Re: (Score:2, Insightful)
  
  by Anonymous Coward writes:
  
  So clearly, Slashdot's shit never stank?
  No, seriously, why the bitching? Did you expect the site to just keep reporting dry stories about incremental Linux kernel upgrades for its entire existence? You expected a website to never change and never update with the times? Just because it's old doesn't mean it's sacred.
Regex Support (Score:2, Interesting)

by Extremus ( 1043274 ) writes:

I have used regex in the past, mainly for keeping long SQL scripts. The problem is the lack of full regex support in most editors. IMO the best (for Windows, at least) is EditPad Pro [editpadpro.com].
How about (Score:4, Funny)

by cbiltcliffe ( 186293 ) writes: on Monday November 10, 2008 @10:30AM (#25703409) Homepage Journal

Stupid (Useful) Ask Slashdot tricks?
I'm not sure whether these are legitimate, or just a "I don't know what the hell I'm doing, so let's see if I can get someone else to show me how to do my job, under the guise of sharing information."
I'd like to say the former, but my cynicism is making me lean to the latter.....

- Re:How about (Score:5, Interesting)
  
  by Anonymous Coward writes: on Monday November 10, 2008 @10:34AM (#25703477)
  
  I actually like these. Nice little highly enriched concentrations of geekery on a single page. Think how long it might take to round up the sort of stuff that appears here by Googling.
  Turing word: insipid
  In a sentence: You find this page insipid but I find it inspiring.
  
  - Re: (Score:2)
    
    by cbiltcliffe ( 186293 ) writes:
    
    You're probably right. And there are certainly some useful nuggets on these pages, but I wouldn't be Googling for regex's anyway. That's the kind of stuff I'd write from scratch, because I want to be sure of what it does.
    Maybe I'm weird that way, but I don't often take complex programming code and just copy/paste from Google. I don't trust the Internet that much.
    Although that could be because I'm also a musician, and Googling lyrics/chords, etc for songs inevitably leads to some stuff where you think "Wa
  - Re:How about (Score:5, Interesting)
    
    by Bandman ( 86149 ) writes: <bandman.gmail@com> on Monday November 10, 2008 @11:53AM (#25704967) Homepage
    
    I like it, but I've got a bookmark folder called "Slash-doc" where I store useful threads that contain a lot of information.
    I've got a lot of threads bookmarked.
    Best Practices for Process Documentation [slashdot.org]
    How would you make a distributed Office system [slashdot.org]
    Quality Open Source / Calendar / Messaging Systems [slashdot.org]
    and some others.
    Some of the information in the threads is out of date, but the ideas are useful and interesting to read. I need to go back through Ask Slashdot and get the more recent threads that seem to act as references
    
- news for nerds. NERDS (Score:3, Informative)
  
  by circletimessquare ( 444983 ) writes:
  
  stuff that matters
  understand the concept?
  if not, try going to this site [tmz.com], it looks like it might be more your speed
  buhbye
Regexp-based address validation (Score:5, Informative)

by mutende ( 13564 ) writes: <klaus@seistrup.dk> on Monday November 10, 2008 @10:32AM (#25703443) Homepage Journal

Beautiful regexp that validates RFC 822 addresses: Mail-RFC822-Address.html [ex-parrot.com]

- Re:Regexp-based address validation (Score:5, Funny)
  
  by neoform ( 551705 ) writes: <djneoform@gmail.com> on Monday November 10, 2008 @11:07AM (#25704033) Homepage
  
  Best part of that Regex? It's easy to modify too!
  
- - Re: (Score:2, Informative)
    
    by Daas ( 620469 ) writes:
    
    It matches the entire RFC, not just the you@slashie.com form.
    <You @ Slashie> you@slashie.com
    Should be valid if I remember correctly.
- - Re:Regexp-based address validation (Score:5, Insightful)
    
    by xenocide2 ( 231786 ) writes: on Monday November 10, 2008 @12:16PM (#25705413) Homepage
    
    The regex is beautiful in the sense that it lets you not be one of those assholes who refuses valid email addresses.
    
Mainframe Formatting (Score:2)

by jchawk ( 127686 ) writes:

I use this to remove formatting that is included in the reports spit out from the mainframe -
cat REPORT_NAME | sed 's/[^a-z0-9,.-]//gi' > REPORT.out
It uses a few commands to accomplish this but I figured I would include the entire command line for completeness. It keeps all letters, numbers, ',', '.', and '-'. If you need other characters you can always add them to the regular expression.
- Re:Mainframe Formatting (Score:4, Insightful)
  
  by msuarezalvarez ( 667058 ) writes: on Monday November 10, 2008 @12:16PM (#25705405)
  
  You are a great candidate for the Useless Use of Cat award... specially endearing is your making a comment on the few commands your line uses :D
  
Windows (Score:4, Informative)

by jgtg32a ( 1173373 ) writes: on Monday November 10, 2008 @10:32AM (#25703451)

MS Office does support regexps. While not as good as Perl's, they are very helpful.

Link to an Excel .bas add-on for regexps, which helped me a lot.
Don't forget to add the library reference {Tools->References->Microsoft VBScript Regular Expressions 5.5}

http://www.tmehta.com/regexp/using_functions.htm [tmehta.com]

is it an rfc-822 compliant e-mail address? (Score:3, Interesting)

by Anonymous Coward writes: on Monday November 10, 2008 @10:32AM (#25703455)

please validate using the rfc and not your sketchy interpretation of an e-mail address. /.*@.*\..*/ will not cut it.
Try instead
([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x22([^\\x0d\\x22\\x5c\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x22)(\\x2e([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x22([^\\x0d\\x22\\x5c\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x22))*\\x40([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x5b([^\\x0d\\x5b-\\x5d\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x5d)(\\x2e([^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+|\\x5b([^\\x0d\\x5b-\\x5d\\x80-\\xff]|\\x5c[\\x00-\\x7f])*\\x5d))*
See the original at http://www.iamcal.com/publish/articles/php/parsing_email/

- Re:is it an rfc-822 compliant e-mail address? (Score:4, Insightful)
  
  by Timmmm ( 636430 ) writes: on Monday November 10, 2008 @10:49AM (#25703719)
  
  Mmmmm readable.
  
- Re: (Score:3, Insightful)
  
  by ais523 ( 1172701 ) writes:
  
  Does that thing allow nested comments, and escaping inside them? It doesn't look like it, it isn't recursive. (I have some in the email address I typically put online, ais523(524\)(525)x)@bham.ac.uk; that could be a good test for your email client, and is useful because I've never come across a spambot that can parse it.)
  Recent versions of Perl and Python regices allow you to write recursively; that probably qualifies as a stupid regex trick, especially as it makes them more computationally powerful so t
- - Re:is it an rfc-822 compliant e-mail address? (Score:4, Interesting)
    
    by Ken D ( 100098 ) writes: on Monday November 10, 2008 @12:26PM (#25705621)
    
    The problem is that email addresses are not suitable for regex based validation.
    There are too many legacy formats, too many variations, that are legal addresses.
    Why, back in the old days, you could send mail to things like "bob%example.com@example.org" which would shoot the email off to example.org, whose mail server would then shoot the email off to example.com. A way to hand route your email around a broken network link in the old days. Throw in a few UUCP hops, maybe getting final delivery to a BITNET connected system. Ah, those were the days!
    
99 Bottles of Beer on the wall (Score:3, Interesting)

by Pahalial ( 580781 ) writes: on Monday November 10, 2008 @10:35AM (#25703501)

Saw this one recently, by Andrew Savige. He did use a Perl module to generate the regex itself, but even so!

http://search.cpan.org/dist/Acme-EyeDrops/lib/Acme/EyeDrops.pm#99_Bottles_of_Beer [cpan.org]

(I would quote the final result but /. won't allow that many "junk" characters.. let's hope that doesn't cripple this entire discussion.)

- Re:99 Bottles of Beer on the wall (Score:5, Insightful)
  
  by Culture20 ( 968837 ) writes: on Monday November 10, 2008 @12:07PM (#25705235)
  
  (I would quote the final result but /. won't allow that many "junk" characters.. let's hope that doesn't cripple this entire discussion.)
  Interesting that a site for nerds doesn't allow a lot of characters commonly used in source code.
  
Regex Bill (Score:5, Funny)

by Anonymous Coward writes: on Monday November 10, 2008 @10:36AM (#25703517)

Why couldn't Bill try out his regular expressions?

His mom wouldn't let him play with matches.

PCRE and perl 5.10 offer "tagged" captures (Score:2)

by BrianRoach ( 614397 ) writes:

(?<thing>foo) lets you then access the matched substring ("foo" in this case) by the tag/label "thing" (access syntax depends on language). It's pretty spiffy if you need order-independent matching.
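Python's `re` module has the same feature, spelled `(?P<name>...)`. A small sketch of the order-independent use case, where alternatives carry their own labels so the caller does not need to count group numbers. `TOKEN_RE` and `classify` are names invented for the example:

```python
import re

# Each alternative gets its own label, so the caller can ask which one
# matched without knowing the group's position in the pattern.
TOKEN_RE = re.compile(r"(?P<number>\d+)|(?P<word>[A-Za-z]+)")


def classify(token):
    """Return the label of the alternative that matched, or None."""
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        return None
    return match.lastgroup


if __name__ == "__main__":
    for token in ("42", "foo", "4two"):
        print(token, classify(token))
    print(TOKEN_RE.fullmatch("foo").group("word"))
```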
Re: (Score:2)

by account_deleted ( 4530225 ) writes:

Comment removed based on user account deletion
- Re: (Score:2)
  
  by Goaway ( 82658 ) writes:
  
  Nope, still here.
  - Re: (Score:2)
    
    by Goaway ( 82658 ) writes:
    
    PS: I totally messed that up Should've been "Nope, still here".
Match a library call number (Score:5, Interesting)

by Gulthek ( 12570 ) writes: on Monday November 10, 2008 @10:41AM (#25703593) Homepage Journal

Here's a chunk of perl script I wrote (years ago) that determines if $text matches any of the styles of library call number that I've ever encountered.
Slashcode is interestingly interpreting my formatting, but you should get the gist.
$text =~ /
    ^[A-Z]+   # starts with at least one capital letter
    \s?       # followed by an optional space
    \d+       # followed by one or more digits
/x
or $text =~ /
    ^\d+      # starts with one or more digits
    \.        # followed by a single decimal
/x
or $text =~ /
    \d+       # starts with one or more digits
    \s        # and a space
/x
or $text =~ /
    Thesis    # starts with "Thesis"
    .+        # with one or more characters of any kind
    \d{4}     # then four numbers - year
    \s+       # separated by at least one space
    [A-Z]+    # from one or more capital letters
    \d+       # followed by one or more numbers
/xi           # case ignored here in case we run into THESIS or thesis
or $text =~ /
    \d+       # starts with one or more digits
    \-        # connected with a dash
    \d+       # to one or more following digits
/x
or $text =~ /
    \d+       # starts with one or more digits
              # followed by a space
    [A-Z]*    # followed by zero or more capital letters
    \d+       # followed by one or more digits
/x
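The same commenting style is available in Python through the `re.VERBOSE` flag, which ignores unescaped whitespace in the pattern and treats `#` as a comment marker. A sketch of just the first alternative above (the name `CALL_NUMBER_RE` and the sample strings are made up):

```python
import re

CALL_NUMBER_RE = re.compile(
    r"""
    ^[A-Z]+   # starts with at least one capital letter
    \s?       # followed by an optional space
    \d+       # followed by one or more digits
    """,
    re.VERBOSE,
)

if __name__ == "__main__":
    for text in ("QA 76.9", "QA76", "qa 76", "Q  76"):
        print(repr(text), bool(CALL_NUMBER_RE.search(text)))
```

Because whitespace in the pattern is ignored in this mode, a literal space has to be written as `\ `, `[ ]` or `\s`.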

- Re:Match a library call number (Score:5, Funny)
  
  by mgbastard ( 612419 ) writes: on Monday November 10, 2008 @12:06PM (#25705205)
  
  holy crap. You DOCUMENTED your regular expression? You shall be thrown into the pit!
  
Re: (Score:2)

by account_deleted ( 4530225 ) writes:

Comment removed based on user account deletion
Nope, not useful (Score:5, Funny)

by darkvizier ( 703808 ) writes: on Monday November 10, 2008 @10:44AM (#25703649)

I've never found regexes to be useful at all. I prefer to write my own parsers from scratch in assembly language, or conway's game of life [wikipedia.org], if I'm feeling m/(ambitious|artistic|autistic|masochistic)/.

But even an artist gets lazy sometimes.

CWEB and Doxygen (Score:2)

by N3Roaster ( 888781 ) writes:

Here's one I came up with recently:
If you want to get documentation out of both CWEB and Doxygen, write the Doxygen comments in the source files like @=//! Comment for Doxygen.@> to prevent ctangle from stripping the comment out, then use sed 's/@=\/.*@>//g' input.w > output.w to strip those comments out so they don't end up in the output from cweave.
One regex to match them all (Score:5, Informative)

by gzipped_tar ( 1151931 ) writes: on Monday November 10, 2008 @10:53AM (#25703803) Journal

This regex matches a number: integer or float, scientific notation or plain, plus or minus...

[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?
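A quick way to poke at this pattern is to run it over a few inputs in Python, where the syntax carries over unchanged. This is only a sketch; `NUMBER_RE` is a name chosen for the example.

```python
import re

NUMBER_RE = re.compile(r"[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?")

if __name__ == "__main__":
    for text in ("42", "3.14", "-1e10", ".5", "6.02e23", "abc", "1e", ".5e-3"):
        print(repr(text), NUMBER_RE.fullmatch(text) is not None)
    # Pulling numbers out of running text:
    print(NUMBER_RE.findall("x = 3.5, y = -2e3"))
```

One edge case shows up when testing: ".5e-3" is not matched as a whole, because the `\b` after the fraction digits cannot match between "5" and "e" (both are word characters).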

use Regexp::Common; (Score:5, Insightful)

by oneiros27 ( 46144 ) writes: on Monday November 10, 2008 @10:54AM (#25703815) Homepage

use Regexp::Common qw(URI net);
$text_with_urls =~ m/$RE{URI}/;
$text_with_ips =~ m/$RE{net}{IPv4}/;

Remove trailing whitespace (Score:4, Interesting)

by cerberusss ( 660701 ) writes: on Monday November 10, 2008 @10:56AM (#25703851) Journal

To remove trailing whitespace from a text file (vim regex; I don't know if the \s will work in other regex dialects):
:%s/\s\+$//e
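For comparison, a rough Python equivalent. It uses `[ \t]` rather than `\s` because in Python `\s` also matches newline characters, so `\s+$` can swallow blank lines. `strip_trailing_whitespace` is an illustrative name.

```python
import re


def strip_trailing_whitespace(text):
    """Remove spaces and tabs at the end of every line in text."""
    # re.MULTILINE makes $ match at the end of each line, not just the string.
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


if __name__ == "__main__":
    print(repr(strip_trailing_whitespace("a  \nb\t\n\nc ")))
```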

Do these questions really belong here? (Score:5, Informative)

by DerCed ( 155038 ) writes: on Monday November 10, 2008 @11:00AM (#25703925)

I wonder why such FAQs are still posted on a site like Slashdot. We now have a great repository for exactly this kind of question:
http://stackoverflow.com/questions/tagged?tagnames=regex&sort=votes&pagesize=15 [stackoverflow.com]

Not very complex, but ... (Score:2)

by Bob-taro ( 996889 ) writes:

I often use sed to split a delimited line into multiple lines. E.g.:
echo $PATH | sed 's/:/\
/g'
RFC 822 email validation (Score:2, Informative)

by gpuk ( 712102 ) writes:

Cal Henderson's routine is the best RFC compliant regex I have ever found to verify an email address:
http://code.iamcal.com/php/rfc822/ [iamcal.com]
Be lazy! (Score:5, Interesting)

by subreality ( 157447 ) writes: on Monday November 10, 2008 @11:03AM (#25703973)

OK, you asked for stupid tricks, but this one's just plain lazy.
Between bash and grep, there are quite a lot of special characters that you have to escape... Or just ignore with dots!
/I.do.this.frequently..(even.with.parenthases).,.because.sometimes.my....backslash..key.is.tired/
A couple neat things happened: The extra dot after frequently is matching an inline paren. The paren in the PATTERN right next to it starts the mark of an atom, closed by its brother. The comma is because I put one outside the paren (here represented as the dot to the left of the comma) as is my style. Also note the literal backslash, just before you see the word backslash in hidden parenthesis.
Why not add quotes to match the spaces easily? I get a word or two in, and I find I naturally switch to using dots. These are throwaways for single tries through grep. For production code, I hone in carefully on the parts that I'm dead sure I can anchor to, escaped by any means needed, before carefully choosing my atom to match as tightly as possible, so it'll error out if my data has gone wrong.
Even in a simple case like this, half the fun is in explaining it. :)

recursive regexp to match {} block (Score:4, Informative)

by doti ( 966971 ) writes: on Monday November 10, 2008 @11:07AM (#25704037) Homepage

my $re = '';
$re = qr/
    \{
    (?:
        (?> [^{}]+ )    # non-braces
    |
        (??{ $re })     # nested brace block
    )*
    \}
/xs;
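Python's standard `re` module has no equivalent of `(??{ ... })`, so the usual fallback there is a small hand-written scanner. The sketch below swaps the recursive regex for a depth counter; it is a different technique that accepts the same balanced `{...}` blocks, and `match_braces` is a made-up name.

```python
def match_braces(text, start=0):
    """Return the index just past the '}' that closes the '{' at text[start].

    Returns -1 if text[start] is not '{' or the block is never closed.
    """
    if start >= len(text) or text[start] != "{":
        return -1
    depth = 0
    for index in range(start, len(text)):
        if text[index] == "{":
            depth += 1
        elif text[index] == "}":
            depth -= 1
            if depth == 0:
                return index + 1
    return -1


if __name__ == "__main__":
    source = "{a{b}c}d"
    end = match_braces(source)
    print(end, source[:end])
```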

email validation... (Score:2, Interesting)

by Ramley ( 1168049 ) writes:

This was always useful when appropriate: /^[\w.|-]+@(?:[\w.|-]{2,63}\.)+[a-z]{2,6}$/ Validates a valid email address (rfc 5322) -- although not taking into account an IP address (user@192.168.1.2)
- Re:email validation... FAIL (Score:4, Insightful)
  
  by jeremyp ( 130771 ) writes: on Monday November 10, 2008 @02:30PM (#25708027) Homepage Journal
  
  Your regex doesn't allow + signs in the name part.
  Nor, I would suspect would it handle quoted strings e.g. "Jeremy P"@example.com is technically a valid RFC 822 address.
  And having just looked up the RFC 5322 spec which you quote, I see there are more cases you fail to take account of e.g.
  Jeremy P <jeremyp@example.com>
  Also, what makes you think upper case in domain names is invalid? jeremyp@example.COM fails validation.
  
some that I've used ... (Score:5, Interesting)

by ianare ( 1132971 ) writes: on Monday November 10, 2008 @11:11AM (#25704111)

SSN
^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$

US phone with or without parentheses
^\([0-9]{3}\)\s?[0-9]{3}(-|\s)?[0-9]{4}$|^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$

ISO Date (19th to 21st century only)
^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$
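
The SSN pattern is the least obvious of the three: the `\3` backreference forces the second separator to be the same as the first, and the lookaheads reject all-zero groups. A rough sketch for trying it out in Python; `looks_like_ssn` is a hypothetical name and the sample numbers are made up.

```python
import re

# SSN pattern copied from the comment above.
SSN_RE = re.compile(
    r"^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$"
)


def looks_like_ssn(text):
    """True if text has the shape the pattern accepts (not proof it was issued)."""
    return SSN_RE.match(text) is not None


for candidate in ["123-45-6789", "123 45 6789", "123456789",
                  "123-45 6789", "000-45-6789", "123-00-6789"]:
    print(candidate, looks_like_ssn(candidate))
```

The first three candidates pass; the mixed separators and the all-zero groups do not.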

- Re: (Score:3, Funny)
  
  by jonaskoelker ( 922170 ) writes:
  
  ISO Date (19th to 21st century only)
  ^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$
  
  This regexp is ISO certified. The certificate is valid until 2009-02-31.
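
The joke has a point: the pattern checks the shape of a date, not whether the day exists in that month. One way around it, sketched here in Python under the assumption that a real calendar check is wanted, is to let the regex filter the shape and a date parser do the rest. `is_real_date` is a hypothetical helper name.

```python
import re
from datetime import datetime

# ISO date pattern copied from the parent comment.
ISO_DATE_RE = re.compile(
    r"^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$"
)


def is_real_date(text):
    """Shape check with the regex, then a calendar check with strptime."""
    if not ISO_DATE_RE.match(text):
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False  # e.g. February 31st
    return True


print(bool(ISO_DATE_RE.match("2009-02-31")))  # the regex alone accepts it
print(is_real_date("2009-02-31"))             # the calendar check does not
```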
Not a trick, but a question. (Score:3, Interesting)

by Janek Kozicki ( 722688 ) writes: on Monday November 10, 2008 @11:13AM (#25704167) Journal

My friend and I were wondering one day whether it's possible with regex to select a pattern which occurs two or more times in a single line, with the occurrences separated by arbitrary characters. For example, I want to select only lines in which the same text matched by "[FB][ot]o" occurs exactly two times (in the example below, . stands for any character, for clarity):

...Foo... - is not selected
...Foo...Bto... - is not selected
...Bto...Bto... - is selected
A normal /[FB][ot]o.*[FB][ot]o/ would select the second and third case. But I only want the third case. The first occurrence would define my pattern, and second occurrence must exactly match it. Magic stuff like this is not working: /\([FB][ot]o\).*\1/ although that seems to be the closest description of what we wanted.

- Re:Not a trick, but a question. (Score:4, Informative)
  
  by natebarney ( 987940 ) writes: on Monday November 10, 2008 @11:30AM (#25704453)
  
  Magic stuff like this is not working: /\([FB][ot]o\).*\1/ although that seems to be the closest description of what we wanted.
  In perl, I did /([FB][ot][o]).*\1/ and it seemed to work as you wanted. Also, if you're using a regex engine that supports lazy (non-greedy) quantifiers like perl does, I would use them in this case. It reduces backtracking. In perl, put a ? after the *.
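
The same backreference idea carries over to other engines. A small sketch in Python's `re` module, using the three sample lines from the question; note that it selects lines where the captured text repeats at least once, which is not quite "exactly two times".

```python
import re

# Capture the first occurrence, then require the same text again later.
REPEAT_RE = re.compile(r"([FB][ot]o).*?\1")

lines = ["...Foo...", "...Foo...Bto...", "...Bto...Bto..."]
selected = [line for line in lines if REPEAT_RE.search(line)]
print(selected)
```

Only the third line is selected, because `\1` must match whatever the group actually captured rather than anything the class `[FB][ot]o` could match.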
  
- Re: (Score:3, Informative)
  
  by Moebius Loop ( 135536 ) writes:
  
  In most regex engines, you should be able to do this with backreferences. I don't use them often, but I think something like this would work:
  /^(.*?)([FB][ot]o)((.+?)\2)+(.*?)$/
  I think the reason the example you gave using \1 didn't work is because the .* was too greedy, and ate up the rest of the pattern before the \1 got a chance to match. Also, when you're doing full line matching, it's always good to think about ^/$ and whether you're using any multiline modifiers.
Handy links (Score:3, Informative)

by Kozz ( 7764 ) writes: on Monday November 10, 2008 @11:14AM (#25704185)

While I'm not providing any specific trick per se, on topic are a few useful links:
http://www.regular-expressions.info/ [regular-expressions.info] - this one is handy for regex info particularly in Javascript which I use so infrequently I need to know how to match, capture, substitute, etc.
http://perldoc.perl.org/perlre.html [perl.org] - plenty of regex info there which is Perl specific, but of course extends to many other similar implementations

- Re: (Score:3, Informative)
  
  by Pope ( 17780 ) writes:
  
  I was recently trying to come up with a regex for some file-renaming thingy, and I found I could easily state in pseudo-code what I wanted to do, but the regex sites/tips/FAQs I looked through quickly went from "here's a very simple match test" to "going to the moon", with little in between, which is what I was after.
  However, I eventually found Reggy [apple.com] for OS X, a handy little tool for testing regexes, so all was not lost.
Validating credit card numbers (Score:3, Interesting)

by hansamurai ( 907719 ) writes: <hansamurai@gmail.com> on Monday November 10, 2008 @11:47AM (#25704857) Homepage Journal

Does anyone know if the Luhn Algorithm can be implemented in regex only?
http://en.wikipedia.org/wiki/Luhn_algorithm [wikipedia.org]
(sorry if I double post this... I swear I posted it 10 minutes ago)
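
For comparison, here is the checksum written as ordinary code. This is a sketch of the Luhn check itself in Python, not a regex-only answer to the question; `luhn_valid` is a hypothetical name, and stripping spaces and hyphens first is an assumption about the input format.

```python
def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    cleaned = number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False  # empty or contains something other than 0-9
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))
```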

- Re: (Score:3, Insightful)
  
  by phantomfive ( 622387 ) writes:
  
  Yes, actually, (despite what the other posters have said), you can, but it will be very complicated since you will be implementing something like your own multiplier in regex.
  
  The simplest way to do it, of course, is to just list all valid Luhn Algorithm numbers. something like (.....384848583 | 938484845 | 8383838383......). Of course, this is probably not what you are looking for, because you will be listing a lot of numbers, and if your Luhn number is too big, then it won't be in your list.
  
  So, as fo
Search through phone numbers (Score:5, Funny)

by cerberusss ( 660701 ) writes: on Monday November 10, 2008 @11:56AM (#25705021) Journal

This regex goes through my enormous list of girlfriends' telephone numbers, and makes a selection based on the area code I'm currently in!
#$%^&*(&^%{{}}{/\/\||```
(No, that's not a regex at all. And no, I don't even have a single girlfriend.)

Useful parsing configs in bash (Score:3, Interesting)

by Bandman ( 86149 ) writes: <bandman.gmail@com> on Monday November 10, 2008 @12:02PM (#25705147) Homepage

I have to parse files with bash sometimes, and I use these:
^# = line with a leading comment
^$ = empty line
They're simple, but work usually. You can make them a lot more bullet proof by adding in blank checking between the characters, but it seems to work.
cat httpd.conf | grep -v \^\# | grep -v \^\$ | less
makes httpd.conf a lot more readable.
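
The "blank checking" mentioned above can be folded into a single pattern. A sketch in Python rather than bash, assuming that indented comments and whitespace-only lines should be dropped as well; `strip_comments` is a hypothetical helper name.

```python
import re

# Optional leading whitespace, then either a comment marker or end of line.
SKIP_RE = re.compile(r"^\s*(#|$)")


def strip_comments(text):
    """Return the lines that are neither comments nor blank."""
    return [line for line in text.splitlines() if not SKIP_RE.match(line)]


sample = "# comment\n\nListen 80\n   # indented comment\n  \nServerName example.com\n"
print(strip_comments(sample))
```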

OK, I'll play... (Score:3, Informative)

by PRMan ( 959735 ) writes: on Monday November 10, 2008 @12:14PM (#25705369)

Bad filename character for Windows (if it matches, the filename is invalid):
[*<>=+"\\/,.:;]

E-mail (use case insensitive):
^\s*[\w-~&$+']+(\.[\w-]+)*@(?<domain>[\w-]+\.)+(?<tld>[0-9]{1,3}|aero|arpa|biz|com|coop|edu|gov|info|int|museum|net|org|[a-z]{2})\s*$

GUID (use case insensitive):
^\{?[0-9a-f]{8}-?([0-9a-f]{4}-?){3}[0-9a-f]{12}\}?$

IP on local private network:
^127\.|^10\.|^192\.168\.|^172\.1[6-9]\.|^172\.2\d\.|^172\.3[01]\.|^169\.254

Removes .NET named capture syntax so that a .NET Regex string can be used elsewhere (such as Javascript) (replace with nothing):
\?\<\w+\>

Flame away about how horrible it is that I missed some edge case that even nobody on Slashdot has ever heard of, but they work well for me and hopefully for you too.
Now, if you actually find a common case that I missed, I would appreciate the help...
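
As an illustration of the last pattern, here is a sketch in Python that strips the `?<name>` part from .NET named groups so they become plain capturing groups. `strip_group_names` is a hypothetical helper, and the sample pattern is a shortened version of the e-mail regex above. Lookbehinds such as `(?<=...)` are left alone because `\w+` cannot match `=` or `!`, but a pattern that matches a literal `?<name>` would be damaged.

```python
import re

# The "replace with nothing" pattern from the comment above.
NAMED_GROUP_RE = re.compile(r"\?<\w+>")


def strip_group_names(pattern):
    """Turn (?<name>...) groups into plain (...) groups."""
    return NAMED_GROUP_RE.sub("", pattern)


dotnet_pattern = r"(?<domain>[\w-]+\.)+(?<tld>[a-z]{2})"
print(strip_group_names(dotnet_pattern))
```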

valid utf-8 string (Score:3, Interesting)

by Danny Rathjens ( 8471 ) writes: <slashdot2.rathjens@org> on Monday November 10, 2008 @12:26PM (#25705611)

Here is the crazy regex to detect a valid UTF-8 string. :)
/^(
     [\x09\x0A\x0D\x20-\x7E]            # ASCII
   | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
   | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
   | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
   | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
   | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
   | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
   | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
)*$/x
This can crash perl if the string being checked is too big. :D So it's usually better to just let perl attempt to decode anything non-ascii as utf8 and see if it fails or not. (And hope all the utf8 parsing exploits have been fixed :)
eval { $param = decode( 'utf8', $param, Encode::FB_CROAK) if $param =~ /[^\x00-\x7E]/ };
$param = decode( 'iso-8859-1', $param, Encode::FB_CROAK) if $@; # utf8 decode of non-ascii text failed so treat as latin1
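
The "try to decode and fall back" approach looks much the same in other languages. A minimal sketch in Python, assuming the input arrives as bytes; `decode_param` is a hypothetical name echoing `$param` above.

```python
def decode_param(raw):
    """Decode bytes as UTF-8, falling back to Latin-1 if that fails."""
    try:
        # The strict decoder rejects overlong forms and surrogates.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail.
        return raw.decode("iso-8859-1")


print(decode_param(b"caf\xc3\xa9"))  # valid UTF-8 input
print(decode_param(b"caf\xe9"))      # Latin-1 input, handled by the fallback
```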

The most useful regex there is! (Score:5, Funny)

by ShatteredArm ( 1123533 ) writes: on Monday November 10, 2008 @01:42PM (#25707155)

I came up with a Regex that can be used to match literally anything (yes, anything!). It is, therefore, the most flexible regex ever concocted. Here it is:

.*

- Re: (Score:3, Funny)
  
  by Lord Ender ( 156273 ) writes:
  
  I was able to simplify your regex somewhat so that it still matches everything, but takes up half the space:
  .
It must be said (Score:4, Funny)

by IorDMUX ( 870522 ) writes: <<moc.liamg> <ta> <3namremmiz.kram>> on Monday November 10, 2008 @05:37PM (#25711601) Homepage

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
-- Jamie Zawinski

- Just read this (Score:2, Informative)
  
  by Anonymous Coward writes:
  
  on the daily WTF: http://thedailywtf.com/Articles/Now-I-Have-Two-Hundred-Problems.aspx [thedailywtf.com] enjoy!
