
(Useful) Stupid Regex Tricks?

careysb writes to mention that in the same vein as '*nix tricks' and 'VIM tricks', it would be nice to see one on regular expressions and the programs that use them. What amazingly cool tricks have people discovered with respect to regular expressions in everyday life as a developer or power user?

  • by rallymatte ( 707679 ) * on Monday November 10, 2008 @10:20AM (#25703249)
    To filter a string to make sure it's a valid ip address this regexp is quite useful.
    /^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/

    And this one for mac addresses
    /^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}$/
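A quick way to sanity-check the two patterns above is to run them against a few sample strings. What follows is only a minimal Python sketch, not the poster's code: the helper names `is_ipv4` and `is_mac` are made up for illustration, `re.fullmatch` stands in for the `^...$` anchors, and the MAC pattern is written with a repeat count instead of being spelled out six times.

```python
import re

# One octet, 0-255 with no leading zeros (same alternation as the pattern above).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])"
IPV4_RE = re.compile(rf"(?:{OCTET}\.){{3}}{OCTET}")

# Six colon-separated pairs of hex digits.
MAC_RE = re.compile(r"[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5}")


def is_ipv4(text):
    """True if the whole string is a dotted-quad with each part in 0-255."""
    return IPV4_RE.fullmatch(text) is not None


def is_mac(text):
    """True if the whole string is a colon-separated 48-bit MAC address."""
    return MAC_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["192.168.1.1", "0.0.0.0", "256.1.1.1", "1.2.3"]:
        print(sample, is_ipv4(sample))
    for sample in ["00:1A:2b:3c:4D:5e", "00:1A:2b:3c:4D"]:
        print(sample, is_mac(sample))
```

Note that this only checks the syntax. Whether an address such as 0.0.0.0 is meaningful in a given context is a separate question.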
  • by Poltras ( 680608 ) on Monday November 10, 2008 @10:31AM (#25703435) Homepage
    So if I get this right, 0.0.0.0 is a valid ip address? I know the real regex would take a full post, but yes, it is possible to check with a single regex whether it is valid, whether it makes sense (127.0.0.1, 10.*, 169.254.*, etc etc) and whether it's a broadcast or a network address (not taking netmask into account).
  • by Timmmm ( 636430 ) on Monday November 10, 2008 @10:49AM (#25703719)

    Mmmmm readable.

  • use Regexp::Common; (Score:5, Insightful)

    by oneiros27 ( 46144 ) on Monday November 10, 2008 @10:54AM (#25703815) Homepage
    # The CPAN module is spelled Regexp::Common; it exports prebuilt patterns through %RE.
    use Regexp::Common qw(URI net);
    $text_with_urls =~ m/$RE{URI}/;
    $text_with_ips =~ m/$RE{net}{IPv4}/;
  • Re:ARGH!!!! (Score:2, Insightful)

    by Anonymous Coward on Monday November 10, 2008 @10:58AM (#25703885)

    So clearly, Slashdot's shit never stank?

    No, seriously, why the bitching? Did you expect the site to just keep reporting dry stories about incremental Linux kernel upgrades for its entire existence? You expected a website to never change and never update with the times? Just because it's old doesn't mean it's sacred.

  • by plumby ( 179557 ) on Monday November 10, 2008 @11:18AM (#25704257)

    So if I get this right, 0.0.0.0 is a valid ip address?

    If you mean "Is it an address that you can send IP traffic to?", then the answer is no. If you mean "Is it a valid value that can end up in an IP address field (e.g., in the response to the ipconfig command)?" then the answer is yes - it means that you've not got a connection.

  • by Moebius Loop ( 135536 ) on Monday November 10, 2008 @11:37AM (#25704621) Homepage

    I like stackoverflow a lot and have been tangentially involved in other tech knowledge base-type sites, but they suffer from one typical problem.

    People who already *have* certain knowledge don't often spend much time reading sites dedicated to dispensing that information.

  • by Culture20 ( 968837 ) on Monday November 10, 2008 @12:07PM (#25705235)

    (I would quote the final result but /. won't allow that many "junk" characters.. let's hope that doesn't cripple this entire discussion.)

    Interesting that a site for nerds doesn't allow a lot of characters commonly used in source code.

  • by msuarezalvarez ( 667058 ) on Monday November 10, 2008 @12:16PM (#25705405)
    You are a great candidate for the Useless Use of Cat award... especially endearing is your making a comment on the few commands your line uses :D
  • by xenocide2 ( 231786 ) on Monday November 10, 2008 @12:16PM (#25705413) Homepage

    The regex is beautiful in the sense that it lets you not be one of those assholes who refuses valid email addresses.

  • by Kent Recal ( 714863 ) on Monday November 10, 2008 @12:28PM (#25705633)

    Dear slashdot editors,

    slashdot.org is not stackoverflow.com [stackoverflow.com].
    The articles and discussions here are not searchable in a sane way. Your recent attempts to mimic stackoverflow are just a waste of everybody's time because all those little tidbits that people post get lost in the internet noise immediately.

    We know you're a bit desperate [alexa.com] for traffic these days. But this is not the way to go.

  • by kimba ( 12893 ) on Monday November 10, 2008 @01:07PM (#25706411)

    Why isn't 0.0.0.0 or 10.* a valid IP address? Since when is the definition of IP address to be unicast and globally routable?

    I'd rather take issue with the fact it completely fails on IPv6 addresses.
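The distinction being argued here (syntactically valid versus routable, and IPv4 versus IPv6) is easier to see when the parsing is left to a library rather than a regex. The following is a rough sketch using Python's standard `ipaddress` module; the function name `describe` and the labels it returns are assumptions chosen for illustration.

```python
import ipaddress


def describe(text):
    """Return a short label for an address string, or 'invalid' if it does not parse."""
    try:
        addr = ipaddress.ip_address(text)  # accepts both IPv4 and IPv6 literals
    except ValueError:
        return "invalid"
    # Order matters: link-local addresses also count as private.
    if addr.is_unspecified:
        kind = "unspecified"
    elif addr.is_loopback:
        kind = "loopback"
    elif addr.is_link_local:
        kind = "link-local"
    elif addr.is_private:
        kind = "private"
    else:
        kind = "other"
    return f"IPv{addr.version} {kind}"


if __name__ == "__main__":
    for sample in ["0.0.0.0", "127.0.0.1", "169.254.1.1", "10.1.2.3", "8.8.8.8", "::1", "256.1.1.1"]:
        print(sample, "->", describe(sample))
```

Here 0.0.0.0 and 10.1.2.3 both parse as valid addresses; what they are good for is reported separately.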

  • by ais523 ( 1172701 ) <ais523(524\)(525)x)@bham.ac.uk> on Monday November 10, 2008 @01:11PM (#25706509)

    Does that thing allow nested comments, and escaping inside them? It doesn't look like it, it isn't recursive. (I have some in the email address I typically put online, ais523(524\)(525)x)@bham.ac.uk; that could be a good test for your email client, and is useful because I've never come across a spambot that can parse it.)

    Recent versions of Perl and Python regices allow you to write recursively; that probably qualifies as a stupid regex trick, especially as it makes them more computationally powerful so they can handle things like email addresses. Or you could just sit wondering why email addresses allow nested comments anyway...
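To illustrate why nested comments need more than a plain (non-recursive) regex: you have to count how deep you are. Here is a small Python sketch that does the counting with an ordinary loop, since the standard `re` module has no recursive patterns. The function name `strip_comments` is invented for this example, and it deliberately ignores quoted strings, so treat it as a toy rather than an RFC-complete parser.

```python
def strip_comments(address):
    """Remove parenthesised comments, honouring nesting and backslash escapes."""
    out = []
    depth = 0  # number of unclosed "(" seen so far
    chars = iter(address)
    for ch in chars:
        if depth > 0 and ch == "\\":
            next(chars, None)  # skip the escaped character inside a comment
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            out.append(ch)  # only text outside all comments survives
    return "".join(out)


if __name__ == "__main__":
    print(strip_comments("ais523(524\\)(525)x)@bham.ac.uk"))
```

The `depth` counter is exactly the piece a classical regular expression cannot express, which is what the recursive extensions add.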

  • by jeremyp ( 130771 ) on Monday November 10, 2008 @02:30PM (#25708027) Homepage Journal

    Your regex doesn't allow + signs in the name part.

    Nor, I would suspect, would it handle quoted strings, e.g. "Jeremy P"@example.com is technically a valid RFC 822 address.

    And having just looked up the RFC 5322 spec which you quote, I see there are more cases you fail to take account of, e.g.

    Jeremy P <jeremyp@example.com>

    Also, what makes you think upper case in domain names is invalid? jeremyp@example.COM fails validation.
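For comparison, here is a hedged Python sketch that hands a couple of these cases to the standard library's `email.utils.parseaddr` instead of a hand-rolled regex. The helper `split_address` and the sample address `jeremyp+lists@example.COM` are made up for illustration, and lower-casing the domain reflects the point that domain names are case-insensitive.

```python
from email.utils import parseaddr


def split_address(text):
    """Split a mailbox string into (display name, local part, lower-cased domain)."""
    display_name, addr = parseaddr(text)
    # The domain is everything after the last "@"; the local part is kept as written.
    local, _, domain = addr.rpartition("@")
    return display_name, local, domain.lower()


if __name__ == "__main__":
    print(split_address("Jeremy P <jeremyp@example.com>"))
    print(split_address("jeremyp+lists@example.COM"))
```

This does not prove an address is deliverable; it only shows that the display-name form, the + sign and an upper-case domain need not be rejected.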

  • Opposite (Score:4, Insightful)

    by Christopher_Olah ( 1317943 ) on Monday November 10, 2008 @07:32PM (#25713279)

    IMHO, this is exactly the way that Slashdot should be going. Threads like this are interesting, add to the reservoirs of internet knowledge, and have the highest signal-to-noise ratios.

    I (and I suspect many others) read Slashdot not for the latest +5 funny comment (though those can be fun to read) but to read the opinions of brilliant minds. And when those minds start trading secrets... Everyone wins.

  • by phantomfive ( 622387 ) on Monday November 10, 2008 @07:42PM (#25713397) Journal
    Yes, actually, (despite what the other posters have said), you can, but it will be very complicated since you will be implementing something like your own multiplier in regex.

    The simplest way to do it, of course, is to just list all valid Luhn Algorithm numbers. Something like (.....384848583 | 938484845 | 8383838383......). Of course, this is probably not what you are looking for, because you will be listing a lot of numbers, and if your Luhn number is too big, then it won't be in your list.

    So, as for a more general solution, it is possible because at each digit you can know whether your number matches so far or not. What you will be basically implementing is a regular expression that checks each digit and says, "does this digit move me to a state that is a valid number or an invalid number?" I could be wrong, but my initial estimate is that this will take less than a thousand states in a state machine (of course, the easiest way to do this is to design a state machine and then translate it to a regular expression).

    To give an idea of what you are up against (and to help me find the answer to your question myself!) I implemented here a simple regular expression to determine if any binary addition will have an overflow at the last digit or not:

    ((0+1)+(1|(0+11)1+)+

    You can do something similar, although much much longer, with the Luhn algorithm.

    Hope that helps.
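The state-machine idea can be sketched directly, without going as far as the final regex. The Python below is one possible construction, not necessarily the one the poster had in mind: reading digits left to right, the state is a pair (a, b) of running Luhn sums mod 10, where a assumes the digit just read is the last one (not doubled) and b assumes one more digit follows (so it is doubled). That gives at most 100 states, in line with the "less than a thousand" estimate. All names here (`doubled`, `step`, `luhn_accepts`, `reachable_states`) are made up for the example.

```python
def doubled(d):
    """Luhn doubling: double the digit, then add the digits of the result."""
    d2 = 2 * d
    return d2 - 9 if d2 > 9 else d2


def step(state, d):
    """Transition function: consume one digit and return the next (a, b) state."""
    a, b = state
    # The new digit becomes the rightmost one, so every earlier digit flips parity.
    return ((b + d) % 10, (a + doubled(d)) % 10)


def luhn_accepts(digits):
    """Run the automaton over a digit string; accept when the 'a' sum is 0 mod 10."""
    state = (0, 0)
    for ch in digits:
        state = step(state, int(ch))
    return bool(digits) and state[0] == 0


def reachable_states():
    """Enumerate every state reachable from the start state."""
    seen = {(0, 0)}
    frontier = [(0, 0)]
    while frontier:
        current = frontier.pop()
        for d in range(10):
            nxt = step(current, d)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


if __name__ == "__main__":
    print(luhn_accepts("79927398713"))
    print(len(reachable_states()))
```

Turning this automaton into an actual regular expression is the mechanical (and very long) part; the sketch only shows that a finite number of states is enough.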
