(Useful) Stupid Regex Tricks?
Posted by
ScuttleMonkey
on Mon Nov 10, 2008 09:17 AM
from the hope-you-like-reading-lots-of-random-characters dept.
careysb writes to mention that, in the same vein as '*nix tricks' and 'VIM tricks', it would be nice to see one on regular expressions and the programs that use them. "What amazingly cool tricks have people discovered with respect to regular expressions in everyday life as a developer or power user?"
Related Stories
(Useful) Stupid Unix Tricks? 2362 comments
So the other day I messaged another admin from the console using the regular old 'write' command (as I've been doing for over 10 years). To my surprise he didn't know how to respond back to me (he had to call me on the phone) and had never even known you could do that. That got me thinking that there's probably lots of things like that, and likely things I've never heard of. What sorts of things do you take for granted as a natural part of Unix that other people are surprised at?
(Useful) Stupid Vim Tricks? 702 comments
haroldag writes "I thoroughly enjoyed the recent post about Unix tricks, so I ask Slashdot vim users, what's out there? :Sex, :b#, marks, ctags. Any tricks worth sharing?"
(Useful) Stupid BlackBerry Tricks? 238 comments
IP and Hardware addresses (Score:5, Insightful)
And this one for mac addresses
Re:IP and Hardware addresses (Score:4, Insightful)
Parent
Re:IP and Hardware addresses (Score:4, Insightful)
If you mean "Is it an address that you can send IP traffic to?", then the answer is no. If you mean "Is it a valid value that can end up in an IP address field (e.g., in the response to the ipconfig command)?" then the answer is yes - it means that you've not got a connection.
Parent
Re:IP and Hardware addresses (Score:4, Insightful)
Why isn't 0.0.0.0 or 10.* a valid IP address? Since when is the definition of IP address to be unicast and globally routable?
I'd rather take issue with the fact it completely fails on IPv6 addresses.
Parent
Re:IP and Hardware addresses (Score:5, Informative)
Of course, you can do better still. For mac addresses, try:
^([[:xdigit:]]{2}:){5}[[:xdigit:]]{2}$
[:xdigit:] is the POSIX character class for hexadecimal digits, i.e. a-fA-F0-9
The {5} repeats the 'XX:' group five times, so only the last pair has to be written out separately.
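For anyone who wants to try this outside of grep: a minimal Python sketch of the same idea. Python's re module does not understand POSIX classes like [[:xdigit:]], so the class is spelled out as [0-9A-Fa-f]; the name is_mac is made up for illustration.

```python
import re

# Five "XX:" groups followed by a final "XX"; fullmatch() plays the
# role of the ^...$ anchors in the pattern above.
MAC_RE = re.compile(r"(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}")

def is_mac(candidate: str) -> bool:
    """Return True if candidate looks like a colon-separated MAC address."""
    return MAC_RE.fullmatch(candidate) is not None

print(is_mac("00:1A:2b:3C:4d:5E"))
print(is_mac("00:1A:2b:3C:4d"))
```

This only covers the colon-separated form; dash-separated or dotted notations would need their own pattern.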
Parent
Re:IP and Hardware addresses (Score:5, Funny)
Parent
Re:IP and Hardware addresses (Score:5, Funny)
Parent
Re:IP and Hardware addresses (Score:5, Interesting)
There's a really cool little "real time" regex analyzer written in Flex: (if you're not one of them scared to death by Flash content)
http://gskinner.com/RegExr/ [gskinner.com]
Maybe you can monkey your way into "regexing" the a out of apple :p
Parent
Re:IP and Hardware addresses (Score:5, Informative)
I personally like the regex-builder mode in Emacs as well. This one allows you to build a regexp while highlighting all matches in the current buffer.
Of course, this should probably have been posted in the emacs thread earlier, but I think it is probably a good match for this thread as well :)
To start it, just use M-x regexp-builder
Parent
Re:IP and Hardware addresses (Score:4, Funny)
Hmmm... until recently I didn't even realize that low ID's were in vogue :)
Parent
Re:IP and Hardware addresses (Score:4, Funny)
You must be new h... (looks at PP's ID, gasps)
Nevermind.
Parent
Re:IP and Hardware addresses (Score:5, Funny)
Low ID = old fart. He may be a regexp wizard, but he probably looks like gandalf too
Parent
Re:IP and Hardware addresses (Score:5, Informative)
For pretty much any useful stock problem solved by regular expressions, see Perl's Regexp::Common [cpan.org] module. A lot of these patterns are fiendishly complicated because they have to deal with edge cases properly.
Parent
Re:IP and Hardware addresses (Score:4, Informative)
/^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$/
Try this: /^((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])(\.|$)){4}/
And similarly: /^(([0-9a-fA-F]{2})(:|$)){6}$/
(term(delimiter|$)){n} is the generic stupid regex trick here. Works in perl, ymmv elsewhere.
-Baz
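A rough Python sketch comparing the two IPv4 patterns above. Python's re syntax is close enough to Perl's that both carry over unchanged; the names STRICT_IPV4 and TRICK_IPV4 are made up for illustration.

```python
import re

# Spelled-out pattern: three "octet." groups, then a final octet.
STRICT_IPV4 = re.compile(
    r"^((25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])\.){3}"
    r"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])$"
)

# The (term(delimiter|$)){n} trick: each octet is followed by "." or end of string.
TRICK_IPV4 = re.compile(r"^((25[0-5]|(2[0-4]|1[0-9]|[1-9]|)[0-9])(\.|$)){4}")

for candidate in ["192.168.0.1", "256.1.1.1", "1.2.3", "1.2.3.4.5"]:
    strict_ok = STRICT_IPV4.match(candidate) is not None
    trick_ok = TRICK_IPV4.match(candidate) is not None
    print(candidate, strict_ok, trick_ok)
```

One thing the comparison shows: the short version has nothing anchoring the end after the fourth group, so it is also satisfied by input such as "1.2.3.4.5". It trades some strictness for brevity.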
Parent
Re:IP and Hardware addresses (Score:5, Funny)
That last bit is the perlre for a zero-width negative look-behind assertion
It certainly looks like English, but I have no idea what that means. Whatever it is, it sure seems to help cure insomnia.
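For the curious, a tiny sketch of what a zero-width negative look-behind does, in Python, which uses the same (?<!...) syntax as Perl. The pattern and sample string are invented for illustration and are not the regex being discussed.

```python
import re

# Match "bar" only when it is NOT immediately preceded by "foo".
# The look-behind checks the text to the left but consumes none of it.
NOT_AFTER_FOO = re.compile(r"(?<!foo)bar")

sample = "foobar xbar"
match = NOT_AFTER_FOO.search(sample)
print(match.start() if match else None)
```

The "bar" inside "foobar" is skipped, and the one in "xbar" is the one found.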
Parent
Re:IP and Hardware addresses (Score:4, Funny)
Unless you know you're going to be dealing with numeric IPv4 addresses in a specific format, it would be best to pass them to getaddrinfo() (with AI_NUMERICHOST if you want to avoid DNS) and let somebody else worry about validating them properly.
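A minimal sketch of that approach using Python's socket module, which exposes getaddrinfo() and the AI_NUMERICHOST flag. The helper name is_numeric_address is made up, and exactly which spellings are accepted depends on the platform's resolver.

```python
import socket

def is_numeric_address(host: str) -> bool:
    """Return True if the system resolver accepts host as a numeric address.

    AI_NUMERICHOST tells getaddrinfo() not to perform a DNS lookup, so
    hostnames are rejected instead of being resolved.
    """
    try:
        socket.getaddrinfo(host, None, flags=socket.AI_NUMERICHOST)
    except (socket.gaierror, UnicodeError):
        return False
    return True

print(is_numeric_address("127.0.0.1"))
print(is_numeric_address("example.com"))
```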
Parent
Re:IP and Hardware addresses (Score:4, Informative)
According to the RFC leading zeros specify octal and 0x is hexadecimal. Both are standard, but rarely used and not all programs support them. There are even more ways to write an IP address, including dword and different mixes, but they are usually only used for obfuscation in malware.
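The dword form is just the four octets read as one 32-bit number. A short Python sketch of the arithmetic, using the standard ipaddress module; the helper names to_dword and from_dword are made up for illustration.

```python
import ipaddress

def to_dword(dotted: str) -> int:
    """Convert dotted-decimal IPv4 text to its 32-bit integer form."""
    return int(ipaddress.IPv4Address(dotted))

def from_dword(value: int) -> str:
    """Convert a 32-bit integer back to dotted-decimal text."""
    return str(ipaddress.IPv4Address(value))

print(to_dword("127.0.0.1"))             # 127 * 2**24 + 1
print(from_dword(2130706433))
print(int("0177", 8), int("0x7f", 16))   # octal and hex spellings of 127
```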
Parent
Re:IP and Hardware addresses (Score:5, Informative)
Parent
Re:IP and Hardware addresses (Score:4, Funny)
So, would anyone like to buy my new T-shirt, it says "There is no place like 2130706433."
Parent
Here's One for Slashdot Stories! (Score:4, Funny)
Yes sir, that will guarantee a front page story. You better head back to the drawing board if it doesn't fit that pattern. Next week: (Useful) Stupid Starcraft Tricks.
Re:Here's One for Slashdot Stories! (Score:5, Funny)
Next week: (Useful) Stupid Starcraft Tricks.
You can assign a building, building add-on, or a group of up to 12 units to a single key. To do this, select what you want to assign, then hold down Control and select a number on the keyboard between 0-9. Then, when you want to select what you assigned, simply press the number of the group that you want. Pressing a group number twice will center the screen on the group.
Parent
Re:Here's One for Slashdot Stories! (Score:5, Funny)
That doesn't look right...
Try:
Also, I noticed that the previous stupid tricks stories ended with a question mark, but this one doesn't. So:
Parent
New Slashdot Section (Score:5, Interesting)
Maybe we should have a new section for "Useful Stupid Tricks" on Slashdot.
How about (Score:4, Funny)
Stupid (Useful) Ask Slashdot tricks?
I'm not sure whether these are legitimate, or just a "I don't know what the hell I'm doing, so let's see if I can get someone else to show me how to do my job, under the guise of sharing information."
I'd like to say the former, but my cynicism is making me lean to the latter.....
Re:How about (Score:5, Interesting)
I actually like these. Nice little highly enriched concentrations of geekery on a single page. Think how long it might take to round up the sort of stuff that appears here by Googling.
Turing word: insipid
In a sentence: You find this page insipid but I find it inspiring.
Parent
Re:How about (Score:5, Interesting)
I like it, but I've got a bookmark folder called "Slash-doc" where I store useful threads that contain a lot of information.
I've got a lot of threads bookmarked.
Best Practices for Process Documentation [slashdot.org]
How would you make a distributed Office system [slashdot.org]
Quality Open Source / Calendar / Messaging Systems [slashdot.org]
and some others.
Some of the information in the threads is out of date, but the ideas are useful and interesting to read. I need to go back through Ask Slashdot and get the more recent threads that seem to act as references
Parent
Regexp-based address validation (Score:5, Informative)
Re:Regexp-based address validation (Score:5, Funny)
Best part of that Regex? It's easy to modify too!
Parent
Re:Regexp-based address validation (Score:5, Insightful)
The regex is beautiful in the sense that it lets you not be one of those assholes who refuses valid email addresses.
Parent
Windows (Score:4, Informative)
Link to regexes and Excel:
Don't forget to add the library reference (Tools -> References -> Microsoft VBScript Regular Expressions 5.5)
http://www.tmehta.com/regexp/using_functions.htm [tmehta.com]
Regex Bill (Score:5, Funny)
His mom wouldn't let him play with matches.
Match a library call number (Score:5, Interesting)
Here's a chunk of perl script I wrote (years ago) that determines if $text matches any of the styles of library call number that I've ever encountered.
Slashcode is interestingly interpreting my formatting, but you should get the gist.
$text =~ /
^[A-Z]+ # starts with at least one capital letter
\s? # followed by an optional space
\d+ # followed by one or more digits
/x
or $text =~ /
^\d+ # starts with one or more digits
\. # followed by a single decimal
/x
or $text =~ /
\d+ # starts with one or more digits
\s # and a space
/x
or $text =~ /
Thesis # starts with "Thesis"
\d{4} # then four numbers - year
\s+ # separated by at least one space
[A-Z]+ # from one or more capital letters
\d+ # followed by one or more numbers
/x
or $text =~ /
\d+ # starts with one or more digits
\- # connected with a dash
\d+ # to one or more following digits
/x
or $text =~ /
\d+ # starts with one or more digits
\s # followed by a space
[A-Z]* # followed by zero or more capital letters
\d+ # followed by one or more digits
/x
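The commented, one-token-per-line style used here has a direct Python counterpart in re.VERBOSE. A minimal sketch covering three of the call number styles from this comment; the name CALL_NUMBER and the sample strings are made up, and this is not a complete port of the Perl.

```python
import re

# Whitespace and "# ..." comments inside the pattern are ignored in
# VERBOSE mode, so each alternative can be documented in place.
CALL_NUMBER = re.compile(r"""
      ^[A-Z]+ \s? \d+                  # capital letters, optional space, digits
    | Thesis \d{4} \s+ [A-Z]+ \d+      # "Thesis", year, space, letters, digits
    | \d+ - \d+                        # digits joined by a dash
""", re.VERBOSE)

def looks_like_call_number(text: str) -> bool:
    """Return True if any of the sketched call number styles is found."""
    return CALL_NUMBER.search(text) is not None

for sample in ["QA 76", "Thesis1999 AB12", "12-345", "hello"]:
    print(sample, looks_like_call_number(sample))
```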
Re:Match a library call number (Score:5, Funny)
Parent
Nope, not useful (Score:5, Funny)
But even an artist gets lazy sometimes.
One regex to match them all (Score:5, Informative)
[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?
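A quick Python sketch of this pattern pulling integers, decimals and exponent notation out of free text. The pattern is copied as posted; the names NUMBER and extract_numbers and the sample string are made up for illustration.

```python
import re

# Optional sign, then either digits with an optional fraction or a bare
# ".digits" fraction, then an optional exponent.
NUMBER = re.compile(
    r"[-+]?(?:\b[0-9]+(?:\.[0-9]*)?|\.[0-9]+\b)(?:[eE][-+]?[0-9]+\b)?"
)

def extract_numbers(text: str) -> list:
    """Return every number found in text, converted to float."""
    return [float(token) for token in NUMBER.findall(text)]

print(NUMBER.findall("x = -3.14e10, y = .5, z = 42"))
print(extract_numbers("x = -3.14e10, y = .5, z = 42"))
```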
use Regexp::Common; (Score:5, Insightful)
$text_with_urls =~ m/$RE{URI}/;
$text_with_ips =~ m/$RE{net}{IPv4}/;
Remove trailing whitespace (Score:4, Interesting)
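One common way to do this with a regex, sketched in Python. This is an assumption about the kind of pattern meant, not necessarily the poster's own; the name strip_trailing_whitespace is made up.

```python
import re

# With MULTILINE, "$" matches at the end of every line, so spaces and
# tabs sitting just before each newline (or the end of text) are removed.
TRAILING_WS = re.compile(r"[ \t]+$", re.MULTILINE)

def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces and tabs from every line of text."""
    return TRAILING_WS.sub("", text)

print(repr(strip_trailing_whitespace("a  \nb\t\n c")))
```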
Do these questions really belong here? (Score:5, Informative)
I wonder why such FAQs are still posted on a site like Slashdot. We now have a great repository for exactly this kind of question:
http://stackoverflow.com/questions/tagged?tagnames=regex&sort=votes&pagesize=15 [stackoverflow.com]
Be lazy! (Score:5, Interesting)
OK, you asked for stupid tricks, but this one's just plain lazy.
Between bash and grep, there are quite a lot of special characters that you have to escape... Or just ignore with dots!
/I.do.this.frequently..(even.with.parenthases).,.because.sometimes.my....backslash..key.is.tired/
A couple neat things happened: The extra dot after frequently is matching an inline paren. The paren in the PATTERN right next to it starts the mark of an atom, closed by its brother. The comma is because I put one outside the paren (here represented as the dot to the left of the comma) as is my style. Also note the literal backslash, just before you see the word backslash in hidden parenthesis.
Why not add quotes to match the spaces easily? I get a word or two in, and I find I naturally switch to using dots. These are throwaways for single tries through grep. For production code, I hone in carefully on the parts that I'm dead sure I can anchor to, escaped by any means needed, before carefully choosing my atom to match as tightly as possible, so it'll error out if my data has gone wrong.
Even in a simple case like this, half the fun is in explaining it. :)
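The trade-off can be sketched in Python: a dot stands in for any single character, so the lazy pattern matches the intended text but also things that merely have the same shape, while re.escape() builds the careful version. The sample strings and the names LAZY and EXACT are made up for illustration.

```python
import re

line = "I do this frequently (even with parentheses), really"

# Lazy version: every space, paren and comma is replaced by a dot.
LAZY = re.compile(r"frequently..even.with.parentheses..")

# Careful version: the literal text with special characters escaped.
EXACT = re.compile(re.escape("frequently (even with parentheses),"))

print(bool(LAZY.search(line)), bool(EXACT.search(line)))

# The dots are wildcards, so the lazy pattern also accepts look-alikes.
lookalike = "frequentlyXXevenXwithXparenthesesXX"
print(bool(LAZY.search(lookalike)), bool(EXACT.search(lookalike)))
```

That looseness is fine for a throwaway grep and is the reason to escape properly in production code.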
recursive regexp to match {} block (Score:4, Informative)
my $re = '';
$re = qr/
\{ (?:
(?> [^{}]+ ) # non-brace characters
|
(??{ $re }) # nested brace block
)* \}
/x;
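Python's built-in re module has no equivalent of (??{ ... }), so the usual workaround there is a plain depth counter rather than a regex. A minimal sketch of that swapped-in technique; the name balanced_block is made up for illustration.

```python
def balanced_block(text: str, start: int = 0):
    """Return the {...} block beginning at text[start], or None.

    Counts nesting depth instead of recursing: +1 for "{", -1 for "}",
    and the block ends when the depth returns to zero.
    """
    if start >= len(text) or text[start] != "{":
        return None
    depth = 0
    for index in range(start, len(text)):
        char = text[index]
        if char == "{":
            depth += 1
        elif char == "}":
            depth -= 1
            if depth == 0:
                return text[start:index + 1]
    return None  # ran out of text before the block was closed

print(balanced_block("{a{b}c} tail"))
print(balanced_block("{a{b}"))
```

Like the regex above, this ignores braces inside string literals or comments, so it is only a sketch for simple input.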
some that I've used ... (Score:5, Interesting)
^(?!000)([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}$
US phone with or without parentheses
^\([0-9]{3}\)\s?[0-9]{3}(-|\s)?[0-9]{4}$|^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$
ISO Date (19th to 21st century only)
^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$
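A short Python sketch exercising the phone and date patterns as posted. The sample inputs and the names US_PHONE and ISO_DATE are made up. Note that the date regex only checks the shape, so the sketch also shows a real date parse for comparison.

```python
import re
from datetime import date

US_PHONE = re.compile(
    r"^\([0-9]{3}\)\s?[0-9]{3}(-|\s)?[0-9]{4}$|^[0-9]{3}-?[0-9]{3}-?[0-9]{4}$"
)
ISO_DATE = re.compile(
    r"^((18|19|20)\d\d)-(0[1-9]|1[012])-(0[1-9]|1[0-9]|2[0-9]|3[01])$"
)

for phone in ["(555) 123-4567", "555-123-4567", "555 123 4567"]:
    print(phone, bool(US_PHONE.match(phone)))

for text in ["2008-11-10", "2008-13-01", "2008-02-31"]:
    shape_ok = bool(ISO_DATE.match(text))
    try:
        date.fromisoformat(text)
        real_date = True
    except ValueError:
        real_date = False
    print(text, shape_ok, real_date)
```

"2008-02-31" passes the regex but is not a real date, which is the usual argument for letting a date library have the final say.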
Search through phone numbers (Score:5, Funny)
#$%^&*(&^%{{}}{/\/\||```
(No, that's not a regex at all. And no, I don't even have a single girlfriend.)
The most useful regex there is! (Score:5, Funny)
It must be said (Score:4, Funny)
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
-- Jamie Zawinski
Re:is it an rfc-822 compliant e-mail address? (Score:4, Insightful)
Mmmmm readable.
Parent
Re:is it an rfc-822 compliant e-mail address? (Score:4, Interesting)
The problem is that email addresses are not suitable for regex based validation.
There are too many legacy formats, too many variations, that are legal addresses.
Why, back in the old days, you could send mail to things like "bob%example.com@example.org" which would shoot the email off to example.org, whose mail server would then shoot the email off to example.com. A way to hand route your email around a broken network link in the old days. Throw in a few UUCP hops, maybe getting final delivery to a BITNET connected system. Ah, those were the days!
Parent
Re:Not a trick, but a question. (Score:4, Informative)
Magic stuff like this is not working: /\([FB][ot]o\).*\1/ although that seems to be the closest description of what we wanted.
In perl, I did /([FB][ot][o]).*\1/ and it seemed to work as you wanted. Also, if you're using a regex engine that supports lazy (non-greedy) quantifiers like perl does, I would use them in this case. It reduces backtracking. In perl, put a ? after the *.
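The same backreference with a lazy quantifier, sketched in Python, where the syntax matches Perl's for this pattern. The name REPEATED and the sample strings are made up for illustration.

```python
import re

# Group 1 captures a word like "Foo" or "Boo"; \1 then requires that the
# very same text shows up again later. ".*?" is the lazy form of ".*".
REPEATED = re.compile(r"([FB][ot]o).*?\1")

hit = REPEATED.search("Foo and Boo and Foo")
print(hit.group(1) if hit else None)

miss = REPEATED.search("Foo bar Boo")
print(miss)
```

The second string contains two words that each fit [FB][ot]o, but they are not the same word, so the backreference fails.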
Parent
Re:99 Bottles of Beer on the wall (Score:5, Insightful)
(I would quote the final result but /. won't allow that many "junk" characters.. let's hope that doesn't cripple this entire discussion.)
Interesting that a site for nerds doesn't allow a lot of characters commonly used in source code.
Parent
Re:Mainframe Formatting (Score:4, Insightful)
Parent
Re:email validation... FAIL (Score:4, Insightful)
Your regex doesn't allow + signs in the name part.
Nor, I would suspect, would it handle quoted strings, e.g. "Jeremy P"@example.com is technically a valid RFC 822 address.
And having just looked up the RFC 5322 spec which you quote, I see there are more cases you fail to take account of, e.g.
Jeremy P <jeremyp@example.com>
Also, what makes you think upper case in domain names is invalid? jeremyp@example.COM fails validation.
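Two of these cases can be handled without a regex at all. A small Python sketch using the standard library's email.utils.parseaddr for the display-name form, plus a made-up helper, normalize_domain, that lowercases only the domain part, since domain names are case-insensitive.

```python
from email.utils import parseaddr

def normalize_domain(addr_spec: str) -> str:
    """Lowercase the domain of an address, leaving the local part alone."""
    local, sep, domain = addr_spec.rpartition("@")
    if not sep:
        return addr_spec  # no "@" found; leave the input untouched
    return f"{local}{sep}{domain.lower()}"

# parseaddr() splits "Display Name <addr>" into (name, address).
print(parseaddr("Jeremy P <jeremyp@example.com>"))
print(normalize_domain("jeremyp@example.COM"))
```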
Parent