Internet Searching Using Regular Expressions?
/[Aa]non([aiy]mo?u?s)? [KkCc]ow[ae]rd/ asks: "Remarkably few people have a working understanding of regular expressions, but those who do know how useful they can be for searching text. Has anyone out there seen a large search engine (like Google) that will take regular expressions as queries? How about a newsgroup search engine?" Aside from the fact that many regular expressions read like snippets of line noise, they are the best thing I've seen for searches, and they're a lot easier than -adding +a +lot +of +search -terms.
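As a small illustration of the fuzzy matching the submitter is after, the handle itself works as an ordinary pattern. This is a minimal Python sketch; the function name and test names are made up for illustration.

```python
import re

# The submitter's handle, used unchanged as a regular expression.
HANDLE = re.compile(r"[Aa]non([aiy]mo?u?s)? [KkCc]ow[ae]rd")


def is_anonymous_coward(name: str) -> bool:
    """Return True if the whole string matches the handle pattern."""
    return HANDLE.fullmatch(name) is not None


for name in ["Anonymous Coward", "anon coward", "Anonimus Kowerd", "Anonymous Cowherd"]:
    print(f"{name!r}: {is_anonymous_coward(name)}")
```

The first three spellings match; "Cowherd" does not, because [ae] must be followed directly by "rd".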
Re:Performance? (Score:1)
However, this all falls apart when the keywords are very common: nearly every page will contain them, so the actual regular expression search will have to scan thousands of pages. For uncommon words, though, it works fairly well.
Glimpse does do this, but it has problems with memory usage and can be slow at times, probably precisely because it does this.
Sorry, but I don't know of any full-Internet search engine that allows this. Your best bet is probably to write something that extracts the keywords from a regex, feeds them to Google, downloads every page that matches, and then runs the regex on your own computer to narrow down the results further. Depending on how common your keywords are, it may work well, or it may try to download half the Internet.
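The keyword-then-regex approach described above might be sketched like this in Python. It is only a sketch under stated assumptions: `search` and `fetch` are hypothetical hooks standing in for a search-engine query and a page download (no real Google API is assumed), and `literal_keywords` is a crude heuristic that is only trustworthy for simple patterns.

```python
import re
from typing import Callable, Iterable, List


def literal_keywords(pattern: str, min_len: int = 3) -> List[str]:
    """Heuristically pull literal words out of a regex.

    Character classes and escapes are dropped, and any character followed
    by ?, * or {m,n} is treated as optional. Words inside optional groups
    or alternations are NOT recognised as optional, so this is only safe
    for simple patterns such as 'somewhere.*over.*there'.
    """
    stripped = re.sub(r"\[[^\]]*\]|\\.", " ", pattern)
    stripped = re.sub(r".(?:[?*]|\{[^}]*\})", " ", stripped)
    return [w for w in re.findall(r"[A-Za-z0-9]+", stripped) if len(w) >= min_len]


def regex_search(pattern: str,
                 search: Callable[[List[str]], Iterable[str]],
                 fetch: Callable[[str], str]) -> List[str]:
    """Coarse keyword search remotely, exact regex match locally.

    `search` (keywords -> candidate URLs) and `fetch` (URL -> page text)
    are hypothetical hooks supplied by the caller.
    """
    rx = re.compile(pattern)
    return [url for url in search(literal_keywords(pattern)) if rx.search(fetch(url))]


# Tiny offline demo: a dict stands in for the web.
PAGES = {
    "a": "somewhere over the rainbow way up there",
    "b": "over there somewhere",
    "c": "nowhere in particular",
}


def fake_search(keywords: List[str]) -> List[str]:
    return [url for url, text in PAGES.items() if all(k in text for k in keywords)]


print(regex_search("somewhere.*over.*there", fake_search, PAGES.__getitem__))  # ['a']
```

Page "b" contains all three keywords and is downloaded, but the local regex pass rejects it because the words are in the wrong order. With very common keywords, the candidate list (and the download bill) grows accordingly.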
Re:FP RegEx (Score:1)
Glimpse / Webglimpse (Score:1)
Unfortunately I don't know of any Web-wide RE-capable database. Here's hoping someone downthread does...
Re:Change the Storage Method [Was: Re:Performance?] (Score:1)
[JjFf][ae][nb]\s*[0-9]{1,2},{0,1}\s*[0-9]{2,4}
This gets really messy really fast, and exactly *HOW* are you going to query a keyed database using this?
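To see how quickly the date pattern above gets messy, it can be run against a few strings. This Python sketch uses the pattern unchanged; the sample strings are made up.

```python
import re

# The date pattern from the comment above, unchanged.
DATE = re.compile(r"[JjFf][ae][nb]\s*[0-9]{1,2},{0,1}\s*[0-9]{2,4}")

for s in ["Jan 5, 2001", "feb 14 01", "Fan 99 9999", "Jeb31,0000", "March 3, 2001"]:
    # fullmatch: the whole string must count as a "date" under this pattern
    print(f"{s!r:16} {bool(DATE.fullmatch(s))}")
```

The pattern accepts nonsense such as "Fan 99 9999" and "Jeb31,0000", yet rejects "March 3, 2001" because it only covers January and February. Tightening it for every month and valid day range is where the mess starts.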
Change the Storage Method [Was: Re:Performance?] (Score:1)
This would probably require a very fast, large database to accomplish, but it could be done.
Re:Performance? (Score:1)
SELECT whatever FROM somewhere WHERE something ~ 'somewhere.*over.*there';
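The query above uses a `~` regex-match operator in the style of PostgreSQL's POSIX regular expression matching. As a self-contained stand-in, here is a sketch using SQLite through Python's sqlite3 module. SQLite parses `X REGEXP Y` but defines no regexp() function by default, so one is registered here; the table and rows are made up.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

# SQLite rewrites "X REGEXP Y" as regexp(Y, X): pattern first, value second.
conn.create_function(
    "REGEXP", 2,
    lambda pattern, value: value is not None and re.search(pattern, value) is not None,
)

conn.execute("CREATE TABLE pages (url TEXT, body TEXT)")
conn.executemany("INSERT INTO pages VALUES (?, ?)", [
    ("a", "somewhere over the rainbow way up there"),
    ("b", "over there somewhere"),
])

rows = conn.execute(
    "SELECT url FROM pages WHERE body REGEXP ?", ("somewhere.*over.*there",)
).fetchall()
print(rows)  # [('a',)]
```

This works, but the function has to be evaluated against every row's text; no ordinary index can answer the predicate, which is the performance concern raised elsewhere in this thread.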
Performance? (Score:2)
somewhere.*over.*there
Across an entire Internet-sized search engine? I guess you could pre-select documents containing "somewhere", "over" and "there", and then screen them with a "standard" regexp search.
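The pre-selection idea can be sketched with a toy inverted index in Python. Assumptions: the documents live in a hypothetical in-memory dict, the required literals are supplied by hand, and they are treated as whole words (a regex needing "over" inside "moreover" would be missed by this prefilter).

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: lowercase word -> ids of the documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def prefiltered_search(docs: Dict[str, str], index: Dict[str, Set[str]],
                       required_words: List[str], pattern: str) -> List[str]:
    """Pre-select documents containing every required word, then run the
    full regex only on those candidates."""
    postings = [index.get(w.lower(), set()) for w in required_words]
    candidates = set.intersection(*postings) if postings else set(docs)
    rx = re.compile(pattern)
    return sorted(d for d in candidates if rx.search(docs[d]))


DOCS = {
    "1": "somewhere over the rainbow, way up there",
    "2": "over there, somewhere",
    "3": "somewhere else entirely",
}
print(prefiltered_search(DOCS, build_index(DOCS),
                         ["somewhere", "over", "there"], "somewhere.*over.*there"))
```

Document 3 never reaches the regex stage, and document 2 is a candidate that the regex then rejects. The regex pass still needs each candidate's full text on hand.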
However, doing the preliminary match using the regexp would definitely be resource-prohibitive. In the above example, you would have to read in the text of each file to run the regexp, not to mention the cost of keeping the text around.
That said, I can see how you could implement a regexp-like front end to a search tool if you placed some restrictions on what you could do with the regular expressions. However, I suspect the idea was more to be able to do advanced conditionals and other funky stuff within the regular expressions, and limiting this would probably limit the usefulness of the product.
So, to summarize my rambling: the initial hurdle would be to reinvent the way normal regexps work so that they are efficient on a multi-gigabyte database.
^I love regular expressions, but$ (Score:3)
Best to just download search results with a spider and hit them with grep, if you've got the time. [sigh].