Topical AND Keyword Based Search Engines?
jhughes asks: "I work for a public university and have just recently 'acquired' the project of switching a community Web page (for the city and surrounding counties/cities) to a new machine running Apache. I'm relatively new to Apache, but thus far this has not been a problem. The real problem comes from the request for a search engine for the Web site's information. This site will contain a large amount of public information, such as rental property, council meeting minutes, and so on. They are requesting that the search engine be able to scan the entire site for all the data, or specific subtopics (such as 'City Government', 'City Council', or 'City Council Meeting Minutes', with searches limited to each of those areas if need be). To me this sounds similar to what Yahoo has done in some cases (you can search Yahoo, or Yahoo US States, or Yahoo US States California, or specific cities, etc.). I honestly have no clue what search engine to use for this. Are there commercial search engines that one could purchase and adapt, or is there a free/open source one out there that would work just as well? There will eventually be a huge amount of information (as all government sites seem to accumulate) online, so it'd need to be able to handle large indexes. Any help, advice, or suggestions would be greatly appreciated."
Google (Score:1)
AltaVista (Score:1)
Take a look at http://solutions.altavista.com [altavista.com] under the heading AltaVista Search Engine 3.0.
A little disclaimer: I work as a developer for that product, so of course I'm biased; however, it really is very powerful.
Re:You want Thunderstone (Score:1)
You can download their Webinator demo and use it as a search mechanism on your own website. See their website for more information.
ht://Dig may work (Score:1)
on Search Engines (Score:1)
As for building out a topical or hierarchical structure like Yahoo's or DMOZ [dmoz.org], you need to have metadata about the document. You can pull metadata from a URL, as was suggested earlier, but I wouldn't advise that. In order for this to work, your URLs start to look like 'http://city.gov/CityCouncil/MinutesOfMeeting/200
I've used both FreeWAIS-SF and Verity [verity.com] for implementing searches like this, as well as home-grown solutions. I wouldn't advise using Verity, since I think it's prohibitively expensive, unless it has some feature that you require and are willing to pay for. And I didn't think the home-grown solutions worked as well as the off-the-shelf products that were customized.
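To make the URL-as-metadata idea above concrete, here is a minimal Python sketch of a keyword search that can be limited to one topic subtree. It is not how FreeWAIS-SF, Verity, or any other product mentioned in this thread works; the class name TopicIndex, the helper topic_of, and the city.example URLs are all hypothetical, and it assumes the directory part of each URL path really does mirror the topic hierarchy.

```python
import re
from collections import defaultdict
from urllib.parse import urlparse


def topic_of(url):
    """Return the directory part of the URL path as a tuple of topic segments."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return tuple(segments[:-1])  # drop the last segment (the document itself)


class TopicIndex:
    """Hypothetical in-memory inverted index with topic-scoped lookups."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> set of URLs containing it
        self.topics = {}                  # URL -> topic tuple taken from its path

    def add(self, url, text):
        self.topics[url] = topic_of(url)
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[word].add(url)

    def search(self, query, topic=()):
        """Return URLs containing every query word, limited to a topic subtree."""
        words = re.findall(r"[a-z0-9]+", query.lower())
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        topic = tuple(topic)
        return sorted(u for u in hits if self.topics[u][:len(topic)] == topic)


if __name__ == "__main__":
    index = TopicIndex()
    index.add("http://city.example/CityCouncil/Minutes/jan.html", "Council budget vote")
    index.add("http://city.example/Rentals/listing1.html", "Budget friendly rental")
    print(index.search("budget"))                          # whole site
    print(index.search("budget", topic=("CityCouncil",)))  # one subtopic only
```

The weakness is visible in the sketch: the topic is only as good as the directory layout, so moving a document to a different directory changes its topic.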
You want Thunderstone (Score:1)
Your project sounds like a project for Thunderstone's Texis [thunderstone.com] system. I've worked with quite a few other search engines, and for my projects, Texis always beat the other players in performance/flexibility.
1. Keyword/Phrase searching (and, or, must include, must not include, must include N or more, etc.)
2. Prefix and suffix processing
3. Search by regular expressions
4. Concept searching with precompiled thesaurus as well as a user-defined thesaurus (you can customize it for your industry's jargon, for example).
5. Integrated P-Code compiled scripting language
6. Available for a bunch of platforms
7. Goes like a bat out of hell on huge indexes
8. Free [beer] version available for smaller projects
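For readers unfamiliar with the query operators in items 1 through 3, here is a rough Python sketch of what "must include", "must not include", "must include N or more", and a regular-expression filter mean when applied to a single document. This is not Texis's API or query syntax; the function matches and all of its parameter names are made up for illustration, and a real engine would evaluate these against an index rather than scanning text.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(text, must=(), must_not=(), any_of=(), min_any=1, pattern=None):
    """Return True if the text satisfies every given constraint.

    must     - every term has to appear
    must_not - no term may appear
    any_of   - at least min_any of these terms have to appear
    pattern  - optional regular expression searched in the raw text
    """
    words = tokenize(text)
    if any(term.lower() not in words for term in must):
        return False
    if any(term.lower() in words for term in must_not):
        return False
    if any_of and sum(term.lower() in words for term in any_of) < min_any:
        return False
    if pattern is not None and not re.search(pattern, text, re.IGNORECASE):
        return False
    return True


if __name__ == "__main__":
    doc = "City Council meeting minutes for the January budget session"
    print(matches(doc, must=["council"], must_not=["rental"]))
    print(matches(doc, any_of=["budget", "zoning", "minutes"], min_any=2))
    print(matches(doc, pattern=r"\bmin\w*"))  # crude prefix match on "min"
```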
Their client list is pretty impressive: eBay, NASA, ZDNet, etc., and lots of city/state sites. The downside is that it's not cheap: $5,000 to $10,000 or more just to start.
My only connection to Thunderstone is my prior use of their product.
PDHoss
Excalibur (Score:2)
Look up Excalibur Technologies. You can get keyword, theme, and concept searches out of it.
Disclaimer: It's one of the search engines we use at work. Other than the fact that they bought me lunch once, I have no other connections.