Indexing Dynamic Sites For Search Engines?

Moeses asks: "I am working on a Web site that uses the AltaVista search engine software. The latest version of the site has moved most of the data from static pages to dynamic pages. This raises some issues. I've developed workarounds for most of them, such as generating pages whose URLs contain all of the query-string information needed to index the whole database, and code to handle cases where a user searches for something that can't be displayed because of state information specific to that user's session. Even so, there are enough remaining issues that I can't index all the states of the files that I need. Building a custom search engine for the database isn't within the budget of this project. What are the rest of you doing to index and search your dynamic sites?"
This discussion has been archived. No new comments can be posted.

  • The author has revisited this subject; here is the updated version [phpbuilder.com]. IMHO a very good article.
  • I won't speak to making the site indexable by the major outside engines. However, I came up with a nifty solution for local searching back in my StoryServer days, and it took less than an hour to implement:
    1. Generate a separate set of text files containing your content, with the query terms necessary for determining the real page's dynamic URL as the filename. That is, if you have "/articles/index.jsp?article=134", you might name the resulting text file "article-134.txt" and put it in a separate directory (which doesn't even have to be web-accessible).
    2. Use a plain old text indexer like htDig to index the text files periodically.
    3. When a user performs a search, don't let it return links to the text files. Instead, grab the filename, split out its elements, and splice the query terms into a reconstructed URL for the "real" dynamic article (a sketch of this appears after this comment). You can index separate sections of a large site simply by dumping each area's text files into a separate directory.

    If you have a more sophisticated search engine that can deal with item tagging (for metadata like keywords, creation dates, authorship, description, title, etc.), all the better. Create your text files with the appropriate tags and metadata pulled from your database and get that indexed too, and when displaying search results you can parse it back out of the text file or straight from the database if you want. Verity's engine is very nice for this.
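    A minimal Python sketch of steps 1 and 3 above, assuming a SQLite database with an articles table (id, title, body); the table, column names, and file paths are illustrative, and the URL pattern follows the /articles/index.jsp?article=134 example:

    import os
    import re
    import sqlite3

    DUMP_DIR = "searchdump"  # need not be web-accessible

    # Step 1: dump each article to a text file whose name encodes the query term.
    def dump_articles(db_path="site.db"):
        os.makedirs(DUMP_DIR, exist_ok=True)
        conn = sqlite3.connect(db_path)
        for article_id, title, body in conn.execute(
                "SELECT id, title, body FROM articles"):
            path = os.path.join(DUMP_DIR, "article-%d.txt" % article_id)
            with open(path, "w") as f:
                f.write("%s\n\n%s\n" % (title, body))
        conn.close()

    # Step 3: map a search hit on a text file back to the real dynamic URL.
    def result_url(hit_filename):
        m = re.match(r"article-(\d+)\.txt$", os.path.basename(hit_filename))
        return "/articles/index.jsp?article=%s" % m.group(1) if m else None

    The indexer (htDig or whatever you use) is then pointed at DUMP_DIR, and result_url() is applied to each hit before the results page is rendered.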
  • by benshutman ( 202482 ) on Sunday January 07, 2001 @09:46AM (#524764) Homepage
    I have this problem as well, so I did the halfway solution: I wrote a simple script that iterates through the dynamic pages and writes them out as static HTML files (a sketch of such a script appears after this comment). These files are submitted to the search engines.

    The problem, of course, is that the pages end up getting old. No problem: add a little "this is an archived version of this page, please click here [silicongod.com] for the newest version" message, and rerun the script when necessary.

    I did this and was able to submit all my dynamic pages to AltaVista. I also added a little "prev | next" link at the bottom, so a spider could start at one page and follow links to the end. I went further and created a hallway page [silicongod.com] to submit to AltaVista.

    Also, the pages are flat, so they tend to load faster than dynamic ones.

    Check out the page I submitted to AV [silicongod.com], an old archived page [silicongod.com] (contains the prev | next links at the bottom), or the live homepage [silicongod.com].


    NEWS: cloning, genome, privacy, surveillance, and more! [silicongod.com]
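    A rough Python sketch of that snapshot script, with the archived-page banner, the "prev | next" links, and the hallway page; the base URL and ID range are placeholders, not the commenter's actual setup:

    import urllib.request

    BASE = "http://www.example.com/news.php?id=%d"  # placeholder URL
    FIRST, LAST = 1, 50                             # whatever ID range you have

    BANNER = ('<p>This is an archived version of this page, please '
              '<a href="%s">click here</a> for the newest version.</p>\n')

    def snapshot():
        links = []
        for i in range(FIRST, LAST + 1):
            live = BASE % i
            html = urllib.request.urlopen(live).read().decode("utf-8", "replace")
            # "prev | next" so a spider can walk the whole set from one entry point
            parts = []
            if i > FIRST:
                parts.append('<a href="page%d.html">prev</a>' % (i - 1))
            if i < LAST:
                parts.append('<a href="page%d.html">next</a>' % (i + 1))
            nav = "<p>%s</p>\n" % " | ".join(parts)
            with open("page%d.html" % i, "w") as f:
                f.write(BANNER % live + html + nav)
            links.append('<a href="page%d.html">page %d</a>' % (i, i))
        # the "hallway" page: one flat page of links to submit to the engine
        with open("hallway.html", "w") as f:
            f.write("<br>\n".join(links) + "\n")

    if __name__ == "__main__":
        snapshot()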
  • You have too much free time to read my signature.
  • There is an old article [phpbuilder.com] on PHPBuilder.com that describes a method for creating dynamic, indexable pages. The article is written for PHP, but you should be able to use the same technique with other languages. Even if it doesn't work for all of your pages, it is still a useful technique.
  • by mbyte ( 65875 ) on Sunday January 07, 2001 @12:32PM (#524767) Homepage
    Use mod_rewrite to make your dynamic pages look like static HTML.

    An example:

    You have a script called news.php and a news index ID (e.g. news.php?id=42).
    You could map that to
    news_id42.html with

    RewriteEngine on
    RewriteRule ^news_id([0-9]+)\.html$ news.php?id=$1 [L]

    in your .htaccess.
    Voilà! Your dynamic content looks exactly like a static HTML page.

    Another trick is to fool search engines into thinking the script is a directory:

    foobar.php/param1/param2/

    Works perfectly fine ... ;) but this technique doesn't seem to work with all search engines (I don't remember which). A minimal sketch of parsing such path parameters appears after this comment.

    regards,
    Michael


    Samba Information HQ
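    A minimal sketch of the "script as directory" trick as a Python WSGI app, purely for illustration (under Apache, a request for foobar.php/param1/param2/ puts "/param1/param2/" in the PATH_INFO variable; PHP sees the same server variable):

    from wsgiref.simple_server import make_server

    def app(environ, start_response):
        # e.g. PATH_INFO == "/param1/param2/" -> params == ["param1", "param2"]
        params = [p for p in environ.get("PATH_INFO", "").split("/") if p]
        body = ("params: %r\n" % (params,)).encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [body]

    if __name__ == "__main__":
        # demo server: the whole request path arrives in PATH_INFO here
        make_server("", 8000, app).serve_forever()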
  • Depending on the size of the site, constantly indexing in the background is a good solution. If the site is big, make the server code that generates the dynamic pages support If-Modified-Since, and use a search-indexer spider like Alkaline [vestris.com] (a sketch of the If-Modified-Since handling appears after this comment).

    dB@dblock.org [dblock.org]
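    A sketch of the If-Modified-Since handling as a Python WSGI app; last_modified() and render_page() are hypothetical hooks into your own database and page generation:

    from email.utils import parsedate_to_datetime
    from wsgiref.handlers import format_date_time

    def app(environ, start_response):
        path = environ.get("PATH_INFO", "/")
        last_mod = last_modified(path)  # hypothetical: tz-aware datetime from your DB
        since = environ.get("HTTP_IF_MODIFIED_SINCE")
        if since:
            try:
                if last_mod <= parsedate_to_datetime(since):
                    # Nothing changed: the spider can skip re-indexing this page.
                    start_response("304 Not Modified", [])
                    return []
            except (TypeError, ValueError):
                pass  # unparsable header; fall through and send the full page
        headers = [("Content-Type", "text/html"),
                   ("Last-Modified", format_date_time(last_mod.timestamp()))]
        start_response("200 OK", headers)
        return [render_page(path).encode("utf-8")]  # hypothetical renderer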
