The Internet

Is There Demand For A Better Usenet Search Engine? 106

Anonymous Employee writes: "I was asked for a feasibility analysis to provide high-quality searching in a large Usenet archive (all except binary/porn groups, and several years' worth of archives). This is similar to what Dejanews wanted to provide before they re-branded to Deja last year. Do you think there is a need for this, or is high-quality Web searching + Usenet browsing meeting your everyday needs in terms of information retrieval? If not, do the existing Usenet search interfaces suffice (Deja: one year's worth of archives, not-so-good search interface; Remarq: three months' worth of archives, okay search interface)? ...and also, is real-time indexing (i.e., you can search for an article 'very soon' after it has been posted) important?" In light of Deja's recent faux pas, I think this question is rather timely, and I have to admit, I wouldn't mind the ability to search Usenet posts older than one year.
This discussion has been archived. No new comments can be posted.

  • cut back to yesterday's talk : the internet is a public place. And records should be kept public, and accessible.
  • by jon_c ( 100593 ) on Sunday July 23, 2000 @07:15AM (#911976) Homepage
    I ALWAYS use dogpile [dogpile.com] for my usenet searches. It searches deja current, deja-old, and altavista's usenet archives.

    it's pretty much the whole freakin thing :)

    -Jon

  • Usenet can be a valuable resource but currently there isn't a good interface for combing through the tons of historical posts. I usually go to Deja and do a power search when I'm researching a product because I believe the people are the best judges rather than some lazy, paid reviewer. I hope more search engines begin to tap the usenet.
  • Even with the advent and the humongous popularity of web-based message boards, I think in the end usenet still remains the best message-board system available. The old (relatively) methods are always the best. FTP is still the staple of file transfer, and IRC remains a chat king (does AIM have more users?). Usenet has had a following from the beginning and is still thought of as a place where the intelligent and learned go to converse (at least from my perspective, and don't get me wrong, it has its share of trolls, just like anywhere else). I know my oldest brother to be one of the most knowledgeable civilians in the country when it comes to military aviation, and where does he go to chat it up? Usenet. There's a wealth of useful information there that not many people know how to access.

  • Napster is not "lawbreaking" and/or an "illegal activity." The courts are deciding what should happen day by day, but as of now it's not illegal. "Those supporting Napster and MP3.com are just peddling in stolen material." I am an artist, I don't have any illegal MP3s, and I support both Napster and MP3.com. I have turned over my music to companies such as these in the hope that people will enjoy my music and to prove that these services can be used legally. I support them both; do I peddle stolen materials? moron
  • by Chairboy ( 88841 ) on Sunday July 23, 2000 @07:24AM (#911980) Homepage
    The Dejanews usenet page has been my home page for years now. Whenever I needed to find something out, it was far easier to see if someone else had asked the same question I had in Usenet than it was to wade through Microsoft's MSDN site or page after page of crappy vendor HTML.

    In the last few months, the quality of the results that I'm turning up has decreased markedly. Deja has decided to shelve all their 1995-1999 Usenet archives and concentrate on just the newer stuff, apparently because that older traffic only accounts for 10% or so of their bandwidth.

    WHAT? Of course it does! There are enough people using Deja as their Usenet client for this to be obvious. The 10% or so of their traffic that was a result of the 1995-1999 archives was the result of hundreds of thousands of other people like me searching and finding answers.

    Deja has made a mistake in alienating the audience that made them one of the most visited sites on the web. For this, I predict that Deja will either fold or massively re-organize within the next year.

    They screwed us over and broke a trust. You can't regain THAT in an IPO.
  • While I agree with you that Deja has been getting much worse recently, the decision to remove the old archives seems logical. For argument's sake, let's assume that the 1995-1999 archive used 70% of their disk storage. If only 10% of the searches were using that large amount of data, shelving it is an easy trade-off. Also, Deja is moving away from usenet searching in favor of the more profitable (I assume) product review services. If Deja wants to stop doing usenet, then they should just stop doing it outright and let someone else step up to the plate.

  • The signal/noise ratio of usenet, and particularly of the more Web-oriented forums, is dropping fast. I, personally, would be interested in the moderated technical groups being archived & searchable, e.g. clcm. I'm not sure how useful the others would be. It'd be an extraordinary pain getting some decent info out of all the noise.
  • by Masem ( 1171 ) on Sunday July 23, 2000 @07:44AM (#911983)
    I asked this about 2 weeks ago when it was apparent that the older stuff was shelved. Here's my Deja response (on Jul 5):

    ----

    Greetings, Recently we moved the Deja.com servers to a new facility in order to provide greater reliability and performance. The move is now complete and we thank you for your patience.

    Please note that currently our Usenet Discussion Service only retrieves messages from the past year (back through June 1999). As announced, we are reconfiguring the service that provides messages posted more than 1 year ago in order to provide greater reliability and performance. This will take some time though, possibly a few months. Have no fear: We're committed to bringing these messages back online as soon as possible.

    -----

    So I would wait for a few more weeks, and see if the situation improves.

  • Whatever you do, please post your policy. Maybe I just came out from under a rock, but I didn't know that Deja only keeps a year worth. That "recent|past|all" menu is misleading.

    As regards 70% disk / 10% bandwidth, they do have to draw the line somewhere, but you can't go with just what makes the most money. I'd be curious to see if the cost of bandwidth and storage outweighs revenue from advertising, and other intangible revenue such as name recognition.

  • There is a great deal of demand for this. The only one that really works (at least among the known ones) is Deja, and its Usenet search capability is rapidly becoming a third-rate add-on to their commercial setups. Compare this to how many Internet search engines we have (and how many we had before Google - and it's still the top one I use). We really need the Usenet Google - there's a lot of useful information among that noise, and we need a tool to extract it.
    Also, fast indexing would be a real bonus - so that if you look for a comment on recently released software, for example, you won't get two-month-old data. But at the very least, decent search and archival are necessary. Deja desperately needs a strong competitor.
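
The "fast indexing" wish in the comment above can be illustrated with a toy inverted index that makes an article searchable the moment it is added. This is only a minimal in-memory sketch: the class name `UsenetIndex`, its methods, and the sample Message-IDs are all hypothetical, and it says nothing about how Deja or any real archive indexes articles.

```python
import re
from collections import defaultdict


class UsenetIndex:
    """Toy in-memory inverted index: term -> set of Message-IDs (hypothetical design)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> Message-IDs containing it
        self.articles = {}                # Message-ID -> article text

    def add(self, message_id, text):
        # Index the article immediately, so it is searchable right after posting.
        self.articles[message_id] = text
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            self.postings[term].add(message_id)

    def search(self, query):
        # AND semantics: return the articles that contain every query term.
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = UsenetIndex()
index.add("<1@example.invalid>", "New kernel release breaks sound driver")
index.add("<2@example.invalid>", "Sound card review")
print(index.search("sound kernel"))
```

A real service would need persistent storage, ranking, and date filters; the point here is only that nothing forces a delay between posting and searchability.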
  • If you could get a high quality search engine with archives going back for many years (at least 1991 would be nice), I'd pay for a subscription to a service like that. But a free front end with ads would be acceptable.

    I have several clients who have almost completely abandoned deja because the quality has disappeared. They've asked me how hard (i.e. how much $$$) it would be to set up a similar service for them internally. I give them the cost estimates for a full time usenet+searchengine system admin and a pair of good machines. Then they ask if there is a company out there who would do the same thing for less money than the US$100k/year it would cost to do it themselves.

    It would be especially nice to see corporate accounts set up as well, so any employee in a company could do high quality searches.

    My own opinions [slashdot.org] on deja are pretty vituperous right now. If you could buy a copy of their old archives and provide a better service than those losers, you'd have a fairly large audience. Try doing what dejanews did when they started, going around to usenet admins and asking for copies of backup tapes. Be prepared to get old DC-150 carts and 9 track reel to reel and many other esoteric formats. Could be a fun project :-)

    the AC
  • I would welcome a usenet search engine that allowed me to search back several years, esp. if it was easy to navigate. Using linux every day at home and at work, I often run across situations where I need to know if something might or might not work. Dejanews was a great way to read about real-life experiences with things.

    The old Dejanews site was a great way to find information, and was displayed in an easy to navigate format. The new layout is awful. (Not to mention their stupid policy of advertising placement -- I post my messages to Usenet, NOT to Deja.)

    I don't know what the overall reaction to a site like this would be, but definitely count me in. jh

  • Usenet was one of the first internet services I used regularly. In concept it is still one of the most useful services. However it has, in my experience, one fatal weakness: it is extremely vulnerable to spam. When usenet began, the internet was a non-corporate INFORMATION source. Usenet was excellent because it allowed communication between groups with similar interests from around the world, and people who didn't share that interest would have no reason to interfere. (Yes, there were the occasional flames in alt.religion and such, but that was comparatively small-scale.) So it was designed so that anybody could post to any group, post anything, and even post anonymously. This was excellent for free speech, open discussion and the spread of information.

    Now, sadly, the internet is a corporate money-making buzzword. Companies try to reach any audience they can with advertising, and the internet is a cheap way of reaching millions of potential customers. In the need and greed of modern society, people wanting to make a quick buck outnumber the people wanting a nice place for discussion, and it is easy for the money-grubbing people to write one message to hundreds or thousands of groups. Thus many, many groups carry significantly more spam than relevant posts. This drives off the people who would otherwise have been frequent posters, makes good posts hard to find and generally makes the experience too unenjoyable for a large part of the public to keep any interest in it. True, fringe groups will continue for a decade or more, but sooner or later the NNTP protocol will become too much of a bandwidth hog (thousands of spam messages can do that) and most servers will close down.

    Alas, poor usenet, I knew it well

  • I disagree. Maybe not for technical groups, but for some of the more conversation groups, the S/N is pretty darn good. rec.arts.sf.written, alt.fan.cecil-adams are my favorites for general browsing.

    Usenet's strength is providing a single UI (your choice of newsreader) to thousands of discussion topics. Plus, given its academic roots, I guess that it has a higher level of discourse than some similar multi-topic setups. (Though I know some people who cut their teeth on AOL with its user moderation find Usenet a little more noisy than their liking. Personally I think that a greater "S" could make up for the greater "N" but I'm not going to swap communities or start paying AOL to find out.)

  • If you provided comprehensive Usenet posting indexes in V-Twin, er, Apple Information Access Toolkit, [apple.com] format -- you would have the entire Mac Evangelism Strike Force bowing to you. I would pay a significant amount to have those indexes mailed to me quarterly. So would many, many, other Mac users I don't doubt.

    What
  • It's a very small niche. I suspect only a small fraction of net users know what USENET is. As it stands, it will be a small island inhabited by old-time hackers and net users.

    To much of the public, the net is becoming something like TV. Not many people on the street know what a newsgroup is (if you don't believe this, you don't know many people on the street - try asking random people, you'll be surprised.)

    Given this scenario, it's not likely that a usenet search engine will last for very long. People who want to use USENET will actually use it. Now that's a surprise.

    w/m
  • by drix ( 4602 ) on Sunday July 23, 2000 @08:01AM (#911992) Homepage
    I would concentrate on the comp.* and other technical newsgroups rather than trying to mirror the whole damn thing. I would hazard a guess that a lot of that 10% of traffic that Deja said made up their backpost searching was looking for technical support, hardware information, or software help. Having a 15 year backlog of rec.humor.jokes or alt.fan.brittany-spears (or any other pop-culture NG, of which there are thousands) might be cute, but it's really rather worthless.

    Just as food for thought, there are also some privacy issues here. You have to ask yourself: do you really want a decade or two of your scribblings to be instantly available and indexable and searchable by anyone on the planet? Think about it - every immature flame, every embarrassing post, every moment you'd love to live down, now showcased and painfully easy to find by someone with a couple of minutes and a computer... I'm kind of glad that Slashdot "forgets" or de-indexes my comments after a few weeks. There are a lot that I'd just love to bury and in effect have as soon as they exit my user info page and leave the search index. Now imagine them staying with you for years, even decades.

    And it can get worse than simple embarrassment. I know for a definite fact of one case where two guys were engaged in a long-standing flamefest in a NG. Guy 1 went on dejanews.com to look at what else the other guy (Guy 2) was posting... and found some two-year-old backposts to a cancer support group because Guy 2 was battling some form of cancer. Guy 1 brought that up in his next flame and really just humiliated Guy 2 in front of hundreds of people. Until deja killed their backlog, you could still find both those posts, and hundreds more just like them. Imagine trying to live that down.

    It's incidents like that that really cause me to agree with privacy advocates about the danger the Internet poses. Never in human history has it been as easy to delve into a person's past as it is now even without a superorganized listing of their thoughts and opinions of everything they felt compelled to write about for years into the past. Such a complete archive really would pose a lot of problems for many people (imagine just a 10 year log of alt.support.cancer ... !) Usenet has since its inception been a celebration of free expression. Stifling that because people have to worry about repercussions far in the future would be kind of shitty.

    With that in mind, like I say stick to the tech newsgroups, and you'll run into far fewer problems. :)

    --
  • I cannot fathom ever needing to search Usenet for anything. I can find everything I need with an AltaVista search on the web.
  • by ruin ( 141833 )
    "Is there a demand for a better Usenet s--"

    yes.
    --

  • I'm glad this subject has come up, because I recently posted a similar question as an "Ask Slashdot" but guess it was rejected.

    The Dejanews Usenet archive was one of the best resources on the Internet. I'd always check there before doing searches on company sites. The recent decrease in the archive database has reduced the usefulness of the service dramatically.

    I suppose the important question should be: Is the old Usenet archive worth preserving, for general use or even as a historical record? (It might appear that most of it is useless waffle, but who knows what people will think in 100 years?)

    If the answer to the above is "yes", then how can the archive be saved? Leaving it in the hands of a single company (Deja) means it's vulnerable to any silly decisions that the company makes.

    Perhaps a better solution would be a huge distributed database, where sites archive particular groups for a particular time (eg. some of the big Linux companies could "sponsor" the comp.os.linux.* newsgroups for the dates between 1995 and 1998). These could then be mirrored by other sites with the same interests.

    The two negative points I can think of (aside from the nightmarish administrative aspects) are 1) what would the sponsor get out of this, and 2) just how big would the archive be?

    It is possible that a Gnutella-like system could evolve where people could search for archives, with a set of "root servers" providing searching facilities. With 100GB disks becoming available, archiving the smaller newsgroups becomes feasible.

    All we need then is to persuade Deja to reimplement the full database (which I believe they eventually intend to do), and then get a tool to archive interesting articles. Anyone out there think they have the skills to write a "deja extractor"?
  • a market research tool? The slashdot users ideas are being exploited for the commercial benefit of others. And we don't even get the free food/coffee that is at most focus groups.

  • Oh -- I was wondering why Deja was suddenly finding so few relevant postings.
  • The 10% number is a bit misleading.

    It doesn't mean that only 10% of their audience does old searches and 90% does recent searches. It means that 90% of their audience occasionally needs to do older searches.

    I would say that 10%-20% of my searches are for older information. Especially for hints that someone else has already solved a problem I am up against. Mostly I search comp.dcom.* and related alt groups. Quite often I am working on systems installed 10 or 20 years ago, and many people have documented what they have done in similar situations. A solution posted 5 years ago is still relevant to a problem today.

    So when they cut out 10% of their traffic, why did their total hits decrease by 60%? Because they pissed off 60% of their user base enough to abandon the system. That's a pretty bad move for any company. It's why they have had to lay off 10%-20% of their workforce, and abandon their IPO.

    With any luck, a new company will buy a copy of deja's archives in the bankruptcy sale and do something useful with it.

    the AC
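
The argument in the comment above, that 10% of traffic is not the same as 10% of users, can be made concrete with a small calculation. This is a rough sketch under assumptions that are not in the thread: every user is taken to search the old archive at the same rate, each search is treated as independent, and the figure of 20 searches per user is made up for illustration.

```python
def share_of_users_affected(old_search_rate, searches_per_user):
    """Probability that a user makes at least one old-archive search.

    Assumes each search independently targets the old archive with
    probability `old_search_rate` (a simplification, not measured data).
    """
    return 1 - (1 - old_search_rate) ** searches_per_user


# Hypothetical: 10% of searches hit the old archive, 20 searches per user.
print(f"{share_of_users_affected(0.10, 20):.0%} of users would need the old archive at least once")
```

Under these assumptions a service that sees only 10% of its queries hitting old posts could still have most of its users depending on them, which is the commenter's point.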
  • Seems like News archiving could benefit from the forgetless nature of Gnutella and similar technologies. The part I don't like about those designs is the HTTP-based transactions. It seems that since Usenet traffic is already encapsulated in messages and relies on the mailbox synchronization services of NNTP, we could just create a massive message file system (take a gander at what MS has in line for Office and Exchange). As more people get on line with permanent connections we could easily offer a small part of our disk space for a shared mailbox file system accessible via IMAP. Information would simply drift to where it is needed.

    The biggest risk with such an automatic scheme is that some data would eventually timeout because no one requested it anymore. I guess these messages would start to be treated like endangered species. Maybe we could just send them out into a deep-space, time-delay file system to save them.
  • Well. To a certain point, I can see what would motivate Deja to do this. If the revenue from it isn't coming close to the cost of upkeep, I can see where they'd rightfully shelve it.

    However, the Usenet junkie in me is kicking and screaming over this. My Usenet service from my ISP drops posts after a couple of weeks. And even the group-specific server that my group's moderation pool uses drops stuff after a couple of months because of the sheer quantity (and because its only purpose is support of the moderation bot).

    Personally I'd pay money for and/or put up with a reasonable amount of banner ads to be able to search back through all the content.


    Chas - The one, the only.
    THANK GOD!!!
  • A few months ago, Deja made an announcement [deja.com] about the site move. According to the announcement, which has not been updated since its original release, the old messages would temporarily be taken down, but we should "have no fear: [Deja is] committed to bringing these messages back online as soon as possible."

    In the meantime, Deja has been transformed into a mere free Web-based Usenet server that happens to have unusually long retention, but no binaries access.

    It has been a couple of months since then. Last month, Deja announced that the move was "complete"; however, most of the old posts are still nowhere to be found. There was an interesting Usenet discussion [deja.com] on the state of things, which included at least one thoughtful post [deja.com] as well as possibly a little light at the end of the tunnel [deja.com].

    Perhaps not all is lost. When (if) the Deja archive ever comes back in its entirety, it will still be the best Usenet archive around, hands down.

    I disagree with the Slashdot article's claim that Deja has a merely "okay search interface". As long as one uses the Deja Classic Power Search [deja.com], Deja has one of the cleanest interfaces around, with extremely flexible and powerful query options.

    One would be hard-pressed to come up with something better at this point. Even if one were able to cook up a better interface with even better query features, where would the content come from? Who has been archiving Usenet all these years other than Deja, Remarq, and perhaps a few other little-known entities? [remarq.com]

    I daresay that none of the current archive holders would be willing to grant archive access without considerable compensation. Unfortunately, one would have no choice; it's a little too late to start archiving the old stuff now!

    All in all, I would probably be in favor of just trying to get Deja back up in its full glory; this would be so much easier than starting from scratch. Perhaps all Deja needs is to hear (from thousands of concerned Slashdot readers) that their "old" archive is their most valuable resource, and should thus be given the attention that it deserves. I personally consider the "old" archive so valuable that I would be willing to pay a subscription fee to access it; I'm sure I'm not alone in this.

    So shall we all write Deja now, and let them know what we think?
    --

  • Everyone wants people to view their site.
    Hits pay the bills, so people are moving to message forums.

    What about Ultimate Bulletin Board exporting messages to news servers?
    Then it's the best of both worlds.

    Only newsgroups I use anymore are hardware vendors.
    I'm on a dozen webbased message forums, all those banner ads.

    -Brook Harty
    [I have the Enemy Flag heading back to our base. Clear the mines from our flag...]

  • by weave ( 48069 ) on Sunday July 23, 2000 @08:48AM (#912003) Journal
    If deja just got rid of older alt archives then they'd have cleared up a ton of space.

    My news server has just the big 8 and only the alt groups that users request. With a 15 gig news spool, I only have to expire articles after two months.

    It doesn't take a math whiz to extrapolate from that how much disk space a year's worth of REAL usenet newsgroups would take.

    They should have never trashed 1995-99 without notice. 95 was when the net started to explode and removing that removed history that can never be recreated.

    (Then again, I'm glad some of my old posts finally went away. x-no-archive works, but since everyone these days just quotes entire articles when replying with one line at the top, x-no-archive was a bit useless anyway...)
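
The extrapolation this comment alludes to (a 15 gig spool holding two months of the big 8 plus requested alt groups) is simple enough to write down. A back-of-the-envelope sketch, assuming posting volume stays constant over time, which it did not: traffic grew from year to year, so this only gives an order of magnitude. The function name is hypothetical.

```python
def spool_size_gb(spool_gb, retention_months, target_months):
    """Scale a known spool size linearly to a longer retention period."""
    return spool_gb / retention_months * target_months


one_year = spool_size_gb(15, 2, 12)     # one year at the commenter's rate
five_years = spool_size_gb(15, 2, 60)   # e.g. a 1995-1999 style backlog
print(f"1 year: {one_year:.0f} GB, 5 years: {five_years:.0f} GB")
```

At that rate a year of text-only groups lands around 90 GB, and a five-year backlog in the hundreds of gigabytes.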

  • First let me say, Usenet is far from dead. I regularly use it to find answers to technical questions and to find out about products (how a new graphics card works under linux, etc.).

    I have been using Deja [deja.com] for a while, and in the past year or so the quality has gone sharply downhill.

    So now I am looking for alternatives. A search [google.com] on google [google.com] doesn't reveal much.

    Can people name any similar services that exist? A poster mentioned dogpile [dogpile.com]. Any others?

    I would really like to see GOOGLE getting into this.

    /LinuxLover

  • Deja is maybe the ONLY Internet service I would not hesitate to pay for. It has been my preferred service since 1995. I can't imagine living without it. Alas, Deja's management has no clue; the service has been declining steadily, and every new version is worse than the last. I can't imagine how things could be done so badly. For instance, I used to LOVE the old interest finder, until someone imagined that it would be nicer to have the results threaded, and now you lose all the necessary information. They have also managed to clutter the nice, clean old look with graphic crap. Their technical support has also been less and less responsive; it seems to me that they live in a kind of black box, so I have stopped sending them feedback. They also make weird changes to their engine without even taking the time to inform you about them.

    What a pity for maybe the most useful tool on the net (along with Google and Slashdot :)
  • Hmm. One of the biggest problems with USENET is that the gems are really rare. There simply is far too much crap on USENET making the signal/noise ratio really low.

    Deja probably can't keep everything that is on USENET (anyone have any idea of the total number of newsgroups there are?), and for them to make any money, searching an ever larger archive becomes infeasible.

    I used to love USENET, but now it's nothing more than a tool for would-be marketers to send unsolicited e-mail and to shamelessly promote crap. It basically has lost a lot of its appeal lately.

    One good way to get around the search problem is to distribute the content of USENET to separate archives based on hierarchy, so that comp.os.linux gets archived by VALinux (or similar), and all you need is a client tool that searches the indexes of these separate archives.

    Unfortunately, it is somewhat late to actually implement this. If only this was a requirement far earlier in the process of creating newsgroups...
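
The per-hierarchy archive idea in this comment needs a client that knows which archive to ask for a given group. One way that lookup might work is a longest-prefix match on the group name. Everything below is an illustrative assumption: the `ARCHIVES` table, the `example.org` host names, and the function name are invented, and no such registry exists.

```python
# Hypothetical registry: hierarchy prefix -> archive host that sponsors it.
ARCHIVES = {
    "comp.os.linux": "linux-archive.example.org",
    "comp": "comp-archive.example.org",
    "sci": "sci-archive.example.org",
}


def archive_for(group, archives=ARCHIVES):
    """Return the host for the longest matching hierarchy prefix, or None."""
    parts = group.split(".")
    # Try the most specific prefix first, then progressively shorter ones.
    for n in range(len(parts), 0, -1):
        prefix = ".".join(parts[:n])
        if prefix in archives:
            return archives[prefix]
    return None


print(archive_for("comp.os.linux.networking"))
print(archive_for("rec.humor"))
```

Matching on whole dotted components rather than raw string prefixes keeps a group like `computers.misc` from being routed to the `comp` archive by accident.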

  • by Anonymous Coward
    The really good stuff was from the start (1990ish?) to when the net exploded (1995ish).
    During that period, in the sci. and comp. groups, there was a lot of good information as it was mainly university people exchanging info. I remember back then reading about people archiving the net to glass disks and such. Where did all the old net go?
  • Is it me or is Dogpile's usenet search limited? I can't seem to search by date, language, sort by, etc. like Deja.com's Power Search [deja.com]. :(

  • by Anonymous Coward
    I think that NNTP needs to be extended to include automatic authentication. USENET's signal-to-noise ratio is insanely low in some forums. It's sad because there is some good information out there and it used to be great, 10 - 15 years ago.

    It's too easy to spam, even semi-private NNTP servers, like Netscape/Mozilla server are hit with spam enough to piss you off. All the serious development takes place on mailing lists and that can be a lot of email. (and it's not spam proof either, the kernel list is spammed at least once a week)

    We need a public key infrastructure added to NNTP such that to post you need to use a key that has been submitted to a server, at least for technical forums. Call it automatic moderation: once you're trusted you can do whatever, and spam causes your trust to be yanked. This has the added benefit of building up the key web.

    Services like Deja provide a useful service, once you weed the crap out. Most of what they archive is junk. And there is so much of it that they can't keep it all online. There have been some critical usenet threads that need to be archived for easy access. There are still important threads and messages posted.
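
The "automatic moderation" proposal in this comment boils down to a registry of trusted posting keys that a server consults before accepting an article. The sketch below covers only that bookkeeping, and it is an assumption-laden illustration: the class and method names are hypothetical, NNTP has no such extension in the thread's description, and actual signature verification (which would need a public-key library) is deliberately left out.

```python
import hashlib


class PostingTrust:
    """Hypothetical registry of keys that are allowed to post."""

    def __init__(self):
        self.trusted = set()  # fingerprints of currently trusted keys

    @staticmethod
    def fingerprint(public_key: bytes) -> str:
        # Identify a key by the SHA-256 digest of its raw bytes.
        return hashlib.sha256(public_key).hexdigest()

    def register(self, public_key: bytes) -> str:
        # Submitting a key to the server grants posting rights.
        fp = self.fingerprint(public_key)
        self.trusted.add(fp)
        return fp

    def revoke(self, public_key: bytes) -> None:
        # Spam causes trust to be yanked.
        self.trusted.discard(self.fingerprint(public_key))

    def may_post(self, public_key: bytes) -> bool:
        # NOTE: a real server would first verify the article's signature
        # against this key; that step is omitted in this sketch.
        return self.fingerprint(public_key) in self.trusted


trust = PostingTrust()
trust.register(b"alice-public-key")
print(trust.may_post(b"alice-public-key"))
```

The hard parts the comment glosses over, such as who runs the registry and how revocations propagate between servers, are not addressed here.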

  • Just as food for thought, there are also some privacy issues here. You have to ask yourself: do you really want a decade or two of your scribblings to be instantly available
    and indexable and searchable by anyone on the planet?


    Check. Go search on dvdeug or dstarner98 on Google, and you will find all my scribblings instantly available. Including some stuff I'd just as well forget. It's life - think before you post. Use x-no-archive wisely and there won't be much of a problem.

    It's nothing new - lots of people are embarrassed by things they said decades ago. If you plan to go into politics, then start censoring yourself now. Otherwise, you should still think before you speak, but it's probably not going to matter.
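
For readers unfamiliar with x-no-archive: it is a header convention (`X-No-Archive: yes`) by which a poster asks archives not to keep an article. A minimal sketch of how an archiver might honor it, using Python's standard `email` parser; the function name and the sample article are hypothetical, and whether a given archive respects the header is up to that archive.

```python
from email import message_from_string


def archivable(raw_article: str) -> bool:
    """Return False when the poster asked not to be archived."""
    msg = message_from_string(raw_article)
    # Header lookup is case-insensitive; a missing header means "no objection".
    value = msg.get("X-No-Archive", "")
    return value.strip().lower() != "yes"


article = (
    "From: poster@example.invalid\n"
    "Newsgroups: comp.misc\n"
    "X-No-Archive: yes\n"
    "\n"
    "Please do not keep this around.\n"
)
print(archivable(article))
```

A header check like this only covers the original article; text quoted in someone else's reply carries no such header and gets archived anyway.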

  • What about that group that says they mirrored the "internet" going back so many years. I wonder how much of dejanews is stored in their archives, or any other usenet->web gateway that existed however briefly.
  • I find USENET to still be one of the most useful internet resources. It has never been easy to search, even with deja; I was delighted to discover the link to dogpile in a post below, I'll definitely be making use of it.

    I have found that at my current place of employment, where there is no news feed, it is considerably harder to get work done without that resource. I used to ssh into my cable modem box to browse and post from there, but now I've moved to an apartment without cable. So I'm negotiating with our sysadmins to try to get a newserver set up.

    The dogpile interface is definitely better than the old deja. In particular, each "hit" gives you the entire thread, instead of spreading a thread out among individual article links.

    I think that many of the complaints about usenet being swamped with spam and useless are from people who are not familiar with better news readers. You can filter a lot of that stuff.

    One thing I would definitely like is a usenet interface to slashdot. If it was read-only and you had to log in and go through the web interface that would be fine.

  • This is a very important point! Even deja never had the REALLY old posts - those are the ones that I'd really like to see.. back to the very beginnings of usenet. But those are lost, unless anyone has any posts they've saved up in some archive somewhere. In any case, how could any other company get started? Would deja be willing to sell its old archive?

    What I think would be ideal right now is if deja split into two - they could have a dedicated usenet searching facility and archive, and another company for all the crap they've been doing now. They could work together very nicely, but DON'T let their web hosting or whatever get in the way of the most important thing - Usenet archives.

    And why oh why oh why did they change their name to deja!?!?!? Dejanews was a cute pun... I liked it! Perhaps if they DO split up a bit, the news division can inherit the name. Then we could have dejamail, dejaweb, dejashop, and dejanews.

    Does anyone even USE the non-usenet services they provide?
  • I agree, and I've thought about that some. But I think the only way you could guarantee a spam-free USENET is either strict moderation of everything, or a registration system where you would not be allowed to post until you registered, giving address, phone, etc., and having it verified. It would have to be controlled by an independent organization that would have a contract with ISPs -- they would not get posting access until they guaranteed that all their users registered with it.

    I dunno -- it would be a mess, but would sure solve some problems. Of course, kiss privacy and anonymity goodbye...
  • by MikeFM ( 12491 ) on Sunday July 23, 2000 @10:10AM (#912015) Homepage Journal
    One of my current projects is a search engine that combs both the web and usenet based on similarity data. A portion of this data is computed using analysis of files, locations, etc and the rest is done by a sort of moderation system similar to Slashdot that lets users group and rate files. The system can search both text and binary files. So if you found a pic you liked you could use it as your sample and the search engine would return all of the others that matched the search you specified. You might get back pics that matched the same signature as the sample, pics w/ a similar name, or pics that had been group moderated into the same class as the sample. Right now I'm doing a lot of research on file signatures, ways of telling how similar one pic (or mp3, or anything) is to another file of the same type (pic, sound, text).
  • I want a USENET engine set up as a humongous old USENET server--or perhaps cluster of servers. Call it 1991.newssearchengine.example.org, 1992.newssearchengine.example.org, etc.--or call the groups on a single server rec.arts.anime.misc.1991, rec.arts.anime.misc.1992, etc. (and do a global S&R on headers to replace newsgroup names in messages with the groupname.year thingie so that crossposts would still register properly).

    You can connect to this server using any USENET newsreader program, read any article you like that way (but not post, of course). That would let people use their favored newsreading environment--which already has functions for threading and searching individual messages--to go in and read whatever they want without having to screw around with USENET search engines' moronic interfaces (of which I have never yet found one that worked decently).

    Granted, this would make it harder to do global searches across multiple years...but I'd gladly sacrifice that in exchange for an interface more useful to me in searches of smaller scope.
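
    The header search-and-replace described above could look roughly like the sketch below. It is only an illustration: the function names are made up, it only touches the Newsgroups header, and it ignores folded (multi-line) headers, which a real archive importer would have to handle.

```python
def add_year_to_newsgroups(header_value: str, year: int) -> str:
    """Turn 'rec.arts.anime.misc, alt.test' into
    'rec.arts.anime.misc.1991,alt.test.1991' (hypothetical naming scheme)."""
    groups = [g.strip() for g in header_value.split(",") if g.strip()]
    return ",".join(f"{g}.{year}" for g in groups)


def rewrite_article(raw_article: str, year: int) -> str:
    """Rewrite the Newsgroups header of one article; the body is left alone."""
    # Headers end at the first blank line; everything after it is body text.
    head, sep, body = raw_article.partition("\n\n")
    new_lines = []
    for line in head.split("\n"):
        name, colon, value = line.partition(":")
        if colon and name.lower() == "newsgroups":
            line = f"{name}: {add_year_to_newsgroups(value, year)}"
        new_lines.append(line)
    return "\n".join(new_lines) + sep + body
```

    Because every group named in a crosspost gets the same year suffix, an article crossposted to two groups in 1991 still shows up as a crosspost between the two .1991 groups.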
    --

  • This is what you used to be able to do before AltaVista's redesign:

    Type in a search phrase and pull up all matching web pages and usenet articles which matched.

    This was EXTREMELY valuable, especially if you were hunting down the answer to a question or problem. One search did it all, and the breadth of knowledge in the web + usenet could not be beat.

    AltaVista modified their service so that they no longer do usenet searches, about a year ago, I think it was. I couldn't believe that they would take out that excellent feature, and I wrote them and complained, but of course they didn't care (for the record, you can still do some kind of usenet search, but it's not on articles; it just returns pointers to "relevant newsgroups").

    I never use AltaVista anymore. I use Google, which doesn't do usenet either (although I suggested it to them and they said they would be considering it in the future), but I'll be damned if I use AltaVista again.
  • Assuming you're not a troll, I can explain exactly what I need to do with USENET searches.

    I am a small-time computer programmer (for my own enjoyment) and local technical expert (called upon for practically all Windows installations). I fiddle with Linux a lot - my computer has a 17.2G drive for bloated software like Windows 2000, and a 4.3G hard drive devoted entirely to Linux distros (PC PLUS in the UK is great with this [Mandrake 7.1 this month.]) I also do a bit of amateur spamhunting (you'll see me a lot on news.admin.net-abuse.email.)

    One of my most used tools is Deja.com/usenet. I have it on my slashboxes. With deja.com, I can immediately search out whether this pernicious "Find Out About Any Poor Shmuck Fast Now" spam (I receive it every week or so) has been posted to N.A.N-A.S [admin.net-....sightings] yet. Also, if I have technical problems I need to solve, I call up Deja first. I don't search AltaVista, partly because I don't like AltaVista (I prefer Excite [excite.com] or Lycos [lycos.com]) and partly because all Web search engines, especially when faced with computer support queries, don't successfully find what I want. Also, several big pr0n and other dubious sites will definitely use support queries in META tags to drive you off the route - I've had this happen to me before.

    Therefore, services like Deja help me to find out about new drivers, or how to make my Sony tape drive work - they help solve problems. This is the entire purpose of Deja - not to make up a huge Britney Spears fan collection, but for technical information. And yes, there is a demand for a better engine (for a start, one that you don't have to click three different time-consuming links to move into a thread) - but at the moment, Deja and remarQ are all we've got. And thankfully, we have them.
  • by Anonymous Coward
    Saying that they pissed off 60% of their user base enough to abandon the system is also misleading...
    (different users are not equally active)

    Saying that they pissed off a lot of users (who together accounted for 60% of the hits) is probably more accurate...

    I just couldn't stop myself from saying this.. sorry! ;)
  • Dogpile sounds nice in theory, but it has some of the same problems that plagued other engines...for example, try searching for "c++". It attempts to second-guess you, by telling you that you *really* meant to search for "c". Not good.
  • Oh yea, there's a demand all right, I'd certainly like a better search than deja but would I be willing to pay, will advertisers be willing to pay, who else is going to pay? Last year a business plan that skips the payment part might have worked, these days "show me the money".

    Instead of asking "is there a demand" ask

    is there a demand if each search costs the user $0.01

    or

    is there a demand if each results page has three banner ads (and is there a demand from advertisers for this)

    So, what's your projected cost per search and how do you intend to cover it?
  • On the other hand, there are those of us who've known from the moment we first posted to USENET that it was a public area and anything we say there can come back to haunt us in the future. Haven't the USENET FAQs always said not to post anything you wouldn't want your family, friends, future employers, worst enemies, and so forth to read?

    Remember...engage brain before posting to USENET. If you don't, it's your own fault and you can't expect the rest of the world to look out for you.
    --

  • Deja(news) used to have a pay usenet site that was essentially as you describe. Why is it not around any longer? The problem, in a nutshell, is that a lot more people said they would pay for such a service than actually plunked down any money. (I think they called it "Personal Newsreader" service.)

    But then again, a case could be made that Deja(news) didn't promote it at all, hence it was doomed to failure from the start. But what do I know, I've only worked at Deja.com for less than a year.

  • I pretty much gave up reading Usenet on a regular basis probably about 3-4 years ago due to the volume and noise. Dejanews at least provided me with a way of looking for specific info when I needed it. Granted, for that kind of use a shorter, year-length history is usually fine. However, the much longer history has proved indispensable on a number of occasions when dealing with older systems. One particular project I worked on was upgrading the version of Tcl/Tk from one of 5 years ago (when the project's software & hardware were frozen) to the current version. There were a number of major changes in the language & libraries in that time, but instead of having to find all the difficult migration issues (some of which the docs didn't mention) myself, I did a bit of searching on Dejanews and saved weeks of time and considerable frustration. One is rarely the first to have a problem -- and the interactive knowledge base of Usenet often provides the solution. I'm really disappointed by deja cutting back this resource and would love to see a more complete & better archive out there!

    Also, aside from the practical aspects, the contents of the archive are a valuable bit of .net history & group memory -- it'd be a shame to lose it...

  • Why rely on one centralized commercial archive? I guess most of the users of the usenet have their own small archive of newsgroups and posts they see as valuable.
    All we need is an open protocol for communication between newsreaders, for passing around searches and results, and enough readers to support this. Searches would be independent of the moods and financial motives of the owners of a single archive and consistency could be assured by the redundancy of hundreds of people having the same article in their archives.
  • Deja surely has dropped the ball. Instead of just going to the site and searching for Usenet articles, I have to wade through mounds of advertising before I can get to the search form.

    Not only does the Internet need a better Usenet search engine, it needs an entire Usenet frontend. More and more ISPs are either not offering news service, or simply pointing people to supernews.com. To me, that's a waste of time, as supernews is often overloaded, and the traditional Usenet interface isn't exactly as user-friendly as it once was, what with it being mostly spam and porn these days.

    I'd like to see a site that I can not only use to search, but also to post and reply to Usenet articles. I could give out a lot of help and free advice if only I didn't have to fire up a news reader. Don't get me wrong, because I love command-line interfaces, but just being able to bring up the relevant information is much more helpful than having to look through a bunch of posts about how I should buy these printers or use these domain registration services.



    Brad Johnson
    --We are the Music Makers, and we
    are the Dreamers of Dreams
  • In alt.fiction and other groups people post original works and some copyright those works. There is an implied right for that work to propagate through usenet and be used for a certain amount of time but Deja by keeping such posts for YEARS, rebranding them, slapping an ad on them and turning a profit on them is nothing short of blatant copyright violation (indeed US copyright law doesn't require a copyright notice on the work).

    I doubt you'll see ANY service carry more than a year's worth of articles and for many corporate lawyers a year may be too long. One day someone IS going to sue these services for copyright violation and the Usenet services live in fear of that day.
  • by DHartung ( 13689 ) on Sunday July 23, 2000 @11:16AM (#912028) Homepage
    ... was Jeremy Nixon's Deja power search [exit109.com], especially after the redesign/relaunch. It's basically just a reorganization of the form from Deja's own power search [deja.com] page, but I find the slightly different interface (with no unnecessary graphics and no scrolling) to be simpler and quicker to use.

    Unfortunately Jeremy doesn't have his own back archives ...
    ----
  • I wouldn't be surprised AT ALL by the lack of people who know what USENET is...I work as a programmer, and there are many poseur programmers I have worked with, in the past and present, who never even heard of USENET. It's un-freaking-believable the kinds of people who are calling themselves programmers these days. It seems all that is required is a high school diploma, if that. No experience or knowledge necessary. The Indians are going to destroy the American software industry if this shit continues.
  • Should have a full USENET archive back to when they started Alexa. They kept the library after selling Alexa to Amazon.

    Deja was useful and still is, but they seemed to decide supporting USENET was not where the money is. No surprise, USENET is largely abandoned by the software development community and money community now. The top newsreader is from Microsoft!
  • A search frontend, like this one? [exit109.com] You can also go find some more frontends [google.com], they're out there, or you can just write your own.
  • I used to find Altavista really useful for searching Usenet messages (one of the premier sources for bug information and workarounds) but now they've replaced the functional engine with a new one. When you do a search, you can only retrieve articles from the first page. If you go to the next page, if you try and retrieve any articles, it claims they're not available even though if you restructure your search phrase to get the article in question listed on the first page, you can retrieve it just fine.

    But even for web searches, I'm coming to the conclusion that Altavista is just sucky. Firstly, it's way out of date, half the links are broken and it's still indexing my homepage as having content that was changed over two months ago. But mostly I get annoyed that the search page refreshes itself every 5 minutes. Presumably this is done in an effort to fake ad impressions but it is annoying to have a search page disappear while you're trying to read it (especially since at work, I frequently have to disable the proxy for development purposes so it refreshes to an unavailable page) but it doubly sucks at home where I have a dialup connection and leaving Altavista on screen means my phone line is always busy.

    Sorry for the rant, I just had to say it. Suffice it to say that I'm looking around for a decent alternative (I'm starting to use Google from Monday I think)

    Rich


  • We definitely need a good Usenet site for searching and timely browsing/retrieval. I have used deja religiously. When my company blocked it as explicit content I freaked and sent an e-mail to everyone in the company I knew. A slew of people backed me up and those are only the people who cc'd me. They wrote long descriptions of why the site was so valuable, ordering our firewall group to restore its access. Took about a week but we got it back.

    We must start an open source project to provide this functionality. It is our duty to expose this wonderful free forum to the general internet community. The problem is the technical limitations. Such a program(probably distributed) must have these characteristics:

    1) Fast Searching - this is rather tricky. Dejanews was, or still is, capable of searching the full text of 100 million articles VERY quickly. I don't know how they did it but I would imagine this would require custom database-like functionality that could preprocess all articles, indexing each word so that all the work was done in advance. This would certainly be the most difficult part of the project.

    2) Quicker Posting/Retrieval Response Time - As it is deja does not respond too quickly to posts. It takes several hours I believe before a message is searchable on the site. I don't know about browsing particular groups but it would have to be on the order of minutes. This should be possible as MS Outlook News Reader is quite fast at this and we all know it has nothing to do with MS Software :~)

    3) Ability to populate database with mailing lists. It would be a very cool feature if one could add an arbitrary mailing list such as that of an open source project to the indexable archive.
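
    The "index each word in advance" idea from point 1 is usually called an inverted index. Here is a minimal in-memory sketch of it. This is not how Deja does it (nobody in this thread knows that); the class and method names are made up, and a real archive of 100 million articles would need on-disk storage, ranking, and a lot more care.

```python
import re
from collections import defaultdict

# Keep '+' and '#' inside tokens so a query like "c++" survives tokenizing.
TOKEN = re.compile(r"[a-z0-9+#]+")


def tokenize(text):
    """Lower-case the text and split it into searchable words."""
    return TOKEN.findall(text.lower())


class InvertedIndex:
    """Maps each word to the set of article IDs that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, article_id, text):
        # All the per-word work happens here, at indexing time.
        for word in tokenize(text):
            self.postings[word].add(article_id)

    def search(self, query):
        """Return IDs of articles containing every word in the query."""
        words = tokenize(query)
        if not words:
            return set()
        sets = [self.postings.get(w, set()) for w in words]
        return set.intersection(*sets)
```

    Since add() can be called one article at a time, a new post becomes searchable as soon as it is fed in, which is the behaviour point 2 asks for. Mailing list messages (point 3) could go through the same add() call, assuming each one has a unique ID.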

    Also I believe someone could make a viable business out of it. A Usenet search site could be very profitable. I'm surprised no one has caught on to this. Deja dropped the ball and hid the "discussions" pages behind its new facade. Odd. I can only imagine how many hits a site like that would get. If the USENET search capability was presented as the premier offering of a site it might not seem so obscure to average users. They would perhaps discover Usenet as the great alternative source of information that it is. Usenet would be the latest and greatest thing! With all the "instant messaging" going on it would not seem so foreign to people. Also I would think some people would be willing to pay for this service. I would be willing to pay a small fee. Say $20 a year?

    KidSock

  • Try hitting stop on the browser. Maybe that would kill the JavaScript that's doing it.
  • This doesn't only relate to usenet. The problem with search engines today is that they don't in any way cater to people who know *exactly* what they're searching for. I think it's about time that someone comes out with a search engine for geeks that forgoes all the fru-fru simple language stuff for oodles of terabytes of pages and a way of searching the damn thing with regular expressions. Hell, I'd pay good money (maybe $100 a year) for a service like that.
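
    For what it's worth, the regular-expression part is easy to sketch on a small scale. The function below is a hypothetical brute-force version that scans every line of every article; on terabytes of pages you would need some kind of index in front of it, so treat it purely as an illustration of the interface.

```python
import re


def regex_search(articles, pattern, flags=0):
    """Scan a dict of {article_id: text} and return (id, line number, line)
    for every line matching the regular expression."""
    rx = re.compile(pattern, flags)
    hits = []
    for article_id, text in articles.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if rx.search(line):
                hits.append((article_id, lineno, line))
    return hits
```

    A query such as regex_search(articles, r"malloc\(\d\)") then finds exactly that string, with no engine second-guessing what you meant.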
  • all expect binary/porn groups and several years worth of archives

    Of course we all expect binary/porn groups and several years worth of archives...

    What? Oh, you meant except? Damn...

  • Brewster Kahle's Internet Archive [archive.org] has an archive of Usenet from 1996-1998, but they stopped for some reason (did they think Deja was doing a better job?). And it's only about 600 GB, so the disk space should be pretty cheap.
  • Just reading through the thread and seeing that deja no longer has older archives available (whether they choose to reinstate them in future or not) leads me to the worrying question of who is the guardian and maintainer of this huge mass of cultural history? Who will make sure that in the future, people will be able to see the furore surrounding the first spam from Canter and Siegel (I was there) or the post where Linus Torvalds laid down the spec for what was to become Tux (I saw that too)?

    In a world where technology companies can sink overnight, are we to leave this to Deja and others? Is there no public repository where these Usenet articles are archived for all time? Is this publicly generated mass of information not available directly to the public?

    No, the question is not whether we need access to a better search engine for Usenet but that we should have access to Usenet itself. I should be able to order a 100(or whatever) DVD set of Usenet posts to have in my own home. This is too important to leave to the whim of the market, this needs to be preserved for humanity for all time.

    Maybe some of the people who have copyright on Usenet posts could bring some pressure to bear on the archive companies to make something like this available.

    Rich

  • It's a meta-refresh tag, not javascript. All the same, the stop button doesn't work.

    Rich

  • There is an implied right for that work to propagate through usenet and be used for a certain amount of time but Deja by keeping such posts for YEARS, rebranding them, slapping an ad on them and turning a profit on them is nothing short of blatant copyright violation

    That's what X-NO-ARCHIVE: YES is for. If you don't want your posting archived, add that to your header. That can be both good and bad, by the way. I've been noticing that more and more people automatically decline to have their Usenet postings archived, which can mean that genuinely useful advice or commentary disappears forever. I wish that this option would be used more selectively.
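
    On the archive side, honoring that header is a small check. The sketch below uses Python's standard email parser; the function name is made up, and it only looks at the header, not at any other opt-out convention an archive might choose to respect.

```python
from email import message_from_string


def may_archive(raw_article: str) -> bool:
    """Return False if the poster asked not to be archived via X-No-Archive."""
    msg = message_from_string(raw_article)
    # Header names are matched case-insensitively by the email module.
    value = msg.get("X-No-Archive", "")
    return value.strip().lower() != "yes"
```

    An archiver would call may_archive() on each incoming article and simply skip the ones that return False.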
  • The Internet Archive Project has Usenet archives from 1996-1998...it is a .5 terabyte collection, but it is currently all on tape. However, they STOPPED archiving Usenet in 1998. www.archive.org [archive.org]

    The Internet Archive Project is the project attempting to archive the entire web and related internet contents as a matter of public record. They currently have around 15 terabytes in the archive.

    Push them to resume archiving of Usenet, and to get their old stuff online from the tapes. This is HISTORY, people! Historians 50-100 years from now will be DYING to look at this stuff, and won't be able to believe that we threw it all away, even though the cost of storing it was dropping exponentially.

    I would kinda hope that my great-great-grandchildren could get to know me by reading some of my better usenet posts.

    --Braddock Gaskill
  • I started reading Netnews back in 1982 or so, before the Great Renaming ... heck, back when "the Internet" was a Larry Landweber proposal to replace ARPANET. At the time, aside from mailing lists, it was the only electronic discussion medium in existence. Like the current e-mail network and the World Wide Web, and IRC at one point, it had an interesting property: there was *one* network. Some sites got full feeds, some partial feeds, and there were a handful of local groups, but everyone was on "the 'Net" whether they were at UC Berkeley or Bell Labs or the Pentagon. If you wanted a discussion, you took it to a mailing list, or you took it to Netnews. (There was FIDO, but rounded to the nearest hundred thousand, it had zero users.)

    That creates an effect not everyone sees. Usenet was the birthplace of hundreds, maybe thousands, of electronic communities, long before people were using "e-" as a prefix. Those communities, the people and personalities and cultures, are what made Netnews so attractive, so involving. (The current buzzword is "sticky.") Of course you're going to come back to see if your favorite netscum posted something outrageous, or if someone answered your question or replied to your answer.

    Web-based discussions didn't kill Usenet, but they darned sure hurt it. Instead of one "'Net," there are tens of thousands, maybe more. I can't count the number of Web-based discussion forums I've seen. This conversation we're having right now is off in some tiny little corner instead of in a news group. There are lots of advantages to having it here ... and *all* those advantages have hurt Usenet when it comes to mindshare, and to the ability to attract the people who make 'Net communities work.

    Instead of a grand city, with some wonderful neighborhoods and some seedy ones, we've got suburban sprawl.

    Netnews could have survived spam. It could have survived the astonishing growth of online participants in the past five years. (It survived AOL, in many senses.) It's having a hard time surviving its current competition. Part of me is very sad to see it wither.

    Ironically, the Web is both the medium in which Dejanews tried to grow, and the medium that choked off some of its best source material.

    I'm saddened by Deja's dwindling support for Netnews archives. (Did they use to go back as far as 1990?) I understand why they failed to turn a profit on the business, why they've got a terabyte and a half (literally) of archived material they consider too expensive to keep online. I appreciate what they've done, and I'm glad to have what they still offer. I wish the Dejanews business had thrived; I still wish it well. --PSRC

    "I'm not speaking for the company, I'm just speaking my mind." (my old Netnews .sig)
  • Then don't call it a USENET search engine. Call it a Technical Support Thingameebob or something like that. Chances are that if you are having a problem with your computer you aren't the first person to have that problem. Most likely someone has asked about it on usenet and received a perfectly adequate answer.
    The new user types their question into the search engine and out pops an answer. Without ClassicDejanews I likely would never have gotten my printer, soundcard or joystick working under Linux. It was there and it did its job.
    If people recognized the potential a USENET search engine has to help them they would flock to it.
  • by Wakko Warner ( 324 ) on Sunday July 23, 2000 @01:19PM (#912044) Homepage Journal
    Well, it *went* back further. They pulled all the older articles for a while, according to their support staff. Once they've upgraded their server farm, the older articles will be back online:

    As of May 15, all messages posted approximately a year ago or more have become temporarily inaccessible via Deja.com. We will be taking this opportunity to reconfigure the service that provides messages posted prior to September, 1999. Therefore, these messages will not be accessible on the site for some time, possibly a few months. Have no fear: We're committed to bringing these messages back online as soon as possible. We request your patience as moving our server bed to a new facility will greatly increase our reliability and performance.

    - A.P.
    --


    "One World, one Web, one Program" - Microsoft promotional ad

  • If you're sick of using your browser *at all* to search Usenet, do what I did and go here [wildspark.com].

    - A.P.
    --


    "One World, one Web, one Program" - Microsoft promotional ad

  • in a word? YES. There are so many times that my USENET searches get all weird on me, like returning things I didn't want and such, and I haven't found a search site that implements the kind of filtering that I want.
  • If Microsoft can be broken up for what they did to personal computer operating systems, Deja should be broken up and their Usenet archives made part of the Library of Congress for what they did to access to those Usenet archives.

    What? That's "Socialism?"

    Fine, let Microsoft compete against AOL/Time Warner/CNN/Netscape on an equal footing -- no break up.

  • by Agamemnon ( 63745 ) on Sunday July 23, 2000 @02:46PM (#912048)
    I'm sure that there is a demand for the sort of service you suggest, but I doubt that there's enough of a demand to make it commercially feasible. In my opinion, if it could have been a money maker, then Deja would have hit the jackpot. The changes they made (or tried to make) to their service a year or so ago were innovative and interesting, but never seemed to catch on, or perhaps weren't implemented properly. For instance, Deja created a feature that would generate an E-Mail message to a user's mailbox if a response to his post was detected: what a great idea! However, it never seemed to work.

    I was really excited when the new Deja went into Beta testing of their expanded capabilities. Unfortunately, the potential was never completely developed, and now Deja has changed directions: Usenet is almost an after-thought, now.

    Another example of Deja's Usenet scale-back: some of the slick graphical Usenet navigation tools have been removed. Remember the four-way arrows introduced early last year? I believe the up and down arrows would jump to the next thread. The left and right arrows allowed movement within a thread. Very handy tool. Now it's back to the old style, still effective, but not as user-friendly as the arrows.

    It's a shame that Deja has moved away from Usenet, but I suppose it was inevitable. As a 5 year veteran of Usenet, a self admitted newsgroup junkie, and an unapologetic devotee to Agent, a piece of software that's seen little modification in two years, I have to admit that Usenet is not a tool that is easily mastered. Well, at least not by the majority of moderate-use Internet visitors, that is. I'm still explaining the concept to my co-workers but, for some odd reason, they seem to be intimidated by Usenet. Guess if it gets beyond point and click, homepage and favorites, most people lose interest.

    To sum up, although I'd like to see a service similar to the one that you mention, I don't think it's a money maker. If it were, then Deja would be promoting, expanding, and improving their Usenet capability, rather than scaling back and minimizing it.

    There is, of course, at least one alternative possibility: Deja mismanaged their upgrade, and squandered its potential.

    I don't know enough about the inner workings of the company to say one way or the other. However, I tend to think that the problem lies not with Deja, but with the nature of Usenet. Usenet is intimidating to many Internet users. For some, the concept can be difficult to grasp. Obviously, it's not as simple as the Web, and of course, the simplicity of the Web spoils many Internet users. My point is this: I don't believe Usenet, outside of the binary groups, particularly MP3 and porno, will ever attract the level of usage that the Web generates, even with tools such as you propose. And, of course, you specify that binary groups will not be implemented in the proposed service (and rightly so). So, although I'd like to see you give a favorable report, I doubt that you will. Please let us know one way or the other.

    One good thing that will come of this: Deja's "Power Search" has had some of its fangs pulled: all of those embarrassing posts I made to Usenet years ago, before I realized they could all be traced back to me, as the years go by and Deja loses interest in archiving, they'll be that much harder to access :)
  • Yes, there is a need for a good usenet archive searcher. I use(d) Deja quite often to search for stuff in the past, but it's going down the drain real fast. Yet, until I find a replacement I will still go there every first Monday of the month (cron reminds me) to find out if someone somewhere posted something about a piece of software that I maintain. By the way, this also suggests a feature that I would like to have: the ability to completely automate this search.

    But be sure to make your stuff better than Deja and to better respect people's rights and feelings. I.e. NO editing of the messages, please, and certainly not to insert ads. As a matter of principle, I'm busy nuking all my posts from Deja because of the ad issue. The home ones are gone already, the work ones will be as soon as I get back to the office.

    --

  • I think the store-and-forward scheme that Usenet uses is just obsolete, and this is causing the gradual decline of all the other technologies built on it. If I post a short article to an obscure newsgroup with a handful of readers, how many megabytes of storage is that innocuous message taking up on thousands of news servers around the world?

    My ISP gets newsfeeds from several sources, and still I get maybe 30% of articles missing. Loss of articles is simply not acceptable in a modern system, when you have to compete with message boards which have no article loss at all.

    Usenet needs a rebirth to position it as a viable alternative to message boards. It needs to concentrate on high traffic groups which actually benefit from worldwide mirroring.

    Meanwhile, private message boards can take over the low end of the market, while providing NNTP interfaces and all the features of a real news server. The big difficulty here, in fact, is that most web hosts would not allow their users to set up a 'server', restricting the idea to the big boys, which is the opposite of the proposed low-end niche message boards should have. Perhaps something could be jury-rigged to send text via HTTP, with a client-side translator program acting as a proxy news server. (Too techy, though)

    And news readers, the final point in the triangle, need to have solid and seamless support for getting groups from many servers.

    Even the best web-based message board is nowhere near a good newsreader for ease of use. Sadly, the Web is seen as the only interface needed for internet applications these days.
  • by Anonymous Coward
    I saw the article summary up there which said that Deja now only archives back about a year. So I dug back into it with the email address I still use as my primary address (the one that goes on resumes etc.) and yes! They're finally not indexing all the irresponsible crap I was saying as a Linux zealot and a Chaos Magick dabbler a few years back. I say Usenet should go to hell. I'm glad they've pitched the deep archiving they used to provide (going back several years).

    Soon, with the help of Web discussion sites like Slashdot, maybe Usenet will just cease to exist.
  • Make all the search engines you want. You don't have the DATA! You can make all the front ends you want, but when (and not if) DEJA goes down the toilet, it ain't gonna matter. It will be gone!!!
  • Actually, what really sucks is that Dejanews purchased Usenet archives going back to the early 1980s some time ago. They used to have a page up announcing this and claiming that they would be up "soon".

    Now, they are actually reducing the size of the archive! The one good thing about this, I guess, is that certain crap I posted in younger days is no longer publicly available. The bad thing is that there is also a bunch of Usenet stuff from the 1991-3 era that I'd really like to get my hands on. (Old alt.tasteless!)

    The sad thing is that DejaNews was always the big guy, and because of their presence, nobody else got into the biz, or bothered to put those old archives up on-line. (If I'm wrong, please tell me!) Now that they are getting out of the Usenet business, we've really lost a valuable resource.


  • This is nearly inevitable. Any single node of control becomes a point that will fail. It may not happen this year, but it will happen.

    Multiple copies via diverse methods is the only nearly secure approach. That way when one of the versions fails, there will probably be time for one of the copies to replicate.

    The problem comes in selecting which information to preserve, since one can't save everything. Probably the best approach is for those who would back up the internet to specialize in certain areas. Remember that nobody can learn everything anymore. Well, nobody can store everything either. Other groups need to take the place of public libraries, and index the sources of information in various ways. It won't be as easy as it used to be, but consider the rate at which new information is being created.

  • Is there a DVD-R set that contains the entire usenet archives? usenet is distributed - is there any node that has been archiving since the start?

    Thanks!
  • by Anonymous Coward
    I just want to say that the old DejaNews postings were a gold mine of useful information. I have worked at a major ISP and answers I found on DejaNews have saved my ass many times. The connectivity, e-mail and DNS of tens of thousands of users has been affected because of this. Imagine this being replicated hundreds or even thousands of times a day around the globe. DejaNews improved the fabric of the internet. The loss of the older posts is a major blow to the internet and needs to be fixed ASAP.
  • >So I would wait for a few more weeks, and see if the situation improves.

    About a year ago, they moved all of the older posts to an inobvious location; in other words, first you had to do a search in the current archive, then you could do a search in the older archive.

    Then a few months ago came the announced re-organization. So far, all they have told us is the same promise that the older archives should be back online.

    Deja has been stupid about this. Right now, they are -- or were -- the only people who offered this service. They could have offered the older posts as a premium service -- maybe charged to the user by charging ISPs. Instead they apparently decided this unique service was not worth offering, & have discontinued it.

    That would be typical yuppie dot-com thinking. I'd like to think that Deja didn't subscribe to that kind of thinking.

    Geoff
  • What do you mean, "Off topic"? We're talking about USENET search engine problems, right?
  • I'm busy nuking all my posts from Deja because of the ad issue

    How do you do this, please?
  • I think one of the problems that usenet is having is that it is becoming fragmented. Many companies have decided to host their own newsgroups on their own servers and not share them with the rest of the world. (The borland.* newsgroups come to mind, for example.)

    If a new search engine would allow me to search these "private usenets" as well, I would definitely use it over deja! Especially since I can't access NNTP from work because of our firewall and well, I fully agree that deja's new focus on reviews is stupid. They should've spawned a new site for reviews.
  • There was discussion of this on the NeXTSTEP newsgroups just a few weeks ago. So I'd say a lot of people are unhappy with the loss of a resource (deja) which they've grown to like.
  • ...let someone else step up to the plate.

    It isn't that dejanews somehow owns the exclusive rights to usenet archives. Anyone has as much right to do what dejanews did. (Whether they actually have the right to do what they do is questioned by some groups of people.) I think dejanews provided a great service, and I am utterly disappointed by their recent policy changes. However, dejanews doesn't owe me anything. It's easy to criticize dejanews, and suggest what they should do. But before dejanews, there weren't any public Usenet archives at all, and dejanews has been one of the few successful and original ideas on the web that hasn't been copied a thousand times over.

    It's a pity that dejanews doesn't provide as much service as they used to. But it isn't dejanews' fault that no one else does either.

    -- Abigail

  • The major distinction Usenet enjoys over the Web is that there is all kind of technical, casual, serious, and strange conversation that happens in discussion that not many would bother to write down in a formal web page.

    The biggest advantage Dejanews has in dealing with all this Usenet data is its ability to sort by subject, newsgroup, date, author, and so on. One can't do this with web pages, either. With Deja, you can be really imaginative, trying to recreate in your mind how someone might have phrased a particular question or who might have been interested in a certain topic which you want information about. This is why Dogpile (which another poster mentioned), which does not offer such options, is inferior.

    Deja hosts chats about obscure technical questions, breakfasts in Pittsburgh, debates about graduate schools--thousands of real communities, and opinions, categorized to the tiniest niches, so nicely sorted and searchable, and which can be captured through time. What a sociological resource, if nothing else!

    Usenet is a database, and Deja provides a proper search feature for it. That's value. Great value. What Deja refuses to realize is that it could charge for the resource. I would pay $20 a month for it, easily. Especially if Deja promised to maintain it well (and of course put the old archives back online).

    Deja could add even more value to it. If it had been a little more ambitious, it would have added to its Usenet database discussion forums akin to ForumOne's [forumone.com]. These are Web discussion forums: Salon, the Utne Reader, etc. Searching each one alone for a topic of interest is arduous, since each forum's population can be comparatively small and topics broad. Usenet archives are useful because they provide enough of a range for niche subjects to be covered.

    ForumOne, though in concept magnificent, is practically useless because you cannot use boolean and you cannot sort by author, date, etc. But can you imagine being able to search the entire range of discussion on the web PLUS Usenet with one search engine, and being able to sort via useful database fields? It would be a treasure trove easily equal to the Web in value.

    Deja could charge even more for that.
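
    The kind of fielded search described above (boolean matching on terms, then sorting by author, date, newsgroup, and so on) can be sketched in a few lines. This is only an illustration under stated assumptions: the `Article` record, its field names, and the `search` function are hypothetical and say nothing about how Deja or ForumOne actually store or index posts.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    # Hypothetical record layout; real archives keep many more headers.
    newsgroup: str
    author: str
    posted: date
    subject: str
    body: str


def search(articles, all_terms, sort_by="posted"):
    """Return articles containing every term (boolean AND), sorted by one field."""
    terms = [t.lower() for t in all_terms]
    hits = [
        a for a in articles
        if all(t in (a.subject + " " + a.body).lower() for t in terms)
    ]
    # sort_by names any Article field: "author", "posted", "newsgroup", ...
    return sorted(hits, key=lambda a: getattr(a, sort_by))
```

    A real archive would use an inverted index rather than scanning every article, but the point stands: once posts are treated as records with fields, sorting by author or date is trivial, which is exactly what a plain web search cannot offer.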

  • That's what X-NO-ARCHIVE: YES is for.

    That doesn't cut it. You can't copy books and distribute the copies and get away with it because you won't copy books that have "X-NO-ARCHIVE: YES" written in crayon on the cover.

    Whether Usenet archives are violating copyright or not, I don't know (I hope they aren't, but I can see the reservations people have), but respecting an opt-out header isn't going to cut it. One does not have to defend a copyright. If you want to legalize it with headers, only opt-in headers will do.

    -- Abigail
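
    For readers unfamiliar with the header being argued about: X-No-Archive is an ordinary article header that an archive may choose to honor. A minimal sketch of such an opt-out check follows, assuming the archiver sees each article as raw text. The function name `may_archive` is hypothetical, and this is not a description of Deja's actual filtering.

```python
from email import message_from_string


def may_archive(raw_article: str) -> bool:
    """Return False when the poster opted out with 'X-No-Archive: yes'."""
    msg = message_from_string(raw_article)
    # Header names are matched case-insensitively by the email package.
    value = msg.get("X-No-Archive", "")
    return value.strip().lower() != "yes"
```

    Note that the default here is to archive: an article with no header at all passes the check, which is precisely the opt-out behavior the comment above objects to. An opt-in scheme would invert the default.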

  • I think Deja does have a responsibility to carry the archives (or at least store them). Before Dejanews, many groups were archived by individual sites or on behalf of specific groups. Dejanews appeared to offer a reliable replacement for these piecemeal systems, and when administrators moved on or changed policy, the groups depended on dejanews to take over. If Dejanews hadn't existed, or had published today's policy then, other arrangements would have been made: it's deja's existence that has reduced the keeping of other archives.
  • that older traffic only accounts for 10% or so of their bandwidth.

    The more important question is, "How much revenue does that older traffic generate?" Deja has turned far more commercial of late: the shopping review emphasis, the embedded advert links. We, the geeks who used it for Usenet searches, just aren't a useful revenue stream to them, so they've dumped us.

    Face it guys, Deja is no longer your handy geek-friendly Usenet archive. It's now a "What HomeVideoTheatre Pork-Rind-O-Rama" review site, selling dumbed-down content and adverts to the stupids.

    Time for pastures new. Maybe Dogpile.

  • How about charging for subscriptions? The old deja.com was worth up to about $10,000 per year to me. I would easily pay $2,000 per year, which is how much I pay for MSDN Universal. If we go with the $2,000 figure, I would say $1,500 is for the past year and $500 for the older archives. Perhaps not coincidentally, $2,000 is also about my yearly book budget. So perhaps to find the potential market size for a deja.com replacement, find the number of people who last year both bought an O'Reilly book and spent at least $400 on books, journals, and memberships; then take 25% of the combined dollars spent on books, journals, and memberships.
  • If you are looking for old porn and warez from years back, this would be helpful. News is slowly dying; (un)fortunately, portals like /. are becoming more the norm.
  • actually, the 10% number was pretty good. It meant that only 10% of the searches (not searches from 10% of the people) even hit the old archives. You can't really turn this around the way you did. It didn't say that only 10% of their users used the old archives, it's 10% of their searches.

    And where did you hear that they dropped 60% of their traffic? This is news to me and sounds like pure conjecture (did you know that 92% of all statistics are made up?)

    And I hope someone does buy a copy of the archives in the fire sale, but they also will have to buy a couple of key people in order to get the info out of the format that it's stored in. That's part of the reason that they took those machines offline. Their current (proprietary) solution doesn't scale well.
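
    The difference between "10% of searches" and "10% of users" drawn above is easy to see with a toy search log. The sketch below assumes a log of (user, hit_old_archive) pairs, one per search; the function name and log format are made up for illustration and are not Deja's.

```python
def archive_usage(log):
    """Return (share of searches, share of users) that touched the old archive.

    log: iterable of (user, hit_old_archive) pairs, one entry per search.
    """
    log = list(log)
    old_searches = sum(1 for _, old in log if old)
    users = {user for user, _ in log}
    old_users = {user for user, old in log if old}
    return old_searches / len(log), len(old_users) / len(users)
```

    With five users making two searches each, and one of them searching the old archive once, this gives 10% of searches but 20% of users. The per-search figure alone does not tell you how many people rely on the old posts.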

  • That group of old servers (the ones that had the old news) were also the machines that were failing more often (and weren't rackmount). When they relocated, they saw how much the new machines were going to cost them in rackspace and then thought about how much they were costing them in support costs, and decided to take them offline until the news retrieval system could be re-architected. They still have all the old servers, and they haven't scrapped any of the news. It's just not searchable.

    So did the cost of bandwidth and storage outweigh revenue? I'd say definitely yes. The intangibles? Don't know. Measure them and I'll tell you.

  • Someone approached deja asking for a feed and offered a partnership on the deal. They paid for the (mostly) spamfiltered feed, and charged customers. So any billing problems they had were not Deja's problems.

    It lasted less than a year, and then the company that provided it (or maybe just the department and the service) got bought by usenetservers.com.
  • does anyone have any info on what a full usenet feed (with binaries) is running now? When I left Deja (April) we were getting 70+GB a day. But I don't think that was all the binaries.

    Just curious
  • But even for web searches, I'm coming to the conclusion that Altavista is just sucky. Firstly, it's way out of date, half the links are broken and it's still indexing my homepage as having content that was changed over two months ago. But mostly I get annoyed that the search page refreshes itself every 5 minutes.

    I don't think Altavista will autorefresh if you use it in text mode [altavista.com]. It doesn't seem to send any meta-refresh tags that way, plus it's a much faster loading, easier to read interface. Text mode used to not even have banner ads, but Junkbuster takes care of those easily enough.

  • The Web killed Usenet? No, Usenet killed itself.

    I've only been on the internet since 1989, but I, too, used to read Usenet. I haven't for several years. Why? The sheer volume of it makes it too difficult to keep up with.

    As you point out, Usenet used to be the default place for discussion. But that in itself--not the rise of other discussion forums--led to the current state, where a much smaller fraction of people on the internet use Usenet for discussion. As the volume of Usenet users grew, newsgroups simply became too big for the average person to follow. Face it, if you want to discuss the latest episode of your favorite TV show, would you rather do it in a group of twenty people, or in a group of a thousand? Sure, the larger group will be more diverse and the best comments will be more insightful than in the smaller group, but that's a secondary consideration compared to the time it takes to follow the group.

    It's almost paradoxical--fewer people use Usenet now because so many people use it that it's inefficient. I imagine a much smaller percentage of people use it than did 10 years ago, while the absolute numbers who use it are still growing.

    The rise of web-based and email-based discussion forums is a result of the death of usenet, not its cause. It's a far from ideal solution. The implicit goal is to cut the time it takes to follow a discussion to a reasonable amount. The solution is to create smaller groups of people simply on the basis that not everybody knows about the forum. (Not that this was intentional, mind you. I'm not saying anyone ever sat down and said, "I'll create a web-based forum so that only a handful of people will know about it and thus discussion will occur at a manageable level, despite the fact that I'm not going to actively exclude anyone." It's more of a Natural Selection sort of pressure.) But for the average user, it's better than Usenet.

"The Street finds its own uses for technology." -- William Gibson
