Open Source Natural Language Processing?


Posted by Cliff on Monday November 18, 2002 @08:09AM from the teaching-computers-about-speech dept.

fieldmethods asks: "One area where Open Source and Free Software don't seem to have really taken off is Natural Language Processing (using computers to deal with human languages). There are a few projects that are open source, such as Festival (a speech synth system, now ported to Java), NLTK, a general-purpose NLP system in Python, and the Linguana project, a Perl implementation of a semantic network not unlike WordNet (but better). Generally, though, there doesn't seem to be a lot of Open Source momentum behind the field as a whole. It's a challenging, difficult field that would benefit from collaboration, especially given the potential of replacing static corpora with on-the-fly corpora developed by search engines. Is anybody else interested in this?"


- Re:Doesn't look like it! (Score:2, Informative)
  
  by Some Guy ( 21271 ) writes:
  
  Actually, I am interested in this. I did some computational linguistics work while I was doing my BSc/MSc and really enjoyed it.
  
  You should also have mentioned your interesting website fieldmethods.net [fieldmethods.net] as a good source for exploring all things NLP [which I thought referred to Neuro-Linguistic Programming [neurolingu...amming.com] when I first saw it...].
  - Re:Doesn't look like it! (Score:1)
    
    by fieldmethods ( 620984 ) writes:
    
    Neuro-Linguistic Programming is really different. It's actually something of a sore spot, I'd say, within the Natural Language Processing world, that this other field, which overlaps only under the most generous of definitions, has swiped some of the mindshare of the acronym. To be honest, the NeuroLinguistic Programming thing seems a little iffy to me. But I'll withhold judgement. The bottom line is that its goals are quite different from those of Computational Linguistics. (Also: I'm glad you think the site is interesting. It is my site, but I didn't want to shamelessly promote it.)
It depends on what you call NLP (Score:4, Funny)

by Boglin ( 517490 ) writes: on Monday November 18, 2002 @09:58AM (#4696309) Journal

If machines that attempt the Turing Test count as NLP, then NLP is a solved problem. You just need a random number generator to choose from a list of prechosen responses (face it, there's nothing less believable than talking to someone who actually listens to you.) Therefore, I submit Virtual Boglin:

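#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char line[256];  /* holds the user's input, which is never looked at */
    int i = 1;

    printf("Hello\n");
    while (i) {
        /* Read one line and ignore it; stop at end of input. */
        if (fgets(line, sizeof line, stdin) == NULL) break;
        switch (i) {
        case 1: printf("Microsoft Sucks! Use Linux!\n"); break;
        case 2: printf("I need to boot back over to the Windows side to play System Shock 2.\n"); break;
        case 3: printf("Sony is an evil monster who won't be content until we have lost all our rights.\n"); break;
        case 4: printf("Have I shown you my Clie? Look, it can play the Spiderman Trailer!\n"); break;
        }
        i = rand() % 5;  /* 0 ends the conversation */
    }
    printf("Leave me alone; I'm about to get a new high score.\n");
    return 0;
}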

- Re:It depends on what you call NLP (Score:1)
  
  by fieldmethods ( 620984 ) writes:
  
  You're absolutely right about the importance of definitions. Your code is funny (and it's not too far from what Eliza does, for instance), but actually, it's kind of surprising how "sensible" random content can seem if it's taken from a large enough database (like the internet [coredump.cx].)
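For comparison with the random-choice program above, the Eliza-style approach mentioned in this reply can be sketched as keyword matching: the answer depends on what the user typed rather than on a random number. This is only a minimal illustration and not Weizenbaum's actual program; the rule table, the `reply_for` helper and the canned responses are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One rule: if the input contains `keyword`, answer with `response`. */
struct rule {
    const char *keyword;
    const char *response;
};

/* Hypothetical rule table, made up for this sketch. */
static const struct rule rules[] = {
    { "Windows", "Why do you mention Windows?" },
    { "Linux",   "Tell me more about Linux." },
    { "Sony",    "How do you feel about Sony?" },
};

/* Return the response of the first rule whose keyword occurs in `input`,
 * or a generic fallback when no keyword matches. Matching is case-sensitive. */
const char *reply_for(const char *input)
{
    assert(input != NULL);
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (strstr(input, rules[i].keyword) != NULL)
            return rules[i].response;
    }
    return "Please go on.";
}
```

With this table, `reply_for("I switched to Linux")` returns "Tell me more about Linux.", and input that matches no keyword gets "Please go on.". The first matching rule wins, so rule order matters when an input contains several keywords.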
Did you google? (Score:4, Informative)

by FeatureBug ( 158235 ) writes: on Monday November 18, 2002 @10:20AM (#4696442)
Is there a reason you haven't tried answering your question using Google? You're not chasing karma are you? Last I heard Google is free.
There's a huge amount of open-source NLP resources and software for many languages on the web.
- Thought Treasure [signiform.com] - a bilingual database of 25000 concepts including 55000 English and French words and phrases. Compiles under Linux and Windows
- Leeds University NLP Research Group [leeds.ac.uk] (mail webmaster re broken links to software)
- Japan's ICOT/5th Generation Computer Project archive of free software [icot.or.jp]
- Linguistics Toolset at Vaasa University [uwasa.fi]
Last but not least:
- A well-annotated collection of NLP links [tokushima-u.ac.jp] including NLP, NLU, Speech*, MLT, Fuzzy*, MLPs, SVMs, etc
Will.
- Re:Did you google? (Score:1)
  
  by fieldmethods ( 620984 ) writes:
  
  Suuuure I Google. I Google a LOT. About NLP, in fact. Maybe I didn't phrase the question so clearly: it's just my impression that your average opensource hacker isn't interested in NLP. I don't know whether this is because they aren't interested in it, think it's boring, or what. That's what I was wondering. And karma? Eh? What's that?
Linguana (Score:1)

by B.Smitty ( 604089 ) writes:

How is Linguana better than WordNet when they haven't actually done anything?
Perhaps... (Score:2)

by Call Me Black Cloud ( 616282 ) writes:

...the people that are knowledgeable in this field enjoy getting paid for their work.
two projects (Score:3, Informative)

by Kunta Kinte ( 323399 ) writes: on Monday November 18, 2002 @12:15PM (#4697465) Journal

http://freespeech.sourceforge.net/ [sourceforge.net]
http://www.speech.cs.cmu.edu/ [cmu.edu]
There are probably others ( search google.com, freshmeat.net, sourceforge.net )

open source NLP (in POESIA) (Score:2, Informative)

by basiles ( 626992 ) writes:

The POESIA (an opensource internet content filter, partly funded by the European Commission, safer Internet Access Plan IAP2117/27572) project will have some opensourced NLP components (for English, Spanish, Italian...).
See POESIA [poesia-filter.org] site for details.
POESIA (Public Opensource Environment for a Safer Internet Access) aims to protect European youth (in educational institutions) against harmful or inappropriate Internet content, and use several techniques (including NLP, Image processing, ...) to achieve this goal.
Lack of interest (Score:1)

by Koos Baster ( 625091 ) writes:

Judging by the responses to this post (or rather the lack thereof), NLP is not a very hot topic. Most natural language processing research is at a very academic stage. Quite a few universities study some small NLP-related subtopic, but there are hardly any really large departments - say the size of a computer science faculty.

With Lernout & Hauspie - the one major commercial software supplier in this field - gone bankrupt, there are only some small companies trying to get by. Some have success in a very specialized sub-subject, like OCR, voice response or information retrieval.

As a former Computational Linguistics student, I'd say the main problem is either the lack of computational power or the lack of manual labour. I.e., even a very well-defined linguistic area needs to be defined with too many rules (in a complex system) or needs too much data and CPU time (in a brute-force approach) to be feasible, commercially viable, interesting in the Turing sense... too much effort to just make it work.

Where you'd expect high-level NLP to work, simpler techniques usually work better. Ask Jeeves [ask.com] and Q-go [q-go.com] are great, but most people agree no search engine currently beats Google [google.com], even when it's tailored for a very small subject. NLP is just way immature, compared to most other computational topics - primarily because it is intrinsically complicated.

I guess we'll have to wait another decade or two for the killer app, though I'm a pessimist. Until then I'd urge all institutions to collaborate as much as possible, and I really don't understand where some universities are going with their closed source research projects.

--
Recursive: Adj. See Recursive
- Intrinsically complicated (Score:2)
  
  by Tune ( 17738 ) writes:
  
  > [...] - primarily because it is intrinsically complicated.
  
  Yeah right. And databases, math and user interfaces are not?
  - Re:Intrinsically complicated (Score:1)
    
    by Koos Baster ( 625091 ) writes:
    
    And databases, math and [graphical] user interfaces are not?
    
    Indeed. As a language, "math" was designed to exclude many important aspects that evidently exist in all human languages, like ambiguity, intonation,
    
    Databases and GUIs existed before we learned how to use them: we simultaneously discovered the technique and adapted to using it. Language has been with us for (at least) tens of thousands of years. It has evolved and is constantly changing still. There was never a clean design, it was just there for as long as we can remember. Successful human-computer interaction, in the "2001: A Space Odyssey" sci-fi sense, can only be achieved if we build machines that can reason, handle paradoxes, and do the whole trick we currently call NLP.
    
    ...For now and in the foreseeable future the only way to build a machine like that is to have children.
  - Re:Intrinsically complicated (Score:1)
    
    by tgv ( 254536 ) writes:
    
    Precisely.
    
    Processing mathematics is easy compared to natural language. There were calculators (think abacus, or Pascal's mechanical calculator) long before anyone made an attempt at implementing language processing. Solving equations on the other hand is difficult, and you will find that (open source) algebraic manipulation programs are much rarer than simple math programs.
    
    Plus, most people who do NLP know quite a lot of math, whereas most people who know math, know very little about NLP. So that explains part of the bias.
    
    Let's not compare the complexity of user interface (design/implementation) with that of a serious NLP application. GUIs are easy. Try Visual Basic if you don't believe me... The programming tools for building GUIs are (of course) more complex, but still straight-forward compared to NLP.
- The Myth of Labor (Score:1)
  
  by candot ( 513284 ) writes:
  
  > As a former Computational Linguistics student, I'd say the main problem is either the lack of computational power or the lack of manual labour. I.e., even a very well-defined linguistic area needs to be defined with too many rules (in a complex system) or needs too much data and CPU time (in a brute-force approach) to be feasible, commercially viable, interesting in the Turing sense... too much effort to just make it work.
  
  This is a very popular opinion, confirmed by the amount of money corporations throw at manual processes like taxonomy maintenance and training, but it's out of date. There are several scalable, commercially viable approaches that do not require manual labor or prohibitive processing to be feasible.
  
  Google News [google.com] is an example of a large-scale application built with automated NLP. Think Tank 23 [thinktank23.com] makes an NLP-based, ad-hoc categorization engine that powers, among other sites, the Waypath Project.
  
  Talk about a killer app!
  - Unsupervised Techniques (Score:1)
    
    by Koos Baster ( 625091 ) writes:
    
    I'm not sure about ThinkTank23, but isn't Google News maintained by human droids? IMHO there's some really interesting NLP research going on at Google, but it's all very pragmatic: the focus is on assisting manual labor and getting proven, but simple, techniques to work on larger and larger problems.
    
    Nothing wrong with being pragmatic, but in the spectrum from unsupervised NLP to automatic-aids-for-jobs-that-humans-find-tedious, only the left-most extreme is a cheap solution in the long run. And it's these unsupervised techniques that just don't take off. Personally, I tend to agree with the popular opinion that this will take a very long time - mainly because NLs are just intrinsically very complex, compared to the problems faced by mainstream computer science.
    
    But I certainly agree that NLP research has contributed to some killer apps.
    
    --
    If pro is the opposite of con, what is the opposite of progress?
- - Re:Lack of interest (Score:1)
    
    by Koos Baster ( 625091 ) writes:
    
    True. But even in MSN8, it's not the "killer app" that boosts NLP onto the front pages of the media. Rather, it's a "geek" feature that many perceive to be as annoying as "Clippy the happy Word wizard" - primarily due to its immaturity.
answer (Score:2)

by tongue ( 30814 ) writes:

" Is anybody else interested in this?"

judging from the lack of comments on this story, i'd say the answer is a resounding "No."
Check out OpenCyc (Score:3, Informative)

by jungd ( 223367 ) writes: on Monday November 18, 2002 @03:03PM (#4699471)

One of the best speech understanding systems in existence is OpenCyc [opencyc.org] - and it is open source!

- Re:Check out OpenCyc (Score:1)
  
  by tgv ( 254536 ) writes:
  
  Sorry, but "speech" and "understanding" are not the words to use to describe (Open)Cyc. If you had taken the trouble to read the page you're referring to, you would have read: "OpenCyc can be used as the basis of a wide variety of intelligent applications such as : * speech understanding, etc."
  
  So it's not NLP, but a module that could be coupled to a system that has an NLP component and needs to do some reasoning over it.
  
  And for this you got a score of 3? Lack of knowledge clearly does not interfere with your karma.
I've got two links for you (Score:3, Informative)

by perkr ( 626584 ) writes: on Monday November 18, 2002 @04:41PM (#4700570)

General open source NLP tools:
http://opennlp.sourceforge.net [sourceforge.net]
http://nlpfarm.sourceforge.net [sourceforge.net]
If you're looking for speech software there isn't that much good software as open source, since just about every aspect of modern speech processing is patented.

Emdros text database engine (Score:1)

by ulrikp ( 64196 ) writes:

Hi,

my emdros [emdros.org] text database engine is built specifically for storing and retrieving annotated or analyzed text. This makes it ideal as a back-end for certain classes of NLP projects.

Ulrik Petersen
