Language Parsing and AI - Where are we now?
C-Town asks: "Browsing through the Slashdot articles to research Artificial Intelligence, I came across the Artificial Intelligence IRC Bot contest, and it got me thinking: how far have we gotten in AI technology in terms of language/text parsing? How long will it be before Ask Jeeves works so well that it could replace all pattern-matching and neural-network based searches? Instead of giving me categories when I submit a query, it would return abstracts of the documents on the web with their respective links: not hand-entered abstracts, but ones generated by the AI engine that describe each article in the context of the search query. I know Autonomy currently has a product out to improve data mining, and they route customer service emails with what they call "high performance pattern matching algorithms" (I think it's neural networking?), but they're still not able to analyze whole documents in a lexical manner. What companies/research institutes are currently working on this? Imagine being able to search through all /. comments with a question like "Who makes the best Linux laptops?" and get great results. I can imagine this eventually stopping spam too!"
Re:Depends on your language... (Score:1)
Plus, I kinda *like* the fact that language is not logical. A reflection of the beings who speak it, I guess...
Re:Not really that hard (Score:2)
I disagree on this point. I'm sure that if you were to go back to Hyde Park and ask people in Cummings or the BSLC, they would disagree about how much mental ability parsing takes. I think the reason that parsing seems easy and math problems seem difficult is that our brains have evolved to deal with parsing in an efficient manner.
For a given person, not being able to divide a three-digit number by a two-digit number quickly isn't much of a handicap, but not being able to speak and understand speech quickly and easily is a big problem. In our past, being able to communicate without effort would have been more advantageous when hunting something than being able to do complex math problems.
Given an appropriate setting and some help, I could, by following several psycholinguistics methods that I worked on at the University of Chicago, and with access to an online thesaurus-based database, get working mechanical translation going within six months. The problem is I don't do generative linguistics, and all of Noam Chomsky's followers would rather not have a working system than see his theories disproven.

I'm sure that others have tried using other methods to do translation/parsing, but so far it hasn't been very successful. To tell the truth, using a generative grammar is probably the best method available in computer science. It has a solid theoretical/mathematical framework, and its problems/benefits are relatively well understood. In any case, I don't believe that language semantics are well enough understood or adequately modeled in linguistics for a mechanical translation system to be possible right now (e.g. idioms and cultural references that cause problems for professional translators).
Re:Not really that hard (Score:2)
I think that you are underestimating the amount of time that we have had to evolve parsing. I see parsing as a specialized case of communication. Other animals are able to communicate and seem to have the ability to use language in a limited capacity (e.g. Koko, some apes). In more primitive forms, many mammals are able to communicate in some form. These communications are structured. So I would consider our language abilities as a specialization of these communication abilities. If you buy this, then we have had quite a while to evolve structures to deal with parsing in some form.
Precisely what I'm attempting to get at. A generative grammar tells you only all the possibilities (at best) but never any of the probabilities. A prototype grammar, one appearing in the early 80s called the sausage machine, would provide you with the main possibilities and all the probabilities. This would be much more useful in attempting mechanical translation.

Using a stochastic context-free grammar lets you assign probabilities to production rules in the grammar. More generally, you can assign probabilities to production rules in a context-sensitive grammar to get a sense of how often a given production is used.
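For the curious, here is a minimal Python sketch of the stochastic context-free grammar idea. The rules and probabilities below are toy values invented purely for illustration, not estimated from any real corpus:

```python
# A toy stochastic (probabilistic) CFG: each nonterminal maps to a list of
# (expansion, probability) pairs, with probabilities summing to 1 per nonterminal.
# All rules and numbers here are invented for the sketch.
PCFG = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("Det", "N"), 0.7), (("N",), 0.3)],
    "VP": [(("V", "NP"), 0.6), (("V",), 0.4)],
}

def derivation_probability(rules):
    """Multiply the probabilities of the production rules used in a parse."""
    p = 1.0
    for lhs, rhs in rules:
        for expansion, prob in PCFG[lhs]:
            if expansion == rhs:
                p *= prob
                break
        else:
            raise KeyError(f"no rule {lhs} -> {rhs}")
    return p

# Probability of the derivation S -> NP VP, NP -> Det N, VP -> V
parse = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V",))]
print(round(derivation_probability(parse), 2))  # 1.0 * 0.7 * 0.4 = 0.28
```

Ranking whole parses by probability like this is exactly what a plain generative grammar cannot do: it only tells you whether a derivation is possible, never how likely it is.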
And you've shown yourself to have bought Chomsky's line hook, line, and sinker by arguing this point. Language is not a mathematical construct, it is a psychological one.

Whether language is a mathematical or psychological construct does not really matter much to modeling it mathematically. The model may not work well, but you can still do it. The advantages of using a generative grammar are that it is mathematically well understood and has solid interconnections with complexity theory and algorithmics. You can get a good idea of how an algorithm based on this model will run and where it will have problems. Other models may not have these advantages.
In addition, using a computer to parse language means that ultimately you need to express the parsing in an algorithmic procedure. Regardless of whether language is a psychological construct or not, you will express it in terms of if/then or case statements. My assertion is that you can change these statements into a generative grammar with associated probabilities.
In any case, I disagree with your position that language is entirely psychological. I think that there are some aspects that are purely mathematical. For example, most people are able to identify the syntactic elements in a nonsense sentence such as "The sdfjklds aaadjed fdfjdfj to fdjlfkdj." I would think a psychological model of language would have problems with this, and with the ability of sentences/phrases to be perfectly correct while having no meaning or a contradictory meaning.
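To make that concrete, here is a rough Python sketch of how purely structural cues (closed-class words and position) can label the nonsense sentence. The word lists and ordering rules are my own toy assumptions, not a real part-of-speech tagger:

```python
# Closed-class (function) word lists: small, finite, and known in advance.
# These toy lists and the position rules below are invented for illustration.
DETERMINERS = {"the", "a", "an"}
PREPOSITIONS = {"to", "of", "in", "on"}

def rough_tags(sentence):
    """Guess syntactic slots from function words and word order alone,
    without knowing what any content word means."""
    tags = []
    for w in sentence.lower().rstrip(".").split():
        if w in DETERMINERS:
            tags.append("Det")
        elif w in PREPOSITIONS:
            tags.append("Prep")
        elif tags and tags[-1] == "Det":
            tags.append("N")   # a determiner is usually followed by a noun
        elif tags and tags[-1] == "Prep":
            tags.append("N")   # so is a preposition
        elif tags and tags[-1] == "N":
            tags.append("V")   # noun then unknown word: likely the verb
        else:
            tags.append("?")   # no structural cue available
    return tags

print(rough_tags("The sdfjklds aaadjed fdfjdfj to fdjlfkdj."))
# → ['Det', 'N', 'V', '?', 'Prep', 'N']
```

The point is that the structural skeleton falls out of the closed-class words alone, whichever side of the mathematical/psychological debate you take.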
By the way, who moderated you up?????

I get an automatic +1 bonus due to positive moderation in the past. If you get more than 20-something points of positive moderation, your posts receive a +1 bonus unless you explicitly prevent it.
Not really that hard (Score:1)
Re:Not really that hard (Score:1)
with parsing in an efficient manner.
Now here, I absolutely disagree with you. Our brains simply have not had enough time to develop for efficient language parsing. At best, language parsing is an 'arch'. Some of the features of phonetics have had the time (VOT, voice onset time, for example), but not language parsing in general.
To tell the truth, using a generative grammar is probably the best method available in computer science.
Precisely what I'm attempting to get at. A generative grammar tells you only all the possibilities (at best) but never any of the probabilities. A prototype grammar, one appearing in the early 80s called the sausage machine, would provide you with the main possibilities and all the probabilities. This would be much more useful in attempting mechanical translation.
It has a solid theoretical/mathematical framework . . .
And you've shown yourself to have bought Chomsky's line hook line and sinker by arguing this point. Language is not a mathematical construct, it is a psychological one.
I'm sure that others have tried using other methods to do translations/parsing but so far it hasn't been very successful.
I did a master's thesis explicitly trying to find any such effort. The idiom and cultural reference problems are not as difficult as you think when approached via a thesaurus-based method.
By the way, who moderated you up?????
Re:Not really that hard (Score:1)
I think that you are underestimating the amount of time that we have had to evolve parsing. I see parsing as a specialized case of communication. Other
animals are able to communicate and seem to have the ability to use language in a limited capacity (e.g. Koko, some apes). In more primitive forms, many
mammals are able to communicate in some form. These communications are structured. So I would consider our language abilities as a specialization of
these communication abilities. If you buy this, then we have had quite a while to evolve structures to deal with parsing in some form.
Yes, but if you believe that, then you have invalidated your original argument that mathematics is less automated. Animals can count (to some degree), and we have had just as long to evolve those abilities.
Using a stochastic context-free grammar lets you assign probabilities to production rules in the grammar. More generally, you can assign probabilities to production rules in a context-sensitive grammar to get a sense of how often a given production is used.
When Prof. Goldsmith presented that, he was laughed off the stage. It's a poor approximation at best.
For example, most people are able to identify the syntactic elements in a nonsense sentence such as "The sdfjklds aaadjed fdfjdfj to fdjlfkdj." I would think a psychological model of language would have problems with this, and with the ability of sentences/phrases to be perfectly correct while having no meaning or a contradictory meaning.
That is precisely the strength of a psychological model. The syntactic words match a pattern, so you can identify the subject, predicate . . .
In addition, using a computer to parse language means that ultimately you need to express the parsing in an algorithmic procedure. Regardless of whether
language is a psychological construct or not, you will express it in terms of if/then or case statements. My assertion is that you can change these statements
into a generative grammar with associated probabilities.
A point I do not deny. However, I prefer to use a fuzzy logic algorithm rather than the conventional predicate logic one. But then this does not yield a generative grammar.
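A minimal sketch of the contrast being drawn, assuming invented membership scores: predicate logic gives a sentence a crisp True/False judgment, while fuzzy logic assigns it a degree in [0, 1]:

```python
# Standard fuzzy connectives: min for conjunction, max for disjunction.
def fuzzy_and(*degrees):
    return min(degrees)

def fuzzy_or(*degrees):
    return max(degrees)

# Invented membership degrees for illustration only:
# how "verb-like" a word is, and how well it fits its position.
verb_likeness = 0.8
position_fit = 0.6

acceptability = fuzzy_and(verb_likeness, position_fit)
print(acceptability)  # 0.6 — a graded judgment, not a boolean

# The crisp predicate-logic counterpart collapses everything to True/False:
crisp = (verb_likeness > 0.5) and (position_fit > 0.5)
print(crisp)  # True
```

The graded output is the reason the poster says this route does not yield a generative grammar: rules no longer either apply or fail, they apply to a degree.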
Problems with Parsing (Score:1)
That's not gonna' stop me from goin' into the field, though...
Jon
Breakthrough is missing (Score:2)
Depends on your language... (Score:1)
Re:Depends on your language... (Score:1)
Understanding the question... (Score:1)
1. Language is AI-complete.
There's a common naive assumption that you can analyze a text in isolation and figure out what it means based on the definitions of words and the structures of language. But that forgets about the tremendous amount of knowledge that you bring to the text before you read it.
Forget Littleton, the real problem is 1984.
Think about what you have to know to understand that sentence. It's not just a simple matter of encoding more definitions, either.
The refrigerator slipped and he jerked his foot too late.
You know what happened, but only if you know something about gravity and where feet go, which isn't mentioned anywhere in the sentence... In fact, it's estimated that three-year-olds know some 50,000 facts about the way the world works physically. Naturally, language is designed for such an environment. Without a complete understanding of intelligence as practiced by people, it's unlikely that you'll have much success writing code to understand human language.
2. There are different kinds of knowledge.
When you talk about knowledge, you often mean "facts." But there are other kinds of knowledge not so easily described. Playing music, for example, involves kinds of knowledge that skilled artists may not be able to describe, although it's obvious that they have it.
3. There are different kinds of understanding.
When you ask about the state of language understanding, realize that there are different kinds of understanding. Answering questions about characters in a story may be a very different task from deciding the grade level of the writer, for example. You could probably sort out advertisements in any language, because that kind of understanding is based on other cues.
4. Languages depend on domain.
It's also important to realize that progress will probably be made in limited domains. Typically, email draws on a limited vocabulary (~22,000 words in my research) and covers a small range of topics. Success in an email sorting task that recognizes and discards spam can be said to represent some kind of understanding, though it's considerably more limited than a human's reading and understanding of the text.
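A toy sketch of that kind of limited-domain email sorting, using a naive Bayes word model. The training messages, vocabulary, and smoothing choices here are all invented for illustration; a real filter would train on thousands of messages:

```python
import math
from collections import Counter

# Invented two-message training sets for each class.
spam_train = ["buy cheap pills now", "cheap pills cheap offer"]
ham_train = ["meeting moved to noon", "please review the patch"]

def word_counts(docs):
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return counts

spam_counts, ham_counts = word_counts(spam_train), word_counts(ham_train)
vocab = set(spam_counts) | set(ham_counts)

def log_score(message, counts):
    """Laplace-smoothed log-likelihood of the message under one class."""
    total = sum(counts.values())
    return sum(
        math.log((counts[w] + 1) / (total + len(vocab)))
        for w in message.split()
    )

def classify(message):
    """Pick whichever class makes the message's words more likely."""
    if log_score(message, spam_counts) > log_score(message, ham_counts):
        return "spam"
    return "ham"

print(classify("cheap pills"))       # spam
print(classify("review the patch"))  # ham
```

Nothing here "understands" the text in a human sense; it only exploits the narrow vocabulary of the domain, which is exactly the point about limited domains above.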
Re:Depends on your language... (Score:2)
I don't mean this as a flame or anything of the sort; it's just that I'm getting a bit sick of seeing people suggest that man-made languages replace natural languages (or even become lingua francas) so that everything would be "easier." It's just not even a remote possibility, and aside from hobbyists who enjoy learning Esperanto in their spare time (as quite a few of my colleagues do), there's just no useful application of them. Unfortunately, the idea of a world-wide universal man-made language is pure fantasy. World-wide lingua francas based on natural languages are much more realistic, although I hope that day never comes: it would seem quite a boring world to me if everyone spoke the same language. But that's probably because my job would be a lot less interesting then.
Re:Depends on your language... (Score:2)