Language Parsing and AI - Where are we now?

C-Town asks: "Browsing through Slashdot articles to research artificial intelligence, I came across the Artificial Intelligence IRC Bot contest, and it got me thinking: how far have we gotten in AI technology in terms of language/text parsing? How long will it be before Ask Jeeves works so well that it could replace all pattern-matching and neural-network based searches? Instead of giving me categories when I submit a query, it would return abstracts of the matching documents on the web, with their respective links (not hand-entered abstracts, but abstracts generated by the AI engine that describe each article in the context of the search query). I know Autonomy currently has a product out to improve data mining, and they route customer service emails with what they call "high performance pattern matching algorithms" (I think it's neural networking?), but they're still not able to analyze whole documents in a lexical manner. What companies/research institutes are currently working on this? Imagine being able to search through all /. comments with a question like "Who makes the best Linux laptops" and get great results. I can imagine this eventually stopping spam, too!"
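
For reference, the "pattern matching" style of search the question contrasts with boils down to ranking documents by overlapping query terms, weighted by how rare each term is (TF-IDF). Below is a minimal Python sketch of that idea; the toy comment texts and function names are invented and do not come from any real archive. It can surface comments that share words with "Who makes the best Linux laptops," but it has no understanding of the question and could not generate an abstract of what it finds.

    import math
    from collections import Counter

    # Toy "comment archive"; a real index would cover far more text.
    comments = [
        "I bought a ThinkPad and Linux installed without a hitch",
        "Best laptops for Linux are the ones with well supported chipsets",
        "My desktop runs BSD, never tried a laptop",
    ]

    def tf_idf_scores(query, docs):
        doc_tokens = [d.lower().split() for d in docs]
        n = len(docs)
        df = Counter(t for tokens in doc_tokens for t in set(tokens))  # document frequency
        scores = []
        for tokens in doc_tokens:
            tf = Counter(tokens)
            scores.append(sum(
                tf[t] * math.log(n / df[t])  # term frequency * inverse document frequency
                for t in query.lower().split() if t in tf
            ))
        return scores

    query = "who makes the best linux laptops"
    ranked = sorted(zip(tf_idf_scores(query, comments), comments), reverse=True)
    for score, text in ranked:
        print(f"{score:.2f}  {text}")
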
This discussion has been archived. No new comments can be posted.

  • Nothing against Lojban, but I side with the school of thought that says computers should be changed to better interact with humans, and not vice versa. Human language was established thousands of years ago, and electronic computing is circa 50 years old; it'll cause significantly less social upheaval to improve computer language recognition than it will to train the world to speak Lojban.

    Plus, I kinda *like* the fact that language is not logical. A reflection of the beings who speak it, I guess...
  • Language processing really isn't that hard. Look at some of the people who do it. That may sound flippant, but it obviously doesn't consume vast portions of normal people's brains to do it. Sure, I slow down talking when dealing with complex driving problems, but compare that to relatively basic mathematics, which I can't do while driving (I mean anything more complex than long division).

    I disagree on this point. I'm sure that if you were to go back to Hyde Park and ask people in Cummings or BSLC, they would disagree about how much mental ability parsing takes. I think the reason that parsing seems easy and math problems seem to be difficult is because our brains have evolved to deal with parsing in an efficient manner.

    For a given person, not being able to divide a three-digit number by a two-digit number quickly isn't much of a handicap, but not being able to speak and understand speech quickly and easily is a big problem. In our past, being able to communicate without effort would have been more advantageous when hunting something than being able to do complex math problems.

    Given an appropriate setting and some help, I could, by following several psycholinguistics methods that I worked on at the University of Chicago, and with access to an online thesaurus-based database, get a working mechanical translation system going within six months. The problem is that I don't do generative linguistics, and all of Noam Chomsky's followers would rather not have a working system than see his theories disproven.

    I'm sure that others have tried using other methods to do translations/parsing but so far it hasn't been very successful. To tell the truth, using a generative grammar is probably the best method available in computer science. It has a solid theoretical/mathematical framework and its problems/benefits are relatively well understood. In any case, I don't believe that language semantics are well enough understood/adequately modeled in linguistics that a mechanical translation system would be possible right now (e.g. idioms and cultural references that cause problems for professional translators).

    Now here, I absolutely disagree with you. Our brains simply have not had enough time to develop for efficient language parsing. At best, language parsing is an 'arch'. Some of the features of phonetics have had the time (voice onset time, or VOT, for example) but not language parsing in general.

    I think that you are underestimating the amount of time that we have had to evolve parsing. I see parsing as a specialized case of communication. Other animals are able to communicate and seem to have the ability to use language in a limited capacity (e.g. Koko, some apes). In more primitive forms, many mammals are able to communicate in some form. These communications are structured. So I would consider our language abilities as a specialization of these communication abilities. If you buy this, then we have had quite a while to evolve structures to deal with parsing in some form.

    Precisely what I'm attempting to get at. A generative grammar tells you only all the possibilities (at best) but never any of the probabilities. A prototype grammar, one appearing in the early 80s called the sausage machine, would provide you with the main possibilities and all the probabilities. This would be much more useful in attempting mechanical translation.

    Using a stochastic context-free grammar lets you assign probabilities to production rules in the grammar. More generally, you can assign probabilities to production rules in a context-sensitive grammar to get a sense of how often a given production is used. (A small sketch of the context-free case appears after this comment.)

    And you've shown yourself to have bought Chomsky's line hook, line, and sinker by arguing this point. Language is not a mathematical construct, it is a psychological one.

    Whether language is a mathematical or psychological construct does not really matter much to modeling it mathematically. The model may not work well, but you can still do it. The advantages of using a generative grammar are that it is mathematically well understood and has solid interconnections with complexity theory and algorithmics. You can get a good idea of how an algorithm based on this model will run and where it will have problems. Other models may not have these advantages.

    In addition, using a computer to parse language means that ultimately you need to express the parsing in an algorithmic procedure. Regardless of whether language is a psychological construct or not, you will express it in terms of if/then or case statements. My assertion is that you can change these statements into a generative grammar with associated probabilities.

    In any case, I disagree with your position that language is entirely psychological. I think that there are some aspects that are purely mathematical. For example, most people are able to identify the syntactic elements in a nonsense sentence such as "The sdfjklds aaadjed fdfjdfj to fdjlfkdj." I would think a psychological model of language would have problems with this, and with the ability of sentences/phrases to be perfectly correct while having no meaning or a contradictory meaning.

    By the way, who moderated you up?????

    I get an automatic +1 bonus due to positive moderation in the past. If you get more than twenty-something points of positive moderation, your posts receive a +1 bonus unless you explicitly prevent it.
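
To make the stochastic context-free grammar point above concrete, here is a minimal Python sketch of CKY parsing over a probabilistic grammar: each production carries a probability, and the parser keeps the most probable way of building each nonterminal over each span. The grammar, words, and probabilities are invented for the example; this is an illustration of the idea, not anyone's actual system.

    from collections import defaultdict

    # A toy probabilistic context-free grammar in Chomsky normal form.
    # Rules and probabilities are invented for the example.
    binary_rules = {
        ("S", ("NP", "VP")): 1.0,
        ("NP", ("Det", "N")): 1.0,
        ("VP", ("V", "NP")): 1.0,
    }
    lexical_rules = {
        ("Det", "the"): 1.0,
        ("N", "dog"): 0.5,
        ("N", "cat"): 0.5,
        ("V", "chased"): 1.0,
    }

    def best_parse(words):
        """CKY: keep the most probable derivation of each nonterminal over each span."""
        n = len(words)
        best = defaultdict(dict)          # (i, j) -> {nonterminal: (prob, backpointer)}
        for i, w in enumerate(words):
            for (A, word), p in lexical_rules.items():
                if word == w:
                    best[(i, i + 1)][A] = (p, w)
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):             # split point
                    for (A, (B, C)), p in binary_rules.items():
                        if B in best[(i, k)] and C in best[(k, j)]:
                            prob = p * best[(i, k)][B][0] * best[(k, j)][C][0]
                            if prob > best[(i, j)].get(A, (0.0,))[0]:
                                best[(i, j)][A] = (prob, (k, B, C))
        return best[(0, n)].get("S")      # (probability, backpointer) or None

    print(best_parse("the dog chased the cat".split()))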

  • Language processing really isn't that hard. Look at some of the people who do it. That may sound flippant, but it obviously doesn't consume vast portions of normal people's brains to do it. Sure, I slow down talking when dealing with complex driving problems, but compare that to relatively basic mathematics, which I can't do while driving (I mean anything more complex than long division). Given an appropriate setting and some help, I could, by following several psycholinguistics methods that I worked on at the University of Chicago, and with access to an online thesaurus-based database, get a working mechanical translation system going within six months. The problem is that I don't do generative linguistics, and all of Noam Chomsky's followers would rather not have a working system than see his theories disproven.
  • I think the reason that parsing seems easy and math problems seem to be difficult is because our brains have evolved to deal
    with parsing in an efficient manner.

    Now here, I absolutely disagree with you. Our brains simply have not had enough time to develop for efficient language parsing. At best, language parsing is an 'arch'. Some of the features of phonetics have had the time (voice onset time, or VOT, for example) but not language parsing in general.

    To tell the truth, using a generative grammar is probably the best method available in computer science.

    Precisely what I'm attempting to get at. A generative grammar tells you only all the possibilities (at best) but never any of the probabilities. A prototype grammar, one appearing in the early 80s called the sausage machine, would provide you with the main possibilities and all the probabilities. This would be much more useful in attempting mechanical translation.

    It has a solid theoretical/mathematical framework . . .

    And you've shown yourself to have bought Chomsky's line hook, line, and sinker by arguing this point. Language is not a mathematical construct, it is a psychological one.

    I'm sure that others have tried using other methods to do translations/parsing but so far it hasn't been very successful.

    I did a master's thesis explicitly trying to find any such effort. The idiom and cultural-reference problems are not as difficult as you think when approached via a thesaurus-based method.

    By the way, who moderated you up?????

  • I think that you are underestimating the amount of time that we have had to evolve parsing. I see parsing as a specialized case of communication. Other
    animals are able to communicate and seem to have the ability to use language in a limited capacity (e.g. Koko, some apes). In more primitive forms, many
    mammals are able to communicate in some form. These communications are structured. So I would consider our language abilities as a specialization of
    these communication abilities. If you buy this, then we have had quite a while to evolve structures to deal with parsing in some form.

    Yes, but if you believe that then you have invalidated your original argument that mathematics is less automated. Animals can count (to some degree), and we have had just as long to evolve those abilities.

    Using a stochastic context-free grammar lets you assign probabilities to production rules in the grammar. More generally, you can assign probabilities to production rules in a context-sensitive grammar to get a sense of how often a given production is used.

    When Prof. Goldsmith presented that, he was laughed off the stage. It's a poor approximation at best.

    For example, most people are able to identify the syntactic elements in a nonsense sentence such as "The sdfjklds aaadjed fdfjdfj to fdjlfkdj." I would think a psychological model of language would have problems with this, and with the ability of sentences/phrases to be perfectly correct while having no meaning or a contradictory meaning.

    That is precisely the strength of a psychological model. The syntactic words match a pattern, so you can identify the subject, predicate . . . (A toy illustration of this appears after this comment.)

    In addition, using a computer to parse language means that ultimately you need to express the parsing in an algorithmic procedure. Regardless of whether
    language is a psychological construct or not, you will express it in terms of if/then or case statements. My assertion is that you can change these statements
    into a generative grammar with associated probabilities.

    A point I do not deny. However, I prefer to use a fuzzy-logic algorithm rather than the conventional predicate-logic one, and that does not yield a generative grammar.
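
As a toy illustration of the pattern-matching point above (guessing syntactic slots from closed-class function words and morphology alone, even when every content word is nonsense), here is a short Python sketch. The word lists and suffix heuristic are invented for the example and are nowhere near a real tagger.

    # Closed-class word lists and a crude suffix heuristic; invented for the example.
    DETERMINERS = {"the", "a", "an"}
    PREPOSITIONS = {"to", "of", "in", "on", "with"}

    def guess_tags(sentence):
        words = sentence.rstrip(".").split()
        tags = []
        for w in words:
            lw = w.lower()
            if lw in DETERMINERS:
                tags.append("Det")
            elif lw in PREPOSITIONS:
                tags.append("Prep")
            elif lw.endswith("ed") and tags and tags[-1] == "Noun":
                tags.append("Verb")   # past-tense-looking word right after a noun
            else:
                tags.append("Noun")   # default guess for unknown content words
        return list(zip(words, tags))

    print(guess_tags("The sdfjklds aaadjed fdfjdfj to fdjlfkdj."))
    # [('The', 'Det'), ('sdfjklds', 'Noun'), ('aaadjed', 'Verb'),
    #  ('fdfjdfj', 'Noun'), ('to', 'Prep'), ('fdjlfkdj', 'Noun')]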

  • One of the main problems with doing natural language processing is that linguistics, as a formal discipline, is only about as old as computer science. And while CS has had a pretty firm theoretical foundation since Day One (thank you, Mr. Turing), debate rages on to this day as to how language really works. So not only is the domain quite complex (i.e., you need damn good programmers with a good knowledge of linguistics), no one even completely understands the domain.

    That's not gonna' stop me from goin' into the field, though...

    Jon
  • During the '80s, natural language understanding and automatic translation were very much in vogue. But since no one was able to make anything terribly useful, much of the research has stopped and is now at a standstill. What is really missing is a theoretical breakthrough. This is obviously unpredictable, but it might happen with the current progress in genetics. The more we know about the human genome, the more we will understand the human brain. And from there, the world is your oyster. So be patient!

  • If everybody started using Lojban [lojban.org], the problem would be orders of magnitude simpler. But English, though quite natural for its speakers, has brain-dead syntax when viewed from a parsing standpoint, so much so that even parsing Latin would be easier (a problem I am actually thinking of tackling).

  • Well, considering that computers have a long way to go, why not meet them half-way? On the other hand, there is no half-way point that I can think of. Maybe a regular language like Lojban would be a good place to start, and then the programmers could concentrate on extracting the meaning (semantics) from the words (syntax). Eventually, once computers can handle language processing of a simple language like Lojban, the parsing of English could be made easier by dictating the solution in this simpler but spoken language. Depending on the sophistication of the program, it might be possible to extend the parser just a little bit and then have the computer "learn" the differences between English and whatever language it is trained in, a step at a time, until it builds a good understanding of the entire language. This might seem overly complicated from a programming standpoint, but in the future, extracting key words from sentences like Jeeves does just won't cut it.

  • To think about the question, you might want to consider a few issues:

    1. Language is AI-complete.

    There's a common naive assumption that you can analyze a text in isolation and figure out what it means based on the definitions of words and the structures of language. But that forgets about the tremendous amount of knowledge that you bring to the text before you read it.

    Forget Littleton, the real problem is 1984.

    Think about what you have to know to understand that sentence. It's not just a simple matter of encoding more definitions, either.

    The refrigerator slipped and he jerked his foot too late.

    You know what happened, but only if you know something about gravity and where feet go, which isn't mentioned anywhere in the sentence... In fact, it's estimated that 3-year-olds know some 50,000 facts about the way the world works physically. Naturally, language is designed for such an environment. Without a complete understanding of intelligence as practiced by people, it's unlikely that you'll have much success writing code to understand human language.

    2. There are different kinds of knowledge.

    When you talk about knowledge, you often mean "facts." But there are other kinds of knowledge not so easily described. Playing music, for example, involves kinds of knowledge that skilled artists may not be able to describe, although it's obvious that they have it.

    3. There are different kinds of understanding.

    When you ask about the state of language understanding, realize that there are different kinds of understanding. Answering questions about characters in a story may be a very different task from deciding the grade level of the writer, for example. You could probably sort advertisements in any language, so your understanding is based on other cues.

    4. Languages depend on domain.

    It's also important to realize that progress will probably be made in limited domains. Typically, email uses only a few thousand words (~22,000 in my research) and covers a small range of topics. Success in an email-sorting task that recognizes and discards spam can be said to represent some kind of understanding, though it's considerably more limited than a human's reading and understanding of the text. (A toy sketch of such a word-based sorter follows this comment.)
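
As a rough sketch of the kind of limited-domain "understanding" described in point 4, here is a tiny naive Bayes spam/ham sorter in Python. The training examples are invented for illustration, a real filter would be trained on a large mail corpus, and nothing here is meant to describe the poster's own research.

    import math
    from collections import Counter

    # Tiny hand-made training set; a real filter would use a large mail corpus.
    train = [
        ("buy cheap pills now limited offer", "spam"),
        ("cheap meds click now", "spam"),
        ("meeting agenda for tuesday attached", "ham"),
        ("can you review the patch before the meeting", "ham"),
    ]

    def train_nb(examples):
        word_counts = {"spam": Counter(), "ham": Counter()}
        class_counts = Counter()
        vocab = set()
        for text, label in examples:
            class_counts[label] += 1
            for w in text.split():
                word_counts[label][w] += 1
                vocab.add(w)
        return word_counts, class_counts, vocab

    def classify(text, word_counts, class_counts, vocab):
        total = sum(class_counts.values())
        best_label, best_score = None, float("-inf")
        for label in class_counts:
            score = math.log(class_counts[label] / total)          # class prior
            denom = sum(word_counts[label].values()) + len(vocab)  # Laplace smoothing
            for w in text.split():
                score += math.log((word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    model = train_nb(train)
    print(classify("cheap pills offer", *model))       # -> spam
    print(classify("tuesday meeting agenda", *model))  # -> ham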

  • Non-natural languages such as Lojban are notoriously problematic. There are reasons why natural languages don't have only 850 words, why they aren't composed of musical notes (I'm not kidding, one constructed language was actually meant to be sung only, and its writing system used musical notes), and why adjectives have negatives other than just the original adjective with an "un" in front of them. For reasons other than their obvious shortcomings stemming from small word-sets (reasons dealing with our innate language capabilities that I'm sure you don't want me to go into here), no constructed language has ever been--and probably none will ever be--even close to natural languages in terms of our brains' abilities to process them. Also, constructed languages are notoriously limited in terms of language change. Languages are, by their very nature, changing beings, and trying to lock them into a singular state is pure stupidity.
    I don't mean this as a flame or anything of the sort; it's just that I'm getting a bit sick of seeing people suggest that man-made languages replace natural languages (or even become lingua francas) so that everything would be "easier." It's just not even a remote possibility, and aside from hobbyists who enjoy learning Esperanto in their spare time (as quite a few of my colleagues do), there's just no useful application of them. Unfortunately, the idea of a world-wide universal language that is man-made is pure fantasy. World-wide lingua francas based on natural languages are much more realistic, although I hope that day never comes--it would seem quite a boring world to me if everyone spoke the same language. But that's probably because my job would be a lot less interesting then.
  • Just a quick note: you mention that language is not logical. It's true that some languages don't follow formal logic (e.g., it's perfectly okay to use double negatives in French), but formal logic is different from "logical" in another sense. The sense I'm talking about is that language is logical in terms of being in perfect harmony with the equipment that's generating it as well as with the equipment that's receiving it. Some of the things that popular critics lambast (e.g., the redundancy and strangeness of the English spelling system) are perfectly logical and make perfect sense when you analyze them for what they are, and for their use by the systems that produce and receive them. I won't go into details here, but there is plenty of reason for redundancy and "strangeness" in spelling, etc. Steven Pinker's The Language Instinct is a great introduction to a lot of this stuff that I highly recommend.
