
Crowd-Source Translation Software For Free Content?

yahyamf writes "I have a lot of free educational content in the form of audio lectures and text, which I'd like to translate into as many languages as possible. I would also like to transcribe the audio and create audiobooks from the text. There are already several volunteers willing to contribute, but I need some web-based software to manage all the work. Facebook is already doing something like this, but only for its own content. I've also looked at Damned Lies, which is part of the GNOME project, but it doesn't seem to handle audio. Are there any other open source translation projects out there that I can customize and build upon?"
This discussion has been archived. No new comments can be posted.

  • Question (Score:4, Insightful)

    by arizwebfoot ( 1228544 ) * on Tuesday April 28, 2009 @12:12PM (#27746787)
    Are they your lectures and who owns the copyright on the lectures? Does the university or do you? Since your work product was for hire . . .
    • Re:Question (Score:4, Informative)

      by bcrowell ( 177657 ) on Tuesday April 28, 2009 @01:20PM (#27747685) Homepage

      Are they your lectures and who owns the copyright on the lectures? Does the university or do you? Since your work product was for hire . . .

      Hold on there, cowboy. It's not that simple. In the US, work for hire status depends on three criteria [worksmadeforhire.com], and those criteria are somewhat ambiguous as applied to university professors. Here [bc.edu] is a more detailed discussion of the law. There isn't a clear legal precedent addressing the issue, but that's because the issue almost never comes up. The issue doesn't come up because there's a solid consensus in the world of education that the professor owns the copyright to things like lectures, textbooks, and journal articles. (Note that when it comes to articles, a journal that requires a copyright transfer asks the author, not the school, to sign it.) Regardless of the law, it's clear that there are overwhelmingly strong reasons (e.g., academic freedom) why universities know they shouldn't cross this line. It's sort of like Mia Farrow's famous remark that "you don't fuck the kids." Doesn't matter if it's theoretically legal to go there, you just don't go there.

      More relevant questions to ask the OP would be (1) where we can take a look at these materials, and (2) whether he's put them under a free license such as CC-BY-SA. If the answer to #2 is no, then probably nobody will be interested in doing the translations for free.

      In answer to the OP's original question, I know of two approaches that could be used. One would be to create a wiki of the English version and then allow translators to use the wiki to produce translations. Another would be to put the English version in some kind of format that's amenable to version control (e.g., plain text or LaTeX), and use version control software such as Git.

      I have some experience with this because I wrote some CC-BY-SA-licensed physics textbooks, and over the years I've been contacted by roughly 10 people who were enthusiastic about translating them. None of those people ever translated any significant amount of text. It's a huge amount of work to do this kind of translation, and people's enthusiasm seems to evaporate quickly. A good example of the fragility of enthusiasm, in a slightly different context, is Wikibooks, which is basically an abysmal failure, at least if you compare what it's accomplished over all its years of existence with its original stated goals, which were to revolutionize education. Writing or translating a book is just too much work for most people to tackle without some kind of financial or nonfinancial reward. It's not analogous to software, which is a functional product rather than a creative one.
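
To illustrate the version-control approach from the comment above: if the English source is kept as plain text with a blank line between paragraphs, translators mainly need to know which paragraphs changed since they last synced. The following is a minimal Python sketch under that assumption; `split_paragraphs` and `paragraphs_needing_update` are hypothetical helpers, not part of Git or any translation tool. It uses the standard library's `difflib` to list the paragraphs in the new source that were edited or added. Running `git diff` between two revisions gives similar information at line granularity.

```python
import difflib


def split_paragraphs(text: str) -> list[str]:
    """Split plain text into paragraphs separated by blank lines (an assumed convention)."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def paragraphs_needing_update(old_source: str, new_source: str) -> list[int]:
    """Return indices (into the new source) of paragraphs that were changed or added.

    Deleted paragraphs produce no index; translators simply drop them.
    Unchanged paragraphs can keep their existing translation.
    """
    old_paras = split_paragraphs(old_source)
    new_paras = split_paragraphs(new_source)
    # Compare whole paragraphs as sequence elements, not individual characters.
    matcher = difflib.SequenceMatcher(a=old_paras, b=new_paras, autojunk=False)
    stale = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            stale.extend(range(j1, j2))
    return stale


if __name__ == "__main__":
    old = "Force is mass times acceleration.\n\nEnergy is conserved."
    new = ("Force equals mass times acceleration.\n\nEnergy is conserved."
           "\n\nMomentum is conserved too.")
    print(paragraphs_needing_update(old, new))  # [0, 2]
```

Working at paragraph granularity keeps the source and each translation roughly aligned, which is easier for volunteers than reading raw line diffs of reflowed text.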

  • Oh no (Score:4, Funny)

    by Dyinobal ( 1427207 ) on Tuesday April 28, 2009 @12:15PM (#27746853)
    I've fallen behind in my web 2.0 buzzwords. What the hell's a crowd source? I was thinking someone or something that draws crowds, like Obama or double-jointed Swedish twins. Unenlightened minds want to know!
    • Re:Oh no (Score:4, Funny)

      by EdZ ( 755139 ) on Tuesday April 28, 2009 @12:22PM (#27746941)
      It means "post your work on the internet and get lots of other people to do it".
    • by Aladrin ( 926209 )

      Outsource means to get someone 'out'side your company to do your work, usually for money.
      Crowdsource means to get a 'crowd' of random people to do your work, usually for free.

    • Re:Oh no (Score:4, Funny)

      by MBCook ( 132727 ) <foobarsoft@foobarsoft.com> on Tuesday April 28, 2009 @12:40PM (#27747181) Homepage

      A crowd source is where crowds come from, like an apartment building.

      A crowd sink is where the crowds go to, like a stadium event.

      Standard EE terminology. So the poster must be looking for... a place to steal people from to force them to translate boring lectures? I'm not sure how electrical engineering applies here.

      Maybe electroshocks as "encouragement" to do the work?

      • Re: (Score:3, Insightful)

        by mr_mischief ( 456295 )

        You silly, this is where that "social engineering" comes into play.

        You convince all these people they want to help you for free, then you sell the fruits of their labor for money.

        Some see crowd sourcing as communist, but it's actually quite the opposite. It's capitalism in hyperdrive: you put up a little bit of capital, organize a whole bunch of ultra-cheap willing (so technically non-slave) labor, and you profit more from the higher margin. Those paid laborers can just keep working for those stupid enough

      • A crowd source is where crowds come from, like an apartment building.

        Even the stork theory is better than this... Do you tell your children that too?

      • Re: (Score:3, Funny)

        Wrong. A maternity hospital is a crowd source. A cemetery is a crowd sink. And apartment buildings connected to stadium events form oscillators.
      • by CityZen ( 464761 )

        An apartment building (and a stadium) is a source or a sink, depending upon its current state and the surrounding conditions.

        So if the building is currently charged up and the proper conditions arrive outside of it (morning, nice weather), then the building becomes a source.

        Similarly, when the building is nearly empty, other conditions (evening, bad weather) might turn things around and it becomes a sink again.

        You can think of these things as capacitors.

        Also, the interconnections between buildings (sidewalk

    • What the hell's a crowd source?

      I think it might be a typo. Is crowd sauce what you might put on your crowd burger?

  • by Anonymous Coward

    Hello,

    At Transposh we aim to create such a project, one that will enable crowdsourced website translations (and hence your scripts); no audio is planned, though.

    Currently we have a WordPress plugin, but a generic plugin is being written, and everyone is welcome to help.

    Ofer

  • Here's what I think is the best way to facilitate "crowdsourced" translation: write a "semi-automatic" translator. That is, you have to spoonfeed it information about the grammatical function and meaning of all of the text, which significantly simplifies the problem of automatically translating it. Then, you can turn over any text to crowdsourced translation. Instead of having to know two languages, all that the crowd has to know is what the text actually means, which then allows them to disambiguate it f

    • Launchpad's Rosetta [launchpad.net] tool does something different, but workable. Software usually doesn't have complete sentences as much as words or phrases sprinkled about the UI. So what they do is gather up a bunch of pre-existing translation strings and suggest them to a human for approval for a new context.

      What you're proposing is basically an intermediate language [springerlink.com], similar to how you can compile Java, Ruby, Python, Scala and so on into JVM bytecode, and let the JVM translate into the platform's specific language. So

      • So when you say "Instead of having to know two languages, all that the crowd has to know is what the text actually means, which then allows them to disambiguate it for the program," what you really mean is "Instead of having to know two languages, all that the crowd has to know is some intermediate representation language and the target language."

        Whoa, not so fast! Under my idea, you don't need to know an intermediate representation language. (And you don't need to know the target language, just the source language, but I assume that's what you meant.) You just need to be able to disambiguate potential meanings to arbitrary precision, which you can certainly do if you understand it. Certainly, you have to learn how to "tag this as the verb", "pick which meaning of 'set' is being used here", "identify the antecedent of 'him' in this sentence", et

        • by SEWilco ( 27983 )
          Yes, you do need to know an intermediate language. It's the language which you use to identify the exact meaning of the original. It's called transfer-based machine translation [wikipedia.org]. The computer will have rules to emit facts, concepts, and actions into various languages, but you have to define the components so it understands them. You'll need to know exactly how to define that meaning to "arbitrary precision", and to do that you have to use a language (whether represented as text or graph images). You'll
          • I guess my idea still isn't clear. The link you gave still describes, as best I can tell, fully automatic translation which just happens to use an intermediate language to simplify the process. The article describes how the program has to infer, based on statistics and context, what each word means. That's the complete opposite of the "semi-automatic" translation I'm going for, which attempts to eliminate the need to guess by having someone who understands the source text in the source language assist the

      • Okay, maybe some clarification is in order. My idea wasn't necessarily a replacement for what the submission was trying to do. It just tackles the general problem of "how to translate a lot of stuff". The idea is the program -- which has to have a special module for each target language -- PLUS crowdsourcing the easier work of marking up the source text. So you are correct that writing the program requires knowledge of the target language, and some kind of internal representation format. However, using

      • "Lingua franca" means "language of the Franks", not "french[sic] language".

        Just to confuse the issue further, the Franks were in fact Germans.
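
The Rosetta approach described in this thread, suggesting pre-existing translation strings for a human to approve, is essentially a translation memory. Below is a minimal Python sketch of that idea, assuming the memory is just a dict of previously translated source strings. It is not Rosetta's actual matching algorithm, and the 0.6 similarity cutoff is an arbitrary illustrative choice. Fuzzy matches come from the standard library's `difflib.get_close_matches`; nothing is applied without a translator's approval.

```python
import difflib


def suggest_translations(new_string: str, memory: dict[str, str],
                         n: int = 3, cutoff: float = 0.6) -> list[tuple[str, str, float]]:
    """Return up to n (old_source, old_translation, similarity) tuples, best match first.

    `memory` maps source strings that were already translated to their translations.
    The result is only a list of suggestions for a human to accept, edit or reject.
    """
    candidates = difflib.get_close_matches(new_string, list(memory), n=n, cutoff=cutoff)
    suggestions = []
    for source in candidates:
        # Same argument order get_close_matches uses internally, so scores agree.
        similarity = difflib.SequenceMatcher(None, source, new_string).ratio()
        suggestions.append((source, memory[source], round(similarity, 2)))
    return suggestions


if __name__ == "__main__":
    memory = {"Save file": "Guardar archivo", "Open file": "Abrir archivo", "Quit": "Salir"}
    for source, translation, score in suggest_translations("Save files", memory):
        print(f"{score:.2f}  {source!r} -> {translation!r}")
```

A real system would also record which suggestions reviewers accepted, so the memory improves as the project goes on.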

  • This doesn't handle audio, and it doesn't even seem to be up, but it seems kind of like what you want:
    http://blogoscoped.com/archive/2008-08-04-n48.html [blogoscoped.com]

  • A Helpful Comment (Score:3, Informative)

    by WyerByter ( 727074 ) on Tuesday April 28, 2009 @12:45PM (#27747233) Journal
    The people over at BOINC [berkeley.edu] have software called Bossa [berkeley.edu] for distributed thinking projects (crowdsourcing). I am not sure of the current status of the project, but I have heard of at least one group that is trying to implement it.
  • So... you need software to manage work being done by a large number of people?
    Any bug tracker will do the job (Bugzilla, tracker, etc.).

    Create a bunch of bugs for the things you need done and assign them to people. People can discuss them, upload solutions and discuss those solutions, upload patches for issues, post new bugs for newly required translations, etc.

    No need to create new software for something this simple and generic.

    - Jesse McNelis
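
The bug-tracker suggestion above boils down to splitting the work into one ticket per item and target language. Here is a minimal Python sketch of that breakdown, assuming volunteers sign up per language. The `Task` record and `create_tasks` function are hypothetical stand-ins for whatever ticket fields Bugzilla or another tracker actually provides; filing the tickets in a real tracker is left out.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Optional


@dataclass
class Task:
    """One hypothetical tracker ticket: one item (lecture, chapter) in one target language."""
    item: str
    language: str
    assignee: Optional[str]  # None means "unassigned, up for grabs"
    status: str = "NEW"
    comments: list = field(default_factory=list)  # discussion and links to uploaded work


def create_tasks(items, volunteers_by_language):
    """Create one task per (item, language) pair.

    Within each language, volunteers are assigned round-robin so the work is
    spread evenly; languages with no volunteers yet get unassigned tasks.
    """
    tasks = []
    for language, volunteers in volunteers_by_language.items():
        assignees = cycle(volunteers) if volunteers else None
        for item in items:
            assignee = next(assignees) if assignees is not None else None
            tasks.append(Task(item, language, assignee))
    return tasks


if __name__ == "__main__":
    for task in create_tasks(["transcribe lecture1.mp3", "translate chapter1.txt"],
                             {"es": ["Ana", "Luis"], "pl": ["Piotr"]}):
        print(task.language, task.item, task.assignee, task.status)
```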

  • Consider using http://99translations.com/. They have a good interface, several OSS efforts use them for internationalization, and I'm pretty sure they have a "free" option. YMMV.

    • Re: (Score:1, Funny)

      by Anonymous Coward
      The welcome email from 99translations:

      Welcome to 99translation.com!

      Our tools deliver the best expirience to software companies to find right translatior for their software and the integrate their work in their release cycle flawlessly.

      If you have any questions feel free to ask them via e-mail to support@99translations.com or post to our forum at http://99translations.com/forums/4 [99translations.com].

      Our integration tools can be achieved at http://99translations.com/tools [99translations.com].

      Feel free to post your translation requests to t

  • I'm a freelance translator and I'd like to warn you about the most serious pitfall of crowdsourcing: quality. I've seen the Facebook translation into my language (Polish) and it's terrible. There are other projects done this way, and most of them are of extremely poor quality.
    The problem is that if you want quality content, you need professionals to do the job. They don't necessarily have to be paid professionals (translators); maybe just people from your field who wish to contribute for one reason or another.

    • Re: (Score:1, Troll)

      by gdek ( 202709 )

      Spoken like a professional translator.

      • Spoken like a professional translator.

        A subject poorly taught is poorly learned. You need clarity. You need consistently. A sense of style, a touch of humor.

        • You need clarity. You need consistently.

          And, it seems, you need to know the difference between a noun and an adverb.

      • Spoken like a professional translator.

        Indeed. I hope you don't mean that in a pejorative sense? When TFQ is asking about translation, it's perfectly appropriate for professionals in the field to chime in with their insights and expertise.

        There was an article recently in the Japan Times about a project at the University of Tokyo to build a very similar system, though it is apparently just for texts being translated into Japanese. For the curious: http://search.japantimes.co.jp/cgi-bin/ek20090422a1.html [japantimes.co.jp].

    • by pjt33 ( 739471 )

      Facebook has a voting system for quality control, so it clearly demonstrates that even that isn't sufficient to get a halfway decent translation. I use Facebook in Spanish and the translations are often appalling.

      • I think this is a consequence of self-selection for the translation job. The ones who are really fluent in English may feel less of a need for a Spanish translation.

        For what it's worth, the Norwegian Facebook translation has become quite good with time.

        • by pjt33 ( 739471 )

          I wasn't very clear in GPP. The main problem isn't so much the sense of the translation as the quality of the Spanish. The spelling is terrible and the verb tenses appear to be random.
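
One way to combine the voting described in this thread with the concerns about quality is to accept a voted translation only when it has both enough votes and a clear positive margin, and to hand everything else to a fluent reviewer. Below is a minimal Python sketch; the `Candidate` type, `pick_translation` function and both thresholds are illustrative assumptions, and this is not a description of Facebook's actual system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """One proposed translation of a string, with community votes."""
    text: str
    up: int = 0
    down: int = 0

    @property
    def score(self) -> int:
        return self.up - self.down


def pick_translation(candidates: list[Candidate],
                     min_votes: int = 5, min_score: int = 3) -> Optional[str]:
    """Return the accepted translation text, or None if a reviewer should decide.

    The best-scoring candidate is accepted only if it has enough total votes
    and a clear positive margin; otherwise the string goes to review.
    """
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.score)
    if best.up + best.down >= min_votes and best.score >= min_score:
        return best.text
    return None
```

Votes alone would not catch the spelling and verb-tense problems pjt33 describes, which is why the fallback is a human reviewer rather than simply the most popular candidate.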

  • "I have a lot of free educational content in the form of audio lectures and text"

    Are you by any chance an Amway salesman trying to get attention?
  • by dwheeler ( 321049 ) on Tuesday April 28, 2009 @01:49PM (#27748041) Homepage Journal
    It handles texts, not audio, but Open Source Mission [opensourcemission.com]'s Gospel Translations [gospeltranslations.org] might be a useful model. They work with publishers/rights-holders (if any) to get the right to post works, then coordinate translations to a huge variety of languages. Once a translation is done, they post/host it for free. The translations are developed using a Wiki. Their focus is on Christian works, but I think the approach would work for any literature you want widely distributed in a variety of languages.
    • The translations are developed using a Wiki. Their focus is on Christian works, but I think the approach would work for any literature you want widely distributed in a variety of languages.

      Such as lolcat-speek? From the LOLCat Bible, Genesis 1:

      At start, no has lyte. An Ceiling Cat sayz, i can haz lite?

  • LIBRIVOX.ORG (Score:1, Informative)

    by Anonymous Coward

    You may want to contact the folks at Librivox.org. They're currently making audiobooks of the Project Gutenberg content, and they have a system in place for handling the audio files and quality control, so it sounds very much like what you're looking to do. Perhaps they'd either let you use them to host projects or at least give you pointers on how their software and processes work, so that you could create something similar without completely reinventing the wheel.

  • One option worth looking at is Transifex [transifex.org]. It's being very actively worked on, with the release of the next version [transifex.org] imminent. Also, a hosted version [transifex.net] is planned, so you eventually won't even need to maintain a server to run it on. It works with the big five FOSS VCSes, and the new version will be able to crack open tarballs as well. The Fedora Project has been using it for a couple of years now, with great success [lwn.net].
  • www.icanlocalize.com has some interesting offerings in the way of workflow & price, especially if you are already using a CMS like Drupal 6.

    If you use Drupal for example, you can set it up so as soon as you 'publish' (or at least advance the content in the workflow-process) it is made available for professional translators to begin working on.

    The pro translators have their own web-enabled interface and toolset (I think) which is similar to www.trados.com in translation memory function.

    If you are a non-p

  • See also http://www.meedan.net/ [meedan.net]

    Also, Google has a translation widget that might be a reasonable stop-gap measure.

    http://translate.google.com/translate_tools?hl=en [google.com]

  • For speech-to-text, an obvious place to start is the long-established ViaVoice [ibm.com] engine, if you can figure out how to buy it; the page for that info is empty or broken.
  • You're probably looking for something like Pootle [sourceforge.net]. It is used by a number of projects doing localisation, including Creative Commons [creativecommons.org], OpenOffice.org [sunvirtuallab.com] and others [sourceforge.net].

    It allows online translation and management of translation projects. It translates Gettext PO (for software localisation) and XLIFF (XML Localisation Interchange File Format) files. By using standard localisation formats, it makes it easy to manage both online and offline translations. The Translate Toolkit [sourceforge.net] can be used to convert various formats [sourceforge.net] into P
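
Tools like Pootle report per-language progress by counting how many messages in a PO file have a usable translation. Below is a simplified Python sketch of that count, assuming entries are separated by blank lines as gettext tools normally write them. It skips the header entry (empty `msgid`), treats `#, fuzzy` entries and empty `msgstr` values as untranslated, ignores `msgctxt`, and counts plural-form entries as untranslated because it does not parse `msgstr[n]`. Python's standard `gettext` module only reads compiled `.mo` catalogs, which is why this sketch parses the text format directly; for real projects the Translate Toolkit is the better choice.

```python
def _unquote(s: str) -> str:
    """Strip the surrounding double quotes from a PO string literal (escapes are left as-is)."""
    s = s.strip()
    if len(s) >= 2 and s[0] == s[-1] == '"':
        return s[1:-1]
    return s


def _new_entry() -> dict:
    return {"msgid": None, "msgstr": None, "fuzzy": False}


def parse_po(po_text: str) -> list[dict]:
    """Parse simple PO entries into dicts with 'msgid', 'msgstr' and 'fuzzy' keys."""
    entries, entry, field = [], _new_entry(), None
    for raw in po_text.splitlines():
        line = raw.strip()
        if not line:  # a blank line ends the current entry
            if entry["msgid"] is not None:
                entries.append(entry)
            entry, field = _new_entry(), None
        elif line.startswith("#,"):  # flags line, e.g. "#, fuzzy, c-format"
            entry["fuzzy"] = entry["fuzzy"] or "fuzzy" in line
        elif line.startswith("#"):
            continue  # translator, extracted and reference comments; obsolete entries
        elif line.startswith("msgid "):
            field = "msgid"
            entry["msgid"] = _unquote(line[len("msgid "):])
        elif line.startswith("msgstr "):
            field = "msgstr"
            entry["msgstr"] = _unquote(line[len("msgstr "):])
        elif line.startswith('"') and field is not None:
            entry[field] += _unquote(line)  # continuation of a multi-line string
        else:
            field = None  # msgctxt, msgid_plural, msgstr[n]: not handled by this sketch
    if entry["msgid"] is not None:
        entries.append(entry)
    return entries


def po_progress(po_text: str) -> tuple[int, int]:
    """Return (translated, total) message counts, excluding the header entry."""
    messages = [e for e in parse_po(po_text) if e["msgid"]]
    translated = sum(1 for e in messages if e["msgstr"] and not e["fuzzy"])
    return translated, len(messages)
```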
