Open Standard For Recording Compressed Voice?

john napiorkowski asks: "I do a lot of voice recording and have been using RealAudio, for which 'free beer' tools are distributed. However, I am greatly concerned by this reliance on such a proprietary tool; is there an open standard, free software replacement available? I have tried MP3, but it doesn't sound remotely as good as RealAudio for voice recording at very high compression levels. Fifteen minutes of voice compressed with RealAudio is under a meg and sounds almost exactly like the original, while MP3 sounds very poor when squeezed down to that size."

  • by Anonymous Coward
    it was expressly designed for just this task, in fact; some kind geeks have been extending the format so it's seekable under WinAmp and XMMS. Its primary use these days is lossless audio compression, but it _does_ support a large variety of lossy modes, including some expressly designed for speech.

    Go here [etree.org] for source/rpms/debs...
  • Yeah. For one thing, nearly all of the information in speech fits within a 4 kHz bandwidth, while music needs about 22 kHz. In addition, there are many differences between the nature of voice signals and music signals.

    If you're ONLY dealing with speech, it's much easier to get a good compression ratio than if you have to deal with music, and I'm not just talking about the bandwidth differences.

    I wouldn't bother with Vorbis - it was designed for music, so it won't work for voice signals as well as codecs designed for voice.

    I would look at codecs like the aforementioned GSM or G.whatever (G.711 is one speech codec; I can't remember the others). I'd go to http://www.openh323.org/ for some more information on speech codecs, among other things.

    Note that G.whatever (and, I think, GSM too) are at least somewhat encumbered by patents, but the licensing terms are relatively friendly from what I gather. And they are most definitely standardized. (The only other speech codec in wide use that I can think of off the top of my head is Qualcomm's PureVoice codec, used quite heavily in CDMA cell phones.)
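To make the G.711 mention above concrete, here is a minimal sketch of mu-law companding, the idea behind the North American flavor of G.711. It uses the continuous mu-law formula; the real standard uses a piecewise-linear approximation of it, so this is an illustration and not a bit-exact G.711 codec. The function names are made up for this example.

```python
import math

MU = 255                      # mu-law parameter
SAMPLE_RATE_HZ = 8000         # a 4 kHz speech band needs 8 kHz sampling (Nyquist)
BITS_PER_SAMPLE = 8
G711_BITRATE = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # 64000 bit/s

def mulaw_compress(x: float) -> float:
    """Map a sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y: float) -> float:
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode_sample(x: float) -> int:
    """Compand, then quantize uniformly to a signed 8-bit code (-127..127)."""
    return round(mulaw_compress(x) * 127)

def decode_sample(code: int) -> float:
    """Undo the quantization scaling, then expand."""
    return mulaw_expand(code / 127)

def roundtrip(x: float) -> float:
    return decode_sample(encode_sample(x))
```

The point of the logarithmic curve is that quiet samples keep far more precision than they would with plain 8-bit linear quantization, which suits speech well.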
  • Unlike MP3 or RA, GSM was designed primarily for compressing speech (afaik it's optimized for a German-speaking male voice, ymmv). I record radio programs via cron job on a daily basis, and at a sample frequency of 11025 Hz (8 MB/hour), as opposed to the standard 8 kHz, there is practically no audible difference between the (mono) FM broadcast and the GSM replay. Moreover, the compression doesn't use too much CPU, so even some old 486 should do it in realtime.
    If you're looking for a lightweight command-line tool for GSM compression, check out the GSM Tools [tuwien.ac.at] from my homepage (thx to Jutta Degener and Carsten Bormann for their GSM library).
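The "8 MB/hour" figure above can be sanity-checked with a little arithmetic. This sketch assumes the frame layout of the GSM 06.10 full-rate library (160 samples in, 33 bytes out per frame); the function name is made up for this example.

```python
GSM_FRAME_SAMPLES = 160   # input samples per GSM 06.10 full-rate frame
GSM_FRAME_BYTES = 33      # encoded bytes per frame in the library's format

def gsm_bytes_per_hour(sample_rate_hz: float) -> float:
    """Encoded size of one hour of mono audio fed to the codec at this rate."""
    frames_per_second = sample_rate_hz / GSM_FRAME_SAMPLES
    return frames_per_second * GSM_FRAME_BYTES * 3600

standard = gsm_bytes_per_hour(8000)     # about 5.9 million bytes per hour
oversampled = gsm_bytes_per_hour(11025) # about 8.2 million bytes per hour
```

Feeding the codec 11025 Hz audio simply produces more frames per second, so the file grows in proportion to the sample rate.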
  • If you want something that is easy to implement, try a continuously variable slope delta modulation (CVSDM) encoder/decoder. You can get communications quality voice at 32 kilobit/second. Not as good as the more sophisticated systems used in PCS and secure telephones, but very easy to implement and it doesn't need a fast CPU. It is used on the Space Shuttle's air-to-ground communication links.
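To show how little code delta modulation needs, here is a rough sketch of a CVSD-style one-bit encoder and decoder. All constants are illustrative guesses, and the multiplicative step adaptation is a simplification (real CVSD designs usually adapt the step with a syllabic filter), so treat this as a toy and not as any standardized implementation.

```python
import math
from dataclasses import dataclass

MIN_STEP = 0.005      # smallest step size (illustrative)
MAX_STEP = 0.1        # largest step size (illustrative)
STEP_GROWTH = 1.2     # applied after a run of identical bits (slope overload)
STEP_DECAY = 0.95     # applied otherwise
LEAK = 0.999          # leaky integrator, lets bit errors die out
RUN_LENGTH = 3        # how many equal bits count as a run
RUN_MASK = (1 << RUN_LENGTH) - 1

@dataclass
class CvsdState:
    estimate: float = 0.0
    step: float = MIN_STEP
    history: int = 0b010          # start from a non-run bit pattern

    def update(self, bit: int) -> float:
        """Adapt the step from recent bits, then move the estimate one step."""
        self.history = ((self.history << 1) | bit) & RUN_MASK
        if self.history in (0, RUN_MASK):
            self.step = min(self.step * STEP_GROWTH, MAX_STEP)
        else:
            self.step = max(self.step * STEP_DECAY, MIN_STEP)
        delta = self.step if bit else -self.step
        self.estimate = max(-1.0, min(1.0, self.estimate * LEAK + delta))
        return self.estimate

def cvsd_encode(samples):
    """One output bit per input sample: 1 means 'signal is above my estimate'."""
    state = CvsdState()
    bits = []
    for x in samples:
        bit = 1 if x > state.estimate else 0
        bits.append(bit)
        state.update(bit)
    return bits

def cvsd_decode(bits):
    """Replay the same state updates as the encoder to rebuild the waveform."""
    state = CvsdState()
    return [state.update(bit) for bit in bits]
```

Because the output is one bit per sample, sampling at 32 kHz gives the 32 kilobit/second stream mentioned above. Encoder and decoder share the same update rule, which is why no side information has to be sent.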
  • by The Iconoclast ( 24795 ) on Tuesday October 03, 2000 @11:09AM (#736603)
    Check out LAME (www.sulaco.org/lame). It does MP3 encoding, but has special options for encoding voice (bandpass filters, single channel, low bitrate, etc.). It is under the GPL and is finally patent free! It also compiles under (nearly) every computer system known to man.
  • You might want to check out Ogg Vorbis [vorbis.com]

    I'd definitely recommend this. It can be played on the open source player Freeamp [freeamp.org], which runs on Solaris, Linux, BSD, and Windows.


    Steve
    ---
  • Fifteen minutes of voice compressed with realaudio is under a meg and sounds almost exactly like the original, while MP3 sounds very poor to get the file that size.

    If you want poor quality, you could use a phoneme-based compression and compress about 15 hours in about a meg, assuming 2-byte phonemes, 10 phonemes per second. That would not be "voice compression" but "speech compression", though.
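Both numbers in this comment are easy to check. The sketch below assumes "a meg" means 2**20 bytes; the function names are made up for this example.

```python
MEGABYTE = 1024 * 1024  # assumption: "a meg" = 2**20 bytes

def average_bitrate_bps(size_bytes: float, duration_s: float) -> float:
    """Average bit rate implied by a file size and a duration."""
    return size_bytes * 8 / duration_s

def hours_per_megabyte(bytes_per_unit: float, units_per_second: float) -> float:
    """How many hours fit in one megabyte at a fixed symbol rate."""
    seconds = MEGABYTE / (bytes_per_unit * units_per_second)
    return seconds / 3600

# Fifteen minutes of voice in one megabyte: roughly 9.3 kbit/s.
realaudio_bps = average_bitrate_bps(MEGABYTE, 15 * 60)

# 2-byte phonemes at 10 per second: 20 bytes/s, roughly 14.6 hours per megabyte.
phoneme_hours = hours_per_megabyte(2, 10)
```

So the RealAudio files in the question run at under 10 kbit/s, which is speech-codec territory, and the "about 15 hours" estimate for phoneme coding holds up.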
  • I would look first at GSM coding. I've heard some samples, and for voice it does quite a good job. It's not so great for non-vocal recordings. Depending on what you want, it might or might not be suitable for your application. Source code is available from various places on the net. Try "GSM Source" or "GSM CODEC source" searches on Google or your favorite search engine.

    While I was out looking for a GSM source, I came across this page [tml.hut.fi], which has a table of some of the different options, better than I could have put it. They also have sound bites in each format; however, they are in the compressed format, so you'll need a decoder for each format to listen.

    You may want to check out Ogg Vorbis [vorbis.com], which is an alternative patent-free, open source audio compression format. I haven't heard any low bitrate samples and the implementation is rather new, so I really can't vouch for this.

  • If you want to experiment with the GSM codec, try the "Windows Sound Recorder" that is included as a standard accessory program in Windows 95/98/NT distributions. By default it saves WAV files, but it can be persuaded to use a different encoder. GSM 6.10 is one of the encoders that is supported. I have a 6.5 minute GSM recording that is only 650 kbytes in size.

    The size/quality trade-off for MP3s varies widely with different encoders. I use a CD-ripper program called "CD Copy". By default this ripper uses the "Blade" encoder, which is free but sounds pretty bad unless the bit rate is as high as 128 kbps. If you plug in a different encoder, such as the "Lame" encoder, you can get much higher quality sound at lower bit rates. After I got the LAME_ENC.DLL and plugged it into CD Copy, I started encoding music at 64 kbps!! And it still sounds fairly good (subjective, sure). If you would like to give this encoder a try, and you need help setting up CD Copy for this type of WAV-to-MP3 conversion, drop me a line.
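The sizes quoted in this comment are consistent with simple constant-bit-rate arithmetic. The sketch below assumes that Sound Recorder's GSM 6.10 mode records at 8 kHz and uses the WAV packing of 65 bytes per 320 samples; both are assumptions about the setup, and the function name is made up for this example.

```python
WAV_GSM_BLOCK_BYTES = 65      # assumed WAV GSM 6.10 block size
WAV_GSM_BLOCK_SAMPLES = 320   # assumed samples per block
SAMPLE_RATE_HZ = 8000         # assumed recording rate

def cbr_size_bytes(bitrate_bps: float, duration_s: float) -> float:
    """File size of a constant-bit-rate stream, ignoring headers."""
    return bitrate_bps / 8 * duration_s

# WAV GSM 6.10 at 8 kHz: 25 blocks/s * 65 bytes = 1625 bytes/s = 13000 bit/s.
wav_gsm_bps = SAMPLE_RATE_HZ / WAV_GSM_BLOCK_SAMPLES * WAV_GSM_BLOCK_BYTES * 8

gsm_recording = cbr_size_bytes(wav_gsm_bps, 6.5 * 60)  # 633,750 bytes
mp3_64_per_min = cbr_size_bytes(64_000, 60)            # 480,000 bytes
mp3_128_per_min = cbr_size_bytes(128_000, 60)          # 960,000 bytes
```

Under those assumptions a 6.5 minute GSM recording comes to about 634 kbytes, close to the 650 kbytes reported, and dropping MP3 from 128 to 64 kbps halves the size per minute.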

  • CELP certainly compresses voice nicely, but it requires a lot of processing power. There is A(lgebraic)CELP, which was developed by professors and research folks at the University of Sherbrooke, where I study. From what I understood in a speech from one guy working on that, it's like CELP on steroids, and it requires much less power. So today when people talk about CELP, they mean ACELP. That's what cell phones use. It's not open, dude, cuz a lot of work has been put into that. It can't be free. It has been exclusively licensed to Siprolab, which lets them work exclusively on tech stuff. The university sure gets some royalties off that work. Like the professor said, they just can't give away all that effort; it took them years to get where they are now. So you must not be surprised if these high-tech compression algorithms are not public... Otherwise, how would they make money from them?
  • It's not just about bit-rate though is it?

    Doesn't speech compression only include sounds that the average voice makes (and not, for example, the wave amplitude of an electric guitar played by those Metallica wankers - hurrah!), thus having a restricted range of sound with better compression?

    Or something.

    Anyone?

  • The United States government has created several freely available standards for voice compression. I would look into CELP (Code-Excited Linear Prediction). It is a variation of the LPC coding system that has the same bit rates, but sounds MUCH better. The algorithm was invented by the NSA and given to the public. I believe this is the algorithm used in the STU-III (never actually seen one, but someone told this to me on sci.crypt). There are two standard bit rates, 9600 bits/sec and 2400 bits/sec. I understand that 9600 sounds good, but 2400 is rather robotic sounding.

    I would also get yourself to a good university library and find articles and books on voice coding. About a year ago I saw some articles on very low bit rate coding for voice. Again, it was government/military sponsored work. I remember reading about an experimental coding system that operated in the 40 bit/sec range (yes, I find that hard to believe too). It should not be hard to find some good material.

    There is a small book consisting of nothing but reprints of some of the most important papers in the field of Speech Analysis/Synthesis/Coding. Unfortunately I don't have the title, but it does have a copy of "Speech Analysis and Synthesis by Linear Prediction of the Speech Wave" by B. S. Atal and Suzanne L. Hanauer (Journal of the Acoustical Society of America, Volume 50, Number 2), which describes the LPC process.
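For anyone curious what the LPC analysis step looks like in code, here is a small sketch using one common textbook approach: the autocorrelation method solved with the Levinson-Durbin recursion. This is not necessarily the exact procedure in the paper cited above, and the function names are made up for this example.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n + k], for k = 0..max_lag."""
    return [
        sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        for lag in range(max_lag + 1)
    ]

def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (coeffs, error) where coeffs[k-1] is a_k in the predictor
    x_hat[n] = a_1*x[n-1] + ... + a_p*x[n-p], and error is the remaining
    prediction error energy.
    """
    a = [0.0] * (order + 1)   # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:          # silent or degenerate input: stop early
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err         # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

# A decaying exponential is perfectly predicted from its previous sample,
# so a second-order analysis should find a_1 close to 0.9 and a_2 close to 0.
signal = [0.9 ** n for n in range(200)]
coeffs, residual_energy = levinson_durbin(autocorrelation(signal, 2), 2)
```

An LPC-based coder sends a handful of such coefficients per frame, plus a description of the excitation, instead of the waveform itself, which is where the very low bit rates come from.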
