A Conversation for Ogg Vorbis - Better than Mp3

Writing Workshop: A2386262 - Ogg Vorbis

Post 1

MedO

Entry: Ogg Vorbis - A2386262
Author: MedO - U222587

Any comments, additions and changes are, of course, welcome. Please take a look.


A2386262 - Ogg Vorbis

Post 2

Ferrettbadger. The Renegade Master

smiley - cool


A2386262 - Ogg Vorbis

Post 3

Dr. Memory

How timely - I was going to suggest an article on audio coding.

'codec' = Encoder / Decoder

You should add a link to the edited guide entry on MPEG 3, which, by the way, could use some work.

I've been delving into the Vorbis spec, but haven't figured it all out yet. Here are some things which may be useful.

MP3 (MPEG-1 Layer III) is a refinement of MPEG audio Layers 1 and 2 (I and II); MPEG-2, incidentally, is widely used for digital video distribution. In all MPEG layers the audio is broken down into 32 frequency sub-bands, and each sub-band sample is encoded as a binary number, using a different number of bits based on the human ear's ability to discern noise within that frequency band. Layers I and II differ in that Layer II quantizes (binary codes) three frames of samples at a time, as opposed to a single frame of 12 sub-band samples for each of the 32 bands, and uses a 'denser' coding methodology.

Layer III further divides each of the 32 sub-bands into additional frequency components using the "modified discrete cosine transform" (MDCT), a form of Fourier (frequency) transform. Layer III also applies nonuniform quantizing, which is a form of gain compression, to the MDCT output, and uses Huffman coding (a lossless binary compression method) to further compress the sub-band data.
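To illustrate just the nonuniform quantizing idea (this is a toy Python sketch of my own, not the actual MP3 quantizer, which also involves scalefactors and rate-control loops; the exponent and step size here are only illustrative), amplitudes are compressed with a power law before uniform rounding, so larger values get coarser steps where the ear tolerates more noise:

  import numpy as np

  def power_law_quantize(samples, step, exponent=0.75):
      # Compress amplitudes before uniform rounding: a toy version of
      # non-uniform quantization (gain compression).
      compressed = np.sign(samples) * np.abs(samples) ** exponent
      return np.round(compressed / step).astype(int)

  def power_law_dequantize(codes, step, exponent=0.75):
      # Undo the rounding grid, then expand back to the original amplitude scale.
      expanded = codes * step
      return np.sign(expanded) * np.abs(expanded) ** (1.0 / exponent)

  values = np.array([0.01, 0.1, 0.5, 1.0])
  codes = power_law_quantize(values, step=0.05)
  print(codes)                                    # small integers, ready for lossless coding
  print(power_law_dequantize(codes, step=0.05))   # note the coarser error on larger values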

Layer III also required the introduction of a bit reservoir to accommodate the variable length of the compressed data that results from the more sophisticated quantization approach used.

Vorbis dispenses with the 32 sub-bands used in MPEG and goes right to the MDCT as the method for spectral transformation. A codebook is used to encode the transformed frame data, and this process also uses Huffman compression.
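In case a concrete example of the codebook idea helps, here is a minimal Huffman codebook builder in Python, just to show the principle of giving frequent values shorter bit strings. In a real Vorbis stream the codebooks are chosen by the encoder and carried in the setup header, not rebuilt like this at decode time:

  import heapq
  from collections import Counter

  def huffman_codebook(symbols):
      # Build a prefix-free codebook: repeatedly merge the two least frequent
      # subtrees, prefixing '0' to one side and '1' to the other.
      heap = [[count, i, {sym: ""}] for i, (sym, count) in enumerate(Counter(symbols).items())]
      heapq.heapify(heap)
      tiebreak = len(heap)
      while len(heap) > 1:
          lo = heapq.heappop(heap)
          hi = heapq.heappop(heap)
          merged = {s: "0" + code for s, code in lo[2].items()}
          merged.update({s: "1" + code for s, code in hi[2].items()})
          heapq.heappush(heap, [lo[0] + hi[0], tiebreak, merged])
          tiebreak += 1
      return heap[0][2]

  # Frequent symbols end up with the shortest codes.
  print(huffman_codebook([0, 0, 0, 0, 1, 1, 2, 3]))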

One of the big differences, and apparently one of the big shortcomings, of Vorbis is that critical data used in decoding the audio is contained in a large setup header of several kilobytes. This makes it difficult to 'tune in' to a stream of Vorbis frames and successfully decode them. A second issue that developers have come across is that the inherently variable number of bits per frame requires more buffer memory (on top of the memory needed for the initialization data in the setup header), which makes it more expensive to build a chip that implements the decoder in hardware. The Vorbis spec also doesn't restrict Vorbis encoders from making files which are beyond the ability of a particular hardware-based decoder to decode. These limitations are, however, becoming less of a technical issue, and adoption is now limited mainly by market forces.


A2386262 - Ogg Vorbis

Post 4

Dr. Memory

forgot a couple things -

The Wikipedia entry on Vorbis is also a good reference -

http://en.wikipedia.org/wiki/Vorbis

The MPEG specs are publicly available as ISO standards. However, the developers of MP3 (MPEG-1 Layer III) hold patents on the algorithms used, and these patents are being enforced. Vorbis uses well-known technology, but this is no guarantee that it doesn't infringe on someone's patent somewhere in the world. This is something that the patent courts will have to resolve.

You might also want to mention MPEG-4, which has introduced a new audio codec (a variable bit rate coder - surprise!) with smaller file sizes than MP3.

Overall, a good entry.


A2386262 - Ogg Vorbis

Post 5

Dr. Memory

still more -

An article in 'ISO Bulletin', May 2002, points out that MPEG-2, although a published ISO standard, involves about 100 patents. The article states that "it is a fact that it is virtually impossible today to develop an audio or video coding standard with reasonable performance that does not infringe on one, or more likely several, patents". For MPEG-4 this situation has been resolved by the formation of a patent pool: the patent holders agree to pool their licensing interests with a third party, which enforces the patents and collects royalties.

For AAC, the audio codec in MPEG-4, this is done through a company called Via Licensing, and here is their license fee chart:

http://www.vialicensing.com/products/mpeg4aac/license.terms.html


A2386262 - Ogg Vorbis

Post 6

MedO

Well, maybe all this information about the MPEG audio formats should go into an entry of its own (which would then just be referenced from this one), because strictly speaking it doesn't have much to do with Vorbis.
I will include the rest of the information soon, thank you. smiley - cheerup

MedO


A2386262 - Ogg Vorbis

Post 7

MedO

I'm sorry I haven't updated it yet, but I have very little time at the moment, since I have to finish some important things for school. Just so you know I haven't gone Elvis, even though there still won't be an update for a bit more than a week smiley - erm. I hope I can find the time then.

MedO.


A2386262 - Ogg Vorbis

Post 8

MedO

I finally found the time for this update... just read it and tell me how crappy you think it is. smiley - winkeye I apologise for the long delay.

MedO


A2386262 - Ogg Vorbis

Post 9

Dr. Memory

I think it's coming along nicely. I also have been busy with real life.

For the 'for real nerds' part, I have a suggested rewrite:

'The best way to understand the operation of the Vorbis codec is to examine the decoding process. The decoder starts by reading initialization values from the start of the Vorbis bitstream. This initialization data also contains Huffman codebooks (see A533170) used for compression of the frequency-domain data. The initialization values are followed by blocks, or frames, of coded audio data. This data comprises a "floor" spectrum, which contains the overall base spectral characteristic of the block of data, and "residue" terms, which represent the fine detail of the audio spectrum. These are multiplied together to produce the composite spectrum for that block. The spectrum is then translated into time samples using the inverse MDCT, or Modified Discrete Cosine Transform. Symmetry in this transform allows for straightforward merging of this data with decoded samples from the previous frame, and these merged samples form the uncompressed output of the codec.'
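If it helps the 'for real nerds' section, here is a very schematic Python sketch of that decode path. The function names and data are made up for illustration; a real decoder reads the floors, residues and window shapes from the bitstream and codebooks, and handles two block sizes:

  import numpy as np

  def imdct(spectrum):
      # Inverse MDCT: N frequency coefficients -> 2N overlapping time samples.
      n = len(spectrum)
      t, k = np.arange(2 * n), np.arange(n)
      basis = np.cos(np.pi / n * (t[:, None] + 0.5 + n / 2) * (k[None, :] + 0.5))
      return (2.0 / n) * (basis @ spectrum)

  def decode_block(floor_curve, residue, window, previous_tail):
      # Coarse spectral envelope times fine detail gives the block's spectrum.
      spectrum = floor_curve * residue
      time_block = imdct(spectrum) * window
      half = len(time_block) // 2
      # Overlap-add: the first half merges with the tail of the previous block,
      # the second half is kept back to merge with the next block.
      return previous_tail + time_block[:half], time_block[half:]

  # Toy usage with made-up data for a block of 8 spectral coefficients.
  pcm, tail = decode_block(np.ones(8), np.random.randn(8), np.hanning(16), np.zeros(8))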

In hindsight, the patent problem is a much broader issue for open source software than just this one audio codec. A look at the list of patents referenced in the Adobe Acrobat 6 splash screen shows how bad the situation has become, at least in the U.S.


A2386262 - Ogg Vorbis

Post 10

MedO

Do you know why the Vorbis blocks may overlap in the time domain? Is it something like a smooth transition between blocks?

MedO


A2386262 - Ogg Vorbis

Post 11

Dr. Memory

I think this is in reference to the decoder. The overlapping when going from the frequency domain back to the time domain is needed because, to accurately represent a waveform of fixed time duration, we would need an infinitely long frequency (spectral) representation.

Suppose we have a single 'spike' (a single tone) in our frequency data. When we translate that back into the time domain, we get an infinitely long sine wave that extends both before and after the sample time of our original data, which is clearly wrong. However, if we start out with a 'chopped' or "windowed" sample of a sine wave, that is, one that does not go on forever but starts at the start of the sampling window and ends at the end, then when we turn that into the frequency domain we get a bunch of extra, higher frequencies. This effect was described in the 19th century and is related to the Gibbs phenomenon. These higher frequencies represent information about the start-up and shut-off of the signal at the beginning and end of the sampling period.

Conversely, if we take this infinitely wide spectral data and truncate it to a fixed number of frequencies, and then translate back into the time domain, we get extra, erroneous signal amplitude before the start and after the end of the original sampling time. So I think this effect is what you are referring to.
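Here is a quick numpy illustration of the time-domain side of this (the sample rate and tone are made-up numbers, just to show the effect): chopping a sine wave off abruptly at the edges of a sampling window smears its energy across many frequencies, while a tapered window smears it much less.

  import numpy as np

  fs = 8000                                  # assumed sample rate in Hz
  t = np.arange(1024) / fs
  tone = np.sin(2 * np.pi * 440.3 * t)       # a tone that doesn't line up with a frequency bin

  rect = np.abs(np.fft.rfft(tone))                          # abrupt 'chop' at the window edges
  hann = np.abs(np.fft.rfft(tone * np.hanning(len(tone))))  # tapered window

  # Ratio of total spectral magnitude to the peak: a rough measure of how much
  # energy has leaked away from the tone's own frequency.
  print(rect.sum() / rect.max(), hann.sum() / hann.max())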

My recollection is that, with the 'modified discrete cosine transform' (MDCT), when the frequency data is returned to time-domain data by the inverse MDCT, adding the overlapping time output from one block to the next cancels the errors (the time-domain aliasing) which would otherwise arise from the inverse transformation of the frequency data itself. It's sort of like the original time-to-frequency conversion in the coder generates errors in each block's frequency data, and those errors get eliminated in the decoder's frequency-to-time conversion because the blocks of data are all in order and overlap one another.
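Here is a little numpy experiment (my own toy code, using the textbook MDCT formulas and a sine window, not anything taken from a real codec) showing that the per-block errors really do cancel when the overlapping blocks are added back together:

  import numpy as np

  def mdct(x):
      # Forward MDCT: 2N time samples -> N frequency coefficients.
      n = len(x) // 2
      t, k = np.arange(2 * n), np.arange(n)
      return np.cos(np.pi / n * (t[None, :] + 0.5 + n / 2) * (k[:, None] + 0.5)) @ x

  def imdct(spectrum):
      # Inverse MDCT: N coefficients -> 2N time samples (with time-domain aliasing).
      n = len(spectrum)
      t, k = np.arange(2 * n), np.arange(n)
      return (2.0 / n) * (np.cos(np.pi / n * (t[:, None] + 0.5 + n / 2) * (k[None, :] + 0.5)) @ spectrum)

  N = 128
  window = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))  # satisfies the overlap condition
  signal = np.random.randn(4 * N)

  # Transform 50%-overlapping windowed blocks, inverse transform, window again, and add.
  rebuilt = np.zeros_like(signal)
  for start in range(0, len(signal) - 2 * N + 1, N):
      block = signal[start:start + 2 * N] * window
      rebuilt[start:start + 2 * N] += imdct(mdct(block)) * window

  # Away from the first and last half-block, the aliasing from each block is
  # cancelled by its neighbours: the difference printed below is essentially zero.
  print(np.max(np.abs(rebuilt[N:-N] - signal[N:-N])))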

This is why the MDCT is used in almost all sophisticated audio compression schemes. In image and video processing this ordering doesn't exist, so the two-dimensional Discrete Cosine Transform is used instead.


A2386262 - Ogg Vorbis

Post 12

Geggs

A game that my sister's partner has recently written makes use of Ogg Vorbis, so I'm quite interested in this. And this is quite a good entry.

Not that I wish to jump the gun at all, but I think this entry could do with a spin in Peer Review. I've certainly seen worse in PR.


Geggs


A2386262 - Ogg Vorbis

Post 13

MedO

First, sorry that you had to wait so long for my answer. I have been in Croatia for the past two weeks and didn't have the opportunity to connect to the net there.
Well, thank you for your comment; I think I will give it a try. The article will be improved a bit before submitting; expect the updated version this week.

My best wishes for your marriage (or whatever you say in England; things like that are always difficult for me, being German).

