Converting WAV to MP3 and back

This article gives a brief introduction to the Windows Audio Compression Manager API.

This is a copy of an article I wrote for the Delphi Developer Newsletter.

The components I have written for this article are part of an open-source project and are available at my homepage.

Audio Compression Manager
Many years ago, before I really knew what the internet was, I heard a rumour of an "Internet Telephone". This piece of software could allegedly transmit speech between two people on the internet in real time, allowing worldwide telephone calls for the price of a local call. With Lyn's brother living in the U.S.A. and us living in the U.K., phone calls were few and far between, and always far too expensive. So, you can imagine how exciting this rumoured technology was to us. So exciting in fact, that we got ourselves connected.

To cut a long story short, we tried this telephone software and it was awful ! So bad in fact that we stuck with our normal telephone and ridiculous phone charges.

GOOD NEWS ! That was a long time ago, and since then things have come a long way (and phone calls are also much cheaper). With the ever increasing popularity of the internet, media has become higher quality yet smaller in size.

There are now numerous streaming audio formats around, and even streaming video, and all this has been made accessible to people on very low bandwidths. That's not all. Not only have these formats (QuickTime, RealAudio, and even MP3) become more popular, they have also become more accessible to the developer.

Codecs
Some of these compression routines have been made accessible through the introduction of "codecs". A number of codecs are installed as standard with Windows (thanks Microsoft !), including:

GSM - I believe this format is used by some mobile phone networks

DSP TrueSpeech - I have heard a demonstration of this 1 bit audio format, very clear !

Fraunhofer IIS MP3 - This is most certainly my favourite of them all, it allows you to make your own MP3s

PCM - The standard used by Windows, most codecs can convert to/from PCM

Note :
A full list is available in the Multimedia section of your Control Panel. Double-click "Multimedia", click the "Devices" tab, and then expand "Audio Compression".

So, what is the point of a codec ? Well, a codec is a little bit like an ActiveX component. ActiveX components allow developers to implement functionality within their applications without having to write all of the code involved (e.g. embedding a Word document). Codecs do the same sort of thing but concentrate on converting media formats into other media formats. For example, if you wanted to write an application which took audio data from an audio CD and then converted it into an MP3, the only work you would need to do yourself would be

  1. Extract the audio data from the track
  2. Write a valid MP3 header to your hard disk
  3. Instruct the relevant codec to encode the audio data as MP3

ACM and API
Firstly, I will mention that ACM stands for "Audio Compression Manager". This is the library written by Microsoft as part of Windows which acts as the programmer's interface to the codecs.

The ACM really belongs in MMSystem.pas (which handles Windows multimedia) but, for some reason, it has been omitted. The first task therefore is to find a copy of MSACM.pas, which is a Delphi translation of the ACM header file. The one I found most useful was the conversion by Francois Piette, which was posted on Project Jedi (www.Delphi-Jedi.org).

ACM requires the developer to undertake the following steps in order to convert media between formats

  1. You must decide on your input and output formats. These are based on the TWaveFormatEx record but, be warned, this record structure is not actually large enough to store the information required by most codecs.

    For this reason I used my own TACMWaveFormat record, which is no more than a TWaveFormatEx record with a few extra bytes tagged on at the end. You really have no way (that I know of) of finding out what these extra bytes mean, or how they should be set. My solution to this problem was to use acmFormatChoose to allow the developer to choose the formats at design time, and then have these values streamed within the IDE as a custom property (more on this later).
  2. You must then open an ACM stream. This is done by calling acmStreamOpen, passing the input and output formats so that the ACM is aware of what is required of it. At this point the ACM will either return a valid handle to a stream, or will return an error code such as ACMERR_NotPossible to indicate that the requested conversion cannot be performed (more on this later).
  3. The next step is to determine the size of the output buffer. You call acmStreamSize with the number of bytes you intend to supply each time, and it returns the required size of the output buffer (this will nearly always be overestimated to ensure that you supply a large enough buffer).
  4. The final preparation step is to prepare a header. The header tells the ACM the address and size of our "source" buffer and of our "destination" buffer (the ACM does not allocate this memory for us, we need to allocate it ourselves). Once those fields are filled in, we call acmStreamPrepareHeader, passing the stream handle we received from acmStreamOpen.

At this point, all of our preparation work is done. All we need to do now is actually request that our data is converted, which is a very simple step: we call acmStreamConvert. This routine requires us to supply the stream handle (so that it knows which formats we are working with) and our prepared header (so that it knows the size and location of the source and destination buffers). It reports the actual number of bytes written to the destination buffer by setting cbDstLengthUsed within the header. Your ACM session is now ready for another chunk of data !

Once you have finished with your ACM session it is time to free the resources we have used. This is the simplest part of all. The header is released using acmStreamUnprepareHeader, and the stream is released using acmStreamClose.
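The steps above can be sketched roughly as follows. This is a minimal illustration rather than the code from the components: it declares the msacm32.dll imports by hand instead of using MSACM.pas (whose exact parameter styles may differ), it assumes a 32-bit target, and the names ConvertChunk and TStreamHeader are made up for the example.

```pascal
unit ACMConvertSketch;

interface

uses
  Windows, MMSystem;

// Hypothetical helper: converts one buffer and returns the number of bytes
// written to Dst. The caller releases Dst with FreeMem if it is not nil.
function ConvertChunk(SrcFmt, DstFmt: PWaveFormatEx;
  Src: Pointer; SrcLen: DWORD; var Dst: Pointer): DWORD;

implementation

type
  // Hand-written 32-bit translation of ACMSTREAMHEADER (an assumption of
  // this sketch; MSACM.pas provides its own declaration).
  TStreamHeader = packed record
    cbStruct, fdwStatus, dwUser: DWORD;
    pbSrc: PByte;
    cbSrcLength, cbSrcLengthUsed, dwSrcUser: DWORD;
    pbDst: PByte;
    cbDstLength, cbDstLengthUsed, dwDstUser: DWORD;
    dwReservedDriver: array[0..9] of DWORD;
  end;

const
  ACMLib = 'msacm32.dll';
  ACM_STREAMSIZEF_SOURCE = 0; // "cbInput is a source size"

function acmStreamOpen(var Stream: THandle; Driver: THandle;
  pwfxSrc, pwfxDst: PWaveFormatEx; pwfltr: Pointer;
  dwCallback, dwInstance, fdwOpen: DWORD): MMRESULT; stdcall; external ACMLib;
function acmStreamSize(Stream: THandle; cbInput: DWORD;
  var cbOutput: DWORD; fdwSize: DWORD): MMRESULT; stdcall; external ACMLib;
function acmStreamPrepareHeader(Stream: THandle; var Hdr: TStreamHeader;
  fdwPrepare: DWORD): MMRESULT; stdcall; external ACMLib;
function acmStreamConvert(Stream: THandle; var Hdr: TStreamHeader;
  fdwConvert: DWORD): MMRESULT; stdcall; external ACMLib;
function acmStreamUnprepareHeader(Stream: THandle; var Hdr: TStreamHeader;
  fdwUnprepare: DWORD): MMRESULT; stdcall; external ACMLib;
function acmStreamClose(Stream: THandle;
  fdwClose: DWORD): MMRESULT; stdcall; external ACMLib;

function ConvertChunk(SrcFmt, DstFmt: PWaveFormatEx;
  Src: Pointer; SrcLen: DWORD; var Dst: Pointer): DWORD;
var
  Stream: THandle;
  Hdr: TStreamHeader;
  DstLen: DWORD;
begin
  Result := 0;
  Dst := nil;
  // Step 2: open the stream; a non-zero result means the ACM refused
  if acmStreamOpen(Stream, 0, SrcFmt, DstFmt, nil, 0, 0, 0) <> MMSYSERR_NOERROR then
    Exit;
  try
    // Step 3: ask how large the destination buffer must be
    if acmStreamSize(Stream, SrcLen, DstLen, ACM_STREAMSIZEF_SOURCE) <> MMSYSERR_NOERROR then
      Exit;
    GetMem(Dst, DstLen);
    // Step 4: describe both buffers in the header, then prepare it
    FillChar(Hdr, SizeOf(Hdr), 0);
    Hdr.cbStruct := SizeOf(Hdr);
    Hdr.pbSrc := Src;
    Hdr.cbSrcLength := SrcLen;
    Hdr.pbDst := Dst;
    Hdr.cbDstLength := DstLen;
    if acmStreamPrepareHeader(Stream, Hdr, 0) <> MMSYSERR_NOERROR then
      Exit;
    try
      // The conversion itself; cbDstLengthUsed reports the bytes produced
      if acmStreamConvert(Stream, Hdr, 0) = MMSYSERR_NOERROR then
        Result := Hdr.cbDstLengthUsed;
    finally
      acmStreamUnprepareHeader(Stream, Hdr, 0);
    end;
  finally
    acmStreamClose(Stream, 0);
  end;
end;

end.
```

Opening and closing the stream for every chunk keeps the sketch short. For continuous audio you would more likely keep the stream and the prepared header alive and call acmStreamConvert repeatedly, as the text describes.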

Choosing a format
Before undertaking any of the above steps, we must have our input and output formats ready. As I said earlier, these formats are based on TWaveFormatEx (declared in MMSystem.pas), which is just a record specifying the sample rate, bits per sample, number of channels and so on. Unless you only intend to convert between different PCM formats (which is really quite unlikely), this format is just not going to be sufficient. The following format was used instead throughout the component code attached.

TACMWaveFormat = packed record
  case integer of
    0 : (Format : TWaveFormatEx);
    1 : (RawData : Array[0..128] of byte);
  end;

The idea here is that we can still access the TWaveFormatEX data by referring to the Format part of the record, yet RawData supplies us with enough room for any additional data required for any of the individual codecs.
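As a rough illustration of the Format half of the record, here is how a plain PCM format might be filled in. It assumes the record layout shown above and the classic Windows and MMSystem units; MakePCMFormat is a hypothetical helper, not part of the components.

```pascal
program PCMFormatSketch;
{$APPTYPE CONSOLE}

uses
  Windows, MMSystem;

type
  // Same layout as the record shown in the article
  TACMWaveFormat = packed record
    case Integer of
      0: (Format: TWaveFormatEx);
      1: (RawData: array[0..128] of Byte);
  end;

// Hypothetical helper: describes uncompressed PCM audio
procedure MakePCMFormat(var Fmt: TACMWaveFormat; SamplesPerSec: DWORD;
  BitsPerSample, Channels: Word);
begin
  FillChar(Fmt, SizeOf(Fmt), 0);
  with Fmt.Format do
  begin
    wFormatTag := WAVE_FORMAT_PCM;
    nChannels := Channels;
    nSamplesPerSec := SamplesPerSec;
    wBitsPerSample := BitsPerSample;
    // One block is one sample for every channel
    nBlockAlign := (Channels * BitsPerSample) div 8;
    nAvgBytesPerSec := SamplesPerSec * nBlockAlign;
    cbSize := 0; // PCM carries no extra codec-specific bytes
  end;
end;

var
  Fmt: TACMWaveFormat;
begin
  MakePCMFormat(Fmt, 8000, 16, 1);
  // 8000 samples/s * 2 bytes per block = 16000 bytes per second
  Writeln('Bytes per second: ', Fmt.Format.nAvgBytesPerSec);
end.
```

For PCM everything fits in Format and cbSize is zero. For other codecs, cbSize says how many of the trailing RawData bytes are in use, and those are the bytes the dialog fills in for you.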

Although we do not know either the size of this additional data, or what it represents, it is still a simple task to acquire it. This is done using acmFormatChoose.

acmFormatChoose requires only one parameter, of type TACMFormatChooseA. This parameter is a simple structure holding the following (relevant) information.

pwfx
A pointer to a TWaveFormatEx structure to receive the result (we actually pass a TACMWaveFormat)

cbwfx
The size in bytes of the buffer which is to receive the result.

cbStruct
The size of this structure

Note:
Another item worth mentioning is fdwStyle, which holds flags specifying additional options for the format selection dialog box. Of particular interest is the (very long) flag ACMFORMATCHOOSE_STYLEF_INITTOWFXSTRUCT, which informs the dialog box that the buffer pointed to by pwfx already contains a valid format, which should be displayed as the default when the dialog appears.
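A minimal sketch of calling the dialog with the fields described above might look like this. To stay self-contained it declares the ANSI structure and the msacm32.dll import by hand for a 32-bit target, rather than relying on the declarations in MSACM.pas; ChooseFormat and TFormatChooseA are names invented for the example.

```pascal
unit ACMChooseSketch;

interface

uses
  Windows, MMSystem;

type
  TACMWaveFormat = packed record
    case Integer of
      0: (Format: TWaveFormatEx);
      1: (RawData: array[0..128] of Byte);
  end;

// Hypothetical helper: shows the format dialog and returns True if the user
// confirmed a choice. With UseFmtAsDefault, Fmt must already hold a valid format.
function ChooseFormat(Owner: HWND; var Fmt: TACMWaveFormat;
  UseFmtAsDefault: Boolean): Boolean;

implementation

type
  // Hand-written 32-bit translation of ACMFORMATCHOOSEA (an assumption of
  // this sketch; MSACM.pas declares its own TACMFormatChooseA).
  TFormatChooseA = packed record
    cbStruct: DWORD;
    fdwStyle: DWORD;
    hwndOwner: HWND;
    pwfx: PWaveFormatEx;
    cbwfx: DWORD;
    pszTitle: PAnsiChar;
    szFormatTag: array[0..47] of AnsiChar;
    szFormat: array[0..127] of AnsiChar;
    pszName: PAnsiChar;
    cchName: DWORD;
    fdwEnum: DWORD;
    pwfxEnum: PWaveFormatEx;
    hInstance: THandle;
    pszTemplateName: PAnsiChar;
    lCustData: LPARAM;
    pfnHook: Pointer;
  end;

const
  ACMFORMATCHOOSE_STYLEF_INITTOWFXSTRUCT = $00000040;

function acmFormatChooseA(var Choose: TFormatChooseA): MMRESULT; stdcall;
  external 'msacm32.dll';

function ChooseFormat(Owner: HWND; var Fmt: TACMWaveFormat;
  UseFmtAsDefault: Boolean): Boolean;
var
  Choose: TFormatChooseA;
begin
  FillChar(Choose, SizeOf(Choose), 0);
  Choose.cbStruct := SizeOf(Choose);  // size of this structure
  Choose.hwndOwner := Owner;
  Choose.pwfx := @Fmt.Format;         // buffer that receives the result
  Choose.cbwfx := SizeOf(Fmt);        // whole buffer, including the extra bytes
  if UseFmtAsDefault then
    Choose.fdwStyle := ACMFORMATCHOOSE_STYLEF_INITTOWFXSTRUCT;
  // Any non-zero result (including the user cancelling) counts as "no choice"
  Result := acmFormatChooseA(Choose) = MMSYSERR_NOERROR;
end;

end.
```

Passing SizeOf(Fmt) rather than SizeOf(TWaveFormatEx) in cbwfx is the point of the larger record: it gives the dialog room to write the codec-specific bytes after the standard fields.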

A bit of Chinese whisper
Earlier I mentioned the ACMERR_NotPossible error. This may occur for a number of reasons.

The first two to note (simply because they are the easiest to explain) are

  1. The codec you specified on your machine may not be available on a client machine
  2. Although you can convert from a specified audio format, you cannot convert to it. This is the case with the Fraunhofer IIS MP3. While Windows 9x and Windows NT users can create MP3 files to their heart's content, for some reason the ability was removed in Windows 2000 (Yes, thanks again Microsoft). Although Win2K users may listen to the results, they are not allowed to generate them unless they pay someone first !

The final one is a little more complicated, and warrants the phrase "Chinese Whisper".

Not all ACM formats are interchangeable, for example (I am just making these up, so if they actually work don't write saying that I was wrong) you may not be able to convert

GSM 8BIT MONO > MP3 8BIT MONO

You need to find a "middle man". This is quite often a PCM format, as most (if not all) codecs were designed to convert PCM into a more suitable format.

The above example would therefore be achieved like so.

GSM 8BIT MONO > PCM 8BIT MONO > MP3 8BIT MONO

Converting to "MP3 16BIT STEREO" would probably require yet another step (between the PCM and MP3, in order to convert 8 bit PCM to 16 bit PCM).
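One way to find out whether a middle man is needed is to ask the ACM before committing to a conversion. The sketch below is not taken from the components: it uses the ACM_STREAMOPENF_QUERY flag of acmStreamOpen and the acmFormatSuggest function, with hand-written 32-bit imports, and CanConvert and PlanConversion are hypothetical names.

```pascal
unit ACMPlanSketch;

interface

uses
  Windows, MMSystem;

// Hypothetical helpers for this example only
function CanConvert(Src, Dst: PWaveFormatEx): Boolean;
// Returns 0 = not possible, 1 = direct, 2 = via the PCM format in Middle
function PlanConversion(Src, Dst: PWaveFormatEx;
  var Middle: TWaveFormatEx): Integer;

implementation

const
  ACMLib = 'msacm32.dll';
  ACM_STREAMOPENF_QUERY = $00000001;        // only test, do not open
  ACM_FORMATSUGGESTF_WFORMATTAG = $00010000; // keep the requested format tag

// phas is declared as a plain pointer so that nil can be passed when querying
function acmStreamOpen(phas: Pointer; Driver: THandle;
  pwfxSrc, pwfxDst: PWaveFormatEx; pwfltr: Pointer;
  dwCallback, dwInstance, fdwOpen: DWORD): MMRESULT; stdcall; external ACMLib;
function acmFormatSuggest(Driver: THandle; pwfxSrc, pwfxDst: PWaveFormatEx;
  cbwfxDst, fdwSuggest: DWORD): MMRESULT; stdcall; external ACMLib;

function CanConvert(Src, Dst: PWaveFormatEx): Boolean;
begin
  // With the QUERY flag no stream is created, so nothing needs closing
  Result := acmStreamOpen(nil, 0, Src, Dst, nil, 0, 0,
    ACM_STREAMOPENF_QUERY) = MMSYSERR_NOERROR;
end;

function PlanConversion(Src, Dst: PWaveFormatEx;
  var Middle: TWaveFormatEx): Integer;
begin
  Result := 0;
  if CanConvert(Src, Dst) then
  begin
    Result := 1;
    Exit;
  end;
  // Ask the ACM for a PCM format that the source can be decoded to
  FillChar(Middle, SizeOf(Middle), 0);
  Middle.wFormatTag := WAVE_FORMAT_PCM;
  if (acmFormatSuggest(0, Src, @Middle, SizeOf(Middle),
        ACM_FORMATSUGGESTF_WFORMATTAG) = MMSYSERR_NOERROR)
    and CanConvert(@Middle, Dst) then
    Result := 2;
end;

end.
```

A result of 0 does not necessarily mean the job is impossible. As noted above, the suggested PCM format may itself need converting to another PCM format (a different bit depth, say) before the destination codec will accept it, and this sketch does not search for that extra step.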

I think you will now understand why this section is called "Chinese Whisper". (If anyone can tell me the meaning of that phrase I would appreciate it !)

The hidden talents of ACM
You may or may not yet be convinced that the ACM is a good thing. This seems like quite a lot of work just to convert one media format into another. Considering the alternative of writing your own audio format small enough for internet streaming, or writing your own MP3 compression routine, ACM is what we British call "a doddle".

Imagine a simple application which takes input from your microphone, compresses it to a suitable format for streaming over a very low bandwidth, and then transmits it to a destination PC over TCP/IP, while at the same time receiving compressed data, decompressing it, and playing it out of your speaker (a.k.a. a simple internet phone).

Did I say simple ? Well, actually, yes !

This sounds like a lot of work, and probably is (except with the components supplied it is actually very simple).

This is where the hidden talents of the ACM come into play. Quite a few of the ACM codecs are wave-mappable, which basically means that they may be treated as a standard WAVE device when playing / recording audio.

For example, it is quite easy to open an input for a GSM sound source. Once you receive a buffer of data from the wave input device it is already compressed and ready for transmitting. Likewise, as soon as data is received through your TCP/IP socket, it is possible to play this data directly through a wave-out device.

  1. Data in from MIC
  2. Send data through socket
  3. Data in from socket
  4. Send data to wave-out device

Standard PCM data would be far too large for real-time streaming over a modem, whereas GSM 6.10 can be transmitted at as little as 1.5k/second, and MP3 16BIT MONO can be streamed at a mere 2k/second.

Apart from Win2K not being able (or even allowed) to create MP3, there is something else worth mentioning about this format. Although it can quite easily be treated as a wave-out device, it did not seem to work as a wave-in device, which is why I found it necessary to convert the PCM data to MP3 manually (which turned out to make quite a nice demonstration project).
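Whether a given format can be mapped through the wave-in or wave-out device can be tested without opening anything, by passing WAVE_FORMAT_QUERY to waveInOpen or waveOutOpen. The sketch below does this for GSM 6.10 at 8 kHz mono. The format tag and record layout follow the Windows SDK header mmreg.h, and the Delphi names TGSM610WaveFormat and WAVE_FORMAT_GSM610 are declared locally because MMSystem may not provide them.

```pascal
program WaveMapQuerySketch;
{$APPTYPE CONSOLE}

uses
  Windows, MMSystem;

const
  WAVE_FORMAT_GSM610 = $0031; // format tag from mmreg.h

type
  // GSM610WAVEFORMAT: the standard record plus one extra Word (cbSize = 2)
  TGSM610WaveFormat = packed record
    wfx: TWaveFormatEx;
    wSamplesPerBlock: Word;
  end;

var
  Gsm: TGSM610WaveFormat;
begin
  FillChar(Gsm, SizeOf(Gsm), 0);
  Gsm.wfx.wFormatTag := WAVE_FORMAT_GSM610;
  Gsm.wfx.nChannels := 1;
  Gsm.wfx.nSamplesPerSec := 8000;
  Gsm.wfx.nBlockAlign := 65;       // one compressed block is 65 bytes
  Gsm.wSamplesPerBlock := 320;     // and holds 320 samples
  Gsm.wfx.nAvgBytesPerSec := 1625; // 8000 / 320 * 65
  Gsm.wfx.wBitsPerSample := 0;
  Gsm.wfx.cbSize := 2;

  // WAVE_FORMAT_QUERY only asks the question; no device handle is returned
  if waveInOpen(nil, WAVE_MAPPER, @Gsm.wfx, 0, 0,
       WAVE_FORMAT_QUERY) = MMSYSERR_NOERROR then
    Writeln('GSM 6.10 can be recorded through the wave mapper')
  else
    Writeln('GSM 6.10 cannot be recorded through the wave mapper');

  if waveOutOpen(nil, WAVE_MAPPER, @Gsm.wfx, 0, 0,
       WAVE_FORMAT_QUERY) = MMSYSERR_NOERROR then
    Writeln('GSM 6.10 can be played through the wave mapper')
  else
    Writeln('GSM 6.10 cannot be played through the wave mapper');
end.
```

The 1625 bytes per second worked out in the comments is the "1.5k/second" figure quoted above, compared with 16000 bytes per second for 16-bit mono PCM at the same sample rate.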

Components, demos and source code
Well, it is all fine and dandy talking about this stuff, but it isn't really much use without some hard evidence to back it up.

For this reason I have included three components, and two demonstrations (demos were compiled in Delphi 5). These components are available for download on my Delphi site.

Components

  • TACMConvertor : This really serves two purposes. Firstly, it converts data between two different media formats. Secondly, even if you do not intend to manually convert the raw data, this component comes in useful for specifying input/output formats of ACM streams. (The right-click component editor allows you to select the formats from the acmFormatChoose dialog at design time.)

  • TACMIn : This component is used for receiving data from your microphone. You can specify a standard PCM format, or you can specify any format capable of being mapped through the WaveIn device.
  • TACMOut : This component is used for replaying audio through your audio output. Again, you may select to output in PCM format, or any other format capable of being mapped through your WaveOut device. The NumBuffers property specifies how many buffers you want filled before you start to play. This is not much use when you want instant audio (internet telephones) but can come in useful when you want to do audio broadcasting over the internet, and want to buffer some extra audio just in case your connection speed fluctuates.

Demos

The first demo is really quite simple. The TACMConvertor is used only to specify the input and output formats. This demo opens an ACMIn and an ACMOut at the same time. Audio in is piped almost immediately back out, but with a slight delay, making you sound a little like Elvis Presley (although I am not an Elvis fan, "All Shook Up" was the first song that sprang to mind when I tested it).

The second demo is a little more complicated and comes in two parts.

The first part (Demo2.dpr) acts as a server. It has a server socket listening on port 6565 for new connections. At the same time it takes audio in from the MIC, converts it into MP3 16BIT 8kHz MONO (2k/second) and pipes it out to every connected client.

The second part (Demo2Client.dpr) acts as a client. The first edit box requires the IP address of the server, whereas the second (SpinEdit) input is the number of additional buffers that you require. Once you click connect (and the requested number of buffers has been filled) you will start to hear the audio from the server. MP3 16BIT 8kHz MONO is surprisingly good quality, and also surprisingly low bandwidth.

Well, that just about completes this article. I hope you have enjoyed reading about it much more than I enjoyed having to work it all out !

As of version 2.0, these components are commercial and available at http://www.droopyeyes.com

     
