Putting a voice to an embedded project

Microchip Technology Australia
Tuesday, 01 December, 2009

Adding voice to an embedded project can enhance the user experience of a product. Commands can be confirmed, statuses can be announced and temperatures can be read aloud.

However, adding voice has been perceived as a daunting task - difficult and expensive.

This article shows that an 8-bit microcontroller with a pulse-width modulation (PWM) peripheral offers a low-cost, easy way to add voice to an embedded project.

One method of encoding speech is called adaptive differential pulse code modulation (ADPCM), a technique to digitise analog signals.

ADPCM takes advantage of the high correlation between consecutive speech samples and encodes the difference between a predicted sample and the speech sample.

On playback, the decoder forms the same prediction and adds the decoded difference to it to reconstruct each sample. ADPCM thus provides efficient compression with good-quality speech playback.

There are various flavours of ADPCM algorithms. The Interactive Multimedia Association’s (IMA) algorithm reduces the mathematical complexity by simplifying many of the operations and using table lookups where appropriate - making it a good choice for 8-bit microcontrollers.
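As an illustration of how little arithmetic is involved, here is a minimal Python sketch of an IMA-style ADPCM encoder and decoder working on 4-bit codes. The step-size and index tables are the standard IMA ones; the function names, and the choice to start with a predictor of 0 and a step index of 0, are assumptions made for this example. On the microcontroller the same decode step would be written in C or assembly.

```python
# Standard IMA ADPCM step-size table (89 entries).
STEP_TABLE = [
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
    34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
    157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544,
    598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707,
    1878, 2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871,
    5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767,
]
# Step-index adjustment, looked up by the 4-bit code.
INDEX_TABLE = [-1, -1, -1, -1, 2, 4, 6, 8] * 2


def decode_nibble(code, predictor, index):
    """Decode one 4-bit code; return the new (predictor, index)."""
    step = STEP_TABLE[index]
    # Rebuild the difference with shifts and adds only (no multiply).
    diff = step >> 3
    if code & 4:
        diff += step
    if code & 2:
        diff += step >> 1
    if code & 1:
        diff += step >> 2
    # Bit 3 is the sign of the difference.
    predictor = predictor - diff if code & 8 else predictor + diff
    predictor = max(-32768, min(32767, predictor))
    index = max(0, min(88, index + INDEX_TABLE[code]))
    return predictor, index


def encode_sample(sample, predictor, index):
    """Encode one signed 16-bit sample; return (code, predictor, index)."""
    step = STEP_TABLE[index]
    diff = sample - predictor
    code = 0
    if diff < 0:
        code = 8
        diff = -diff
    if diff >= step:
        code |= 4
        diff -= step
    if diff >= step >> 1:
        code |= 2
        diff -= step >> 1
    if diff >= step >> 2:
        code |= 1
    # Track exactly what the decoder will reconstruct.
    predictor, index = decode_nibble(code, predictor, index)
    return code, predictor, index


def encode(samples):
    """Encode signed 16-bit samples into a list of 4-bit codes."""
    predictor, index, codes = 0, 0, []  # assumed starting state
    for sample in samples:
        code, predictor, index = encode_sample(sample, predictor, index)
        codes.append(code)
    return codes


def decode(codes):
    """Decode a list of 4-bit codes back into signed 16-bit samples."""
    predictor, index, samples = 0, 0, []  # must match the encoder's start
    for code in codes:
        predictor, index = decode_nibble(code, predictor, index)
        samples.append(predictor)
    return samples
```

The encoder runs the decoder's update after every sample so that both sides stay in step; this is what lets the decoder reproduce the encoder's predictions from the 4-bit codes alone.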

Since the interest here is in playback, the encoding is done by a PC program, leaving the decoding to the microcontroller.

To make playback interactive, the voice snippets are separated into individual, addressable files.

For example, to speak a numeric value for temperature, the numbers one to nine, 10 to 19, 20, 30, 40, 50, 60, 70, 80 and 90 are recorded in separate files.

So, when the temperature is 21° the voice will speak two files one after the other: 'twenty' followed by 'one'. A simple file system is used to store and retrieve the individual voice files.
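The selection of files for a given number can be sketched in a few lines of Python. The function name and the use of the number itself as the clip name are hypothetical; the range of 1 to 99 simply follows the recordings listed above.

```python
def clips_for_number(n):
    """Return the clip names to play, in order, for 1 <= n <= 99."""
    if not 1 <= n <= 99:
        raise ValueError("only 1 to 99 can be spoken with these clips")
    # 1 to 19 and the round tens each have a recording of their own.
    if n < 20 or n % 10 == 0:
        return [str(n)]
    # Everything else is a tens clip followed by a units clip.
    return [str(n - n % 10), str(n % 10)]


print(clips_for_number(21))  # ['20', '1']
print(clips_for_number(15))  # ['15']
```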

The amount of memory needed to store the voice files depends on the number of bits per sample, the sample rate and the total duration of speech stored.

For toll-quality sound, the number of bits is 16 at a rate of 8000 samples per second. (This equates to a 4000 Hz bandwidth.)

Thus, one second of voice occupies 16,000 bytes (8000 samples of two bytes each).

Once the voice file is encoded with the IMA ADPCM algorithm, each 16-bit sample becomes a 4-bit code, so the file shrinks to ¼ of its original size: about 4000 bytes per second of voice. Depending on the amount of voice needed for a project, the encoded files can be stored in the program memory of the microcontroller or in an external serial flash memory.

Therefore, a one megabit (128 KB) serial flash memory can hold about 32 seconds of voice.
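The arithmetic behind these figures is easy to check. The short Python sketch below uses only the numbers given above; the function names are made up for the example, and the space taken by the file system's own bookkeeping is ignored.

```python
SAMPLE_RATE_HZ = 8000
PCM_BITS = 16    # bits per sample before encoding
ADPCM_BITS = 4   # bits per sample after IMA ADPCM encoding


def bytes_per_second(bits_per_sample, sample_rate_hz=SAMPLE_RATE_HZ):
    """Storage needed for one second of voice."""
    return sample_rate_hz * bits_per_sample // 8


def seconds_of_voice(capacity_bytes, bits_per_sample=ADPCM_BITS):
    """How many seconds of voice fit in a memory of the given size."""
    return capacity_bytes / bytes_per_second(bits_per_sample)


FLASH_BYTES = 1024 * 1024 // 8  # one megabit = 131,072 bytes (128 KB)

print(bytes_per_second(PCM_BITS))     # 16000
print(bytes_per_second(ADPCM_BITS))   # 4000
print(seconds_of_voice(FLASH_BYTES))  # 32.768
```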

The flow diagram shown in Figure 1 summarises the steps taken. First, the voice is recorded on a PC as a WAV file. Second, using a sound editing program, the original voice file can be trimmed and re-sampled to 8000 Hz, then saved as an unsigned, 16-bit, ‘little endian’ mono file.

Third, encode the file using the IMA ADPCM algorithm and save as a binary file. Fourth, collect all the files together in a file system. Finally, store the files into the microcontroller or external memory.
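The article does not describe the layout of its file system, so the following Python sketch shows just one possible arrangement, purely as an assumption: a 16-bit file count, then a directory of 32-bit offset and length pairs, then the encoded files back to back. The names build_image and read_clip are hypothetical.

```python
import struct


def build_image(clips):
    """Pack a list of encoded clips (bytes objects) into one memory image."""
    header_size = 2 + 8 * len(clips)  # count, then one entry per clip
    directory = b""
    offset = header_size
    for data in clips:
        # Each directory entry: little-endian 32-bit offset and length.
        directory += struct.pack("<II", offset, len(data))
        offset += len(data)
    return struct.pack("<H", len(clips)) + directory + b"".join(clips)


def read_clip(image, number):
    """Fetch clip `number` (counting from 0) from an image."""
    (count,) = struct.unpack_from("<H", image, 0)
    if not 0 <= number < count:
        raise IndexError("no such clip")
    offset, length = struct.unpack_from("<II", image, 2 + 8 * number)
    return image[offset:offset + length]
```

With a directory like this, the microcontroller needs only the clip number to find where a recording starts in memory and how many bytes to decode.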

The hardware for this system is shown in Figure 2. The microcontroller locates the requested voice file in memory, decodes it and plays the decoded samples through the PWM module.

The output of the PWM module is low-pass filtered with a cutoff of 4000 Hz. The resulting analog signal can be amplified and played through a speaker.
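Before a decoded sample can be handed to the PWM module, it has to be converted from a signed 16-bit value to an unsigned duty-cycle value. The Python sketch below shows one way to do that conversion; the 8-bit default resolution and the function name are assumptions, since the resolution used in this design is not stated.

```python
def sample_to_duty(sample, pwm_bits=8):
    """Map a signed 16-bit sample to an unsigned PWM duty value."""
    if not -32768 <= sample <= 32767:
        raise ValueError("sample is not a signed 16-bit value")
    # Shift the range to 0..65535, then keep only the top pwm_bits bits.
    return (sample + 32768) >> (16 - pwm_bits)


print(sample_to_duty(-32768))  # 0   (duty fully off)
print(sample_to_duty(0))       # 128 (mid-scale, i.e. silence)
print(sample_to_duty(32767))   # 255 (maximum duty)
```

Silence therefore corresponds to a duty cycle of about 50 per cent, and the low-pass filter turns the varying duty cycle back into a smooth voltage.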

With a little effort in recording voices, encoding them in ADPCM format and storing them in memory, an embedded project can indeed have a natural voice. But it doesn’t stop there.

Since the files are merely recordings, chimes, tones and buzzing sounds can be introduced. The only limit is your imagination. Now, go ahead and enhance the user experience of your next project.

Written by Steven Bible, principal applications engineer, Microchip Technology