Introducing Speex

Speex is an open-source, patent-free voice codec developed for high-definition voice over IP (HD VoIP). It is based on the CELP (code-excited linear prediction) technique, delivers high-quality speech, and is robust to lost packets.

Speex is designed to compress voice at bit rates ranging from 2 to 44 kbps. The Speex Project aims to lower the barrier to entry for voice applications by providing a free alternative to expensive proprietary speech codecs. Much of Speex's strength comes from its flexibility: it is well suited to Internet applications and provides useful features that are not present in most other codecs. Speex is part of the GNU Project and is available under the Xiph.org variant of the BSD license. It can be stored in the Ogg container format or transmitted directly over UDP/RTP.

Packet loss concealment (PLC) is a technique for masking the effects of packet loss in VoIP communications. Because the voice signal is sent as packets over a VoIP network, the packets may travel different routes to reach their destination. At the receiver, a packet might arrive very late, arrive corrupted, or not arrive at all. A packet can be lost outright, for example, when a server whose buffer is full rejects it because it cannot accept any more data. In a VoIP connection, error-control techniques such as ARQ (automatic repeat request), which retransmit lost packets, are not feasible because a retransmitted packet would arrive too late to be played. The receiver must therefore be able to cope with packet loss on its own.
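To make the idea concrete, here is a minimal sketch of receiver-side concealment, assuming 20 ms narrowband frames of 16-bit PCM samples keyed by sequence number. It uses the simplest generic strategy, repeating the last good frame with a decaying gain. This is not Speex's own CELP-based concealment (in libspeex the decoder extrapolates a lost frame from its previous state), and the names `conceal_stream`, `FRAME_SAMPLES` and `DECAY` are hypothetical.

```python
from typing import Dict, List, Optional

FRAME_SAMPLES = 160  # 20 ms at 8 kHz (narrowband); an assumption for this sketch
DECAY = 0.5          # hypothetical gain applied per consecutive lost frame


def conceal_stream(received: Dict[int, List[int]], first_seq: int, last_seq: int) -> List[List[int]]:
    """Return one frame per sequence number, filling gaps by fading out the last good frame."""
    output: List[List[int]] = []
    last_good: Optional[List[int]] = None
    gain = 1.0
    for seq in range(first_seq, last_seq + 1):
        frame = received.get(seq)
        if frame is not None:
            # Packet arrived in time: play it and reset the concealment gain.
            output.append(frame)
            last_good = frame
            gain = 1.0
        elif last_good is None:
            # Nothing to extrapolate from yet: play silence.
            output.append([0] * FRAME_SAMPLES)
        else:
            # Packet lost or too late: repeat the previous frame, attenuated
            # further for each consecutive loss so a long gap fades to silence.
            gain *= DECAY
            output.append([int(s * gain) for s in last_good])
    return output
```

For example, if packet 2 of sequence 0 to 3 is missing, `conceal_stream({0: f0, 1: f1, 3: f3}, 0, 3)` plays `f1` at half amplitude in its place. Fading rather than repeating at full level avoids a buzzy, looping artifact when several packets in a row are lost.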

  1. Variable bitrate operation (VBR)
    VBR allows a codec to change its bit rate dynamically to adapt to the "difficulty" of the audio being encoded. In the case of Speex, sounds like vowels and high-energy transients require a higher bit rate to achieve good quality, while fricatives (e.g. s and f sounds) can be coded adequately with fewer bits. For this reason, VBR can achieve a lower bit rate for the same quality, or better quality for a given bit rate.

  2. Voice Activity Detection (VAD)
    When enabled, voice activity detection determines whether the audio being encoded is speech or silence/background noise. Speex detects non-speech periods and encodes them with just enough bits to reproduce the background noise. This is called "comfort noise generation" (CNG).

  3. Discontinuous Transmission (DTX)
    Discontinuous transmission is an addition to VAD/VBR operation that allows the encoder to stop transmitting completely when the background noise is stationary. In file-based operation, where the encoder cannot simply stop writing, 5 bits are used for each such frame (corresponding to 250 bit/s).

  4. Algorithmic delay
    Every codec introduces a delay in the transmission. For Speex, this delay is equal to the frame size, plus some amount of "look-ahead" required to process each frame. In narrowband operation (8 kHz), the delay is 30 ms, while for wideband (16 kHz), the delay is 34 ms. These values don't account for the CPU time it takes to encode or decode the frames.
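The VAD/DTX behaviour described in items 2 and 3 can be sketched as a per-frame decision. The sketch assumes 20 ms frames (50 frames per second) and uses a toy energy threshold as the voice activity detector; Speex's real VAD is more elaborate. A non-speech frame whose energy stays close to the last transmitted noise level is treated as stationary and suppressed by DTX at 5 bits, which is where the 250 bit/s figure comes from (5 bits × 50 frames/s). The thresholds, the `speech_bits`/`noise_bits` parameters and all function names are hypothetical.

```python
from typing import List, Optional, Sequence

FRAMES_PER_SECOND = 50   # Speex frames are 20 ms long
DTX_BITS_PER_FRAME = 5   # bits written for a suppressed frame in file-based operation


def frame_energy(frame: Sequence[int]) -> float:
    """Mean squared amplitude of one frame of PCM samples."""
    return sum(s * s for s in frame) / len(frame)


def classify_frames(frames: List[Sequence[int]], speech_threshold: float = 1e5,
                    stationarity_ratio: float = 2.0) -> List[str]:
    """Label each frame 'speech', 'noise' (comfort-noise update) or 'dtx'."""
    labels: List[str] = []
    last_noise_energy: Optional[float] = None
    for frame in frames:
        energy = frame_energy(frame)
        if energy >= speech_threshold:
            # Toy VAD: loud frames count as speech.
            labels.append("speech")
            last_noise_energy = None
        elif (last_noise_energy is not None
              and last_noise_energy / stationarity_ratio <= energy <= last_noise_energy * stationarity_ratio):
            # Background noise has barely changed: stop transmitting (DTX).
            labels.append("dtx")
        else:
            # New or changed background noise: send a comfort-noise update.
            labels.append("noise")
            last_noise_energy = energy
    return labels


def average_bitrate(labels: List[str], speech_bits: int, noise_bits: int) -> float:
    """Average bit rate in bit/s for a sequence of labelled 20 ms frames."""
    bits = {"speech": speech_bits, "noise": noise_bits, "dtx": DTX_BITS_PER_FRAME}
    total = sum(bits[label] for label in labels)
    return total * FRAMES_PER_SECOND / len(labels)
```

With `speech_bits=300` (15 kbit/s at 20 ms per frame), a stretch of suppressed frames averages only 250 bit/s, which is why VAD plus DTX can sharply reduce the bandwidth used during pauses in a conversation.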

Speex is optimized for high-quality speech at low bit rates: it supports multiple bit rates and three sampling rates, namely ultra-wideband (32 kHz), wideband (16 kHz) and narrowband (telephone quality, 8 kHz).
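The sampling rates and the delay figures in item 4 can be tied together with a little arithmetic. Assuming Speex's 20 ms frame, one frame holds 160, 320 or 640 samples at 8, 16 or 32 kHz. Under the "delay = frame size + look-ahead" model from item 4, subtracting the frame length from the quoted delays gives an implied look-ahead of 10 ms (narrowband) and 14 ms (wideband). These look-ahead values are derived from the 30 ms and 34 ms figures, not read from the Speex source, and the helper names are hypothetical.

```python
FRAME_MS = 20  # Speex frame duration

SAMPLE_RATES_HZ = {"narrowband": 8_000, "wideband": 16_000, "ultra-wideband": 32_000}
ALGORITHMIC_DELAY_MS = {"narrowband": 30, "wideband": 34}  # figures quoted in the text


def samples_per_frame(rate_hz: int, frame_ms: int = FRAME_MS) -> int:
    """Number of PCM samples in one frame at the given sampling rate."""
    return rate_hz * frame_ms // 1000


def lookahead_ms(mode: str) -> int:
    """Look-ahead implied by 'delay = frame size + look-ahead'."""
    return ALGORITHMIC_DELAY_MS[mode] - FRAME_MS


for mode, rate in SAMPLE_RATES_HZ.items():
    print(mode, samples_per_frame(rate), "samples per frame")
```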

Technology

  • Three sampling rates: narrowband (8 kHz), wideband (16 kHz), and ultra-wideband (32 kHz), all in the same bitstream
  • Intensity stereo encoding
  • Packet loss concealment
  • Variable bitrate operation (VBR)
  • Voice Activity Detection (VAD)
  • Discontinuous Transmission (DTX)
  • Fixed-point port
  • Acoustic echo canceller
  • Noise suppression
  • Encoder complexity can be varied dynamically by adjusting how the codebook search is performed
  • Perceptual enhancement by decoder which enhances sound quality subjectively
  • Algorithmic delay: 30 ms in narrowband mode, 34 ms in wideband mode
  • Quality: Speex encoding is usually controlled by a quality parameter ranging from 0 to 10
  • Filename extension: .spx
  • Internet media type: audio/speex, audio/ogg
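Intensity stereo, listed above, sends a single mono channel plus a small amount of side information describing how the energy is split between left and right. The sketch below shows the generic idea with one balance value per frame. It illustrates the technique only: Speex's actual stereo side information is quantized and differs in detail, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def encode_intensity(left: List[float], right: List[float]) -> Tuple[List[float], float]:
    """Downmix to mono and return the fraction of frame energy in the left channel."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    total = e_left + e_right
    balance = e_left / total if total > 0 else 0.5  # 0.5 means centred
    return mono, balance


def decode_intensity(mono: List[float], balance: float) -> Tuple[List[float], List[float]]:
    """Rebuild left/right by scaling the mono signal according to the energy balance.

    The left/right energy ratio is restored exactly; the overall level is exact
    only when both channels carry the same waveform, a known limitation of
    pure intensity coding.
    """
    gain_left = math.sqrt(2 * balance)
    gain_right = math.sqrt(2 * (1 - balance))
    return [m * gain_left for m in mono], [m * gain_right for m in mono]
```

The appeal of this design is cost: one parameter per frame is far cheaper than coding a second channel, at the price of losing phase and timing differences between the two channels.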

Features

  • Narrowband (8 kHz), wideband (16 kHz), and ultra-wideband (32 kHz) compression in the same bitstream
  • Free software/open-source, patent and royalty-free
  • MIPS/memory requirements for various platforms are available
  • PSQM/PSQM+ values under different network conditions are also available.
  • Wide range of bit rates (from 2 to 44 kbps)
  • Standard narrowband bit rates of 2.15, 3.95, 5.95, 8, 11, 15, 18.2 and 24.6 kbps
  • Dynamic bit-rate switching and Variable Bit-Rate (VBR)
  • Variable complexity
  • Ultra-wideband mode at 32 kHz (up to 48 kHz)
  • Code Excited Linear Prediction (CELP) based
  • Optimized for high performance on leading edge DSP architectures
  • Multichannel implementation
  • Multi-tasking environment compatible
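Because Speex codes speech in 20 ms frames, each of the standard bit rates listed above corresponds to a fixed number of bits per frame (bit rate × 0.02 s). The short calculation below makes the conversion explicit. It assumes constant-bitrate operation with one frame per packet and ignores RTP/UDP/IP header overhead; the helper name is hypothetical.

```python
from fractions import Fraction

FRAME_SECONDS = Fraction(20, 1000)  # 20 ms Speex frame

# Standard bit rates from the feature list, in bit/s.
STANDARD_RATES_BPS = [2150, 3950, 5950, 8000, 11000, 15000, 18200, 24600]


def bits_per_frame(rate_bps: int) -> int:
    """Payload bits produced for one 20 ms frame at a constant bit rate."""
    bits = rate_bps * FRAME_SECONDS
    if bits.denominator != 1:
        raise ValueError("rate does not divide evenly into 20 ms frames")
    return int(bits)


for rate in STANDARD_RATES_BPS:
    bits = bits_per_frame(rate)
    # A packet carries whole bytes, so the payload size is rounded up.
    print(f"{rate / 1000:5.2f} kbit/s -> {bits:3d} bits ({-(-bits // 8)} bytes) per frame")
```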

Application

Speex is used mainly in voice applications, especially VoIP.

Summary for Speex codec

  • Algorithm: lossy, speech
  • Sample rate: 8, 16, 32 (48) kHz
  • Bit rate: 2.15 to 24.6 kbit/s (NB); 4 to 44.2 kbit/s (WB)
  • Bits per sample: 16 bit
  • Latency: 30 ms (NB), 34 ms (WB)
  • CBR: yes
  • VBR: yes
  • Stereo: yes
  • Multichannel: yes
