Sunday, April 16, 2017

AMR-NB and AMR-WB codecs as described by the IETF

The Adaptive Multi-Rate speech codec

The AMR codec is a multi-mode codec with 8 narrowband speech modes with bit rates between 4.75 and 12.2 kbps. The sampling frequency is 8000 Hz and processing is done on 20 ms frames, i.e. 160 samples per frame. The AMR modes are closely related to each other and use the same coding framework.
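The frame arithmetic above can be checked with a short Python sketch. The full list of eight bit rates is taken from the AMR specification rather than from this post, which names only some of them, and the constant and function names are illustrative assumptions.

```python
# AMR narrowband framing arithmetic (illustrative sketch, names are assumptions).
AMR_SAMPLE_RATE_HZ = 8000
FRAME_MS = 20

# The eight AMR speech modes in bit/s (from the AMR specification).
AMR_MODES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]


def samples_per_frame(sample_rate_hz=AMR_SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Number of input speech samples covered by one frame."""
    return sample_rate_hz * frame_ms // 1000


def speech_bits_per_frame(mode_bps, frame_ms=FRAME_MS):
    """Number of encoded speech bits one frame produces at the given mode."""
    return mode_bps * frame_ms // 1000


if __name__ == "__main__":
    print("samples per frame:", samples_per_frame())
    for bps in AMR_MODES_BPS:
        print(f"{bps / 1000:5.2f} kbps -> {speech_bits_per_frame(bps)} bits per frame")
```

Integer arithmetic is used on purpose so that the per-frame bit counts come out exact.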

Three of the AMR modes have already been adopted as standards in their own right:

PDC-EFR : 6.7 kbps
IS-641 : 7.4 kbps
GSM-EFR : 12.2 kbps

The Adaptive Multi-Rate Wideband speech codec

The AMR-WB codec is a multi-mode speech codec with 9 wideband speech coding modes with bit rates between 6.6 and 23.85 kbps. The sampling frequency is 16000 Hz and processing is performed on 20 ms frames, i.e. 320 speech samples per frame. The AMR-WB modes are closely related to each other and employ the same coding framework.
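As a rough illustration of what these wideband figures mean per frame, the sketch below tabulates the speech bits each AMR-WB mode produces in 20 ms and how many whole octets those bits occupy. The nine bit rates come from the AMR-WB specification, not from this post, the octet count covers the speech bits only (no payload header), and all names are illustrative assumptions.

```python
# AMR-WB per-frame sizes (illustrative sketch, names are assumptions).
AMR_WB_SAMPLE_RATE_HZ = 16000
WB_FRAME_MS = 20

# The nine AMR-WB speech modes in bit/s (from the AMR-WB specification).
AMR_WB_MODES_BPS = [6600, 8850, 12650, 14250, 15850, 18250, 19850, 23050, 23850]


def wb_frame_summary(mode_bps):
    """Return samples, speech bits and padded octets for one AMR-WB frame."""
    bits = mode_bps * WB_FRAME_MS // 1000
    return {
        "samples": AMR_WB_SAMPLE_RATE_HZ * WB_FRAME_MS // 1000,
        "speech_bits": bits,
        # Round up to whole octets; headers are deliberately not included.
        "octets": (bits + 7) // 8,
    }


if __name__ == "__main__":
    for bps in AMR_WB_MODES_BPS:
        print(bps, wb_frame_summary(bps))
```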

Common Characteristics for AMR and AMR-WB

The multi-mode feature is used to preserve high speech quality under a wide range of transmission conditions. In mobile radio systems (e.g. GSM), mode adaptation allows the system to adapt the balance between speech coding and error protection to achieve the best possible speech quality under the prevailing transmission conditions. Mode adaptation can also be used to adapt to varying available transmission bandwidth. Every codec implementation MUST support all specified speech coding modes. The codecs can switch to any mode at any time, but some transport systems limit the number of supported modes and how often the mode can change. The mode information must therefore be transmitted together with the encoded speech bits to indicate the mode in use.

To realize rate adaptation, the decoder needs to signal to the encoder the mode it prefers to receive. It is RECOMMENDED that the encoder follow a received mode request, but if the encoder has a reason not to follow it, e.g. congestion control, it may use another mode. A codec mode request MUST NOT be sent in packets sent to a multicast group, and the encoder in the sender SHOULD ignore mode requests when sending to a multicast session but MAY use RTCP feedback information as a hint that a mode change is needed.
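The mode-request rules above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not code from the draft: modes are represented by their bit rate in bit/s, `allowed_modes` stands for whatever subset the transport permits, and `max_bps` is a hypothetical congestion-control limit.

```python
# Sketch of encoder-side mode selection (all names are hypothetical).

def choose_encoder_mode(allowed_modes, current_mode, requested_mode=None,
                        multicast=False, max_bps=None):
    """Pick the mode (bit/s) the encoder uses for the next frame."""
    if current_mode not in allowed_modes:
        raise ValueError("current_mode must be one of allowed_modes")

    target = current_mode
    # Follow the receiver's request (RECOMMENDED), except in a multicast
    # session, where requests SHOULD be ignored.
    if requested_mode is not None and not multicast and requested_mode in allowed_modes:
        target = requested_mode

    # The encoder may deviate from the request, e.g. for congestion control.
    if max_bps is not None and target > max_bps:
        within_limit = [m for m in allowed_modes if m <= max_bps]
        target = max(within_limit) if within_limit else min(allowed_modes)
    return target


def may_send_mode_request(multicast):
    """A mode request is not sent in packets addressed to a multicast group."""
    return not multicast


if __name__ == "__main__":
    allowed = [4750, 7400, 12200]
    print(choose_encoder_mode(allowed, 7400, requested_mode=12200))
    print(choose_encoder_mode(allowed, 7400, requested_mode=12200, multicast=True))
```

Falling back to the lowest allowed mode when nothing fits under the limit is a design choice of this sketch. A real sender could equally pause or use RTCP feedback to decide.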

Both codecs include voice activity detection (VAD) and generation of comfort noise (CN) parameters during silence periods. Hence, the codecs have the option to reduce the number of transmitted bits and packets during silence periods to a minimum. The operation of sending CN parameters at regular intervals during silence periods is usually called discontinuous transmission (DTX) or source controlled rate (SCR) operation. The frames containing CN parameters are called Silence Indicator (SID) frames.
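The idea behind DTX can be illustrated with a deliberately simplified sketch: given one VAD decision per 20 ms frame, decide whether to send a speech frame, a SID frame, or nothing. The `sid_interval` parameter and the action labels are assumptions made for illustration. The actual 3GPP DTX procedure is more involved (for example, it applies a hangover period before silence is declared) and is not reproduced here.

```python
# Simplified DTX scheduling sketch (not the 3GPP procedure; names are hypothetical).

def dtx_schedule(vad_flags, sid_interval=8):
    """Map per-frame VAD decisions to transmit actions.

    vad_flags: iterable of booleans, True when the frame contains speech.
    sid_interval: assumed spacing, in frames, between SID frames in silence.
    """
    actions = []
    silent_frames = 0  # consecutive silent frames seen so far
    for active in vad_flags:
        if active:
            silent_frames = 0
            actions.append("SPEECH")
        else:
            # Send CN parameters at the start of silence and then at
            # regular intervals; send nothing in between.
            if silent_frames % sid_interval == 0:
                actions.append("SID")
            else:
                actions.append("NO_DATA")
            silent_frames += 1
    return actions


if __name__ == "__main__":
    vad = [True, True] + [False] * 10 + [True]
    print(dtx_schedule(vad))
```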

references:
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt
