The Adaptive Multi-Rate speech codec
The AMR codec is a multi-mode codec with 8 narrowband speech modes with bit rates between 4.75 and 12.2 kbps. The sampling frequency is 8000 Hz and processing is done on 20 ms frames, i.e. 160 samples per frame. The AMR modes are closely related to each other and use the same coding framework.
Three of the AMR modes have already been adopted as standards in their own right:
PDC-EFR: 6.7 kbps
IS-641: 7.4 kbps
GSM-EFR: 12.2 kbps
The Adaptive Multi-Rate Wideband speech codec
The AMR-WB codec is a multi-mode speech codec with 9 wideband speech coding modes with bit rates between 6.6 and 23.85 kbps. The sampling frequency is 16000 Hz and processing is performed on 20 ms frames, i.e. 320 speech samples per frame. The AMR-WB modes are closely related to each other and employ the same coding framework.
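The frame sizes quoted for both codecs follow directly from the sampling frequency and the 20 ms frame length, and the same arithmetic gives the number of speech bits each mode produces per frame. A minimal sketch of that calculation is shown here. The individual mode bit rates are taken from the codec specifications rather than from the text above, and the function names are illustrative only.

```python
FRAME_MS = 20  # both AMR and AMR-WB process speech in 20 ms frames

# Mode bit rates in bit/s. These values come from the codec
# specifications; the text above only gives the lowest and highest rate.
AMR_MODES_BPS = [4750, 5150, 5900, 6700, 7400, 7950, 10200, 12200]
AMR_WB_MODES_BPS = [6600, 8850, 12650, 14250, 15850,
                    18250, 19850, 23050, 23850]


def samples_per_frame(sample_rate_hz):
    """Number of input samples in one 20 ms frame."""
    return sample_rate_hz * FRAME_MS // 1000


def bits_per_frame(bit_rate_bps):
    """Number of encoded speech bits in one 20 ms frame."""
    return bit_rate_bps * FRAME_MS // 1000


for name, rate_hz, modes in (("AMR", 8000, AMR_MODES_BPS),
                             ("AMR-WB", 16000, AMR_WB_MODES_BPS)):
    print(name, samples_per_frame(rate_hz), "samples per frame")
    for bps in modes:
        print("  %.2f kbps: %d bits per frame" % (bps / 1000, bits_per_frame(bps)))
```

For example, the 12.2 kbps AMR mode yields 244 speech bits per frame, before any payload header or padding that a transport format may add.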
Common Characteristics for AMR and AMR-WB
The multi-mode feature is used to preserve high speech quality under a wide range of transmission conditions. In mobile radio systems (e.g. GSM), mode adaptation allows the system to adapt the balance between speech coding and error protection to enable the best possible speech quality under the prevailing transmission conditions. Mode adaptation can also be used to adapt to varying available transmission bandwidth. Every codec implementation MUST support all specified speech coding modes. The codecs can handle mode switching to any mode at any time, but some transport systems limit the number of supported modes and how often the mode can change. The mode information must therefore be transmitted together with the encoded speech bits to indicate the mode in use. To realize rate adaptation, the decoder needs to signal to the encoder the mode it prefers to receive. It is RECOMMENDED that the encoder follow a received mode request, but if the encoder has a reason not to follow it, e.g. congestion control, it may use another mode. A codec mode request MUST NOT be sent for packets sent to a multicast group, and the encoder in the sender SHOULD ignore mode requests when sending to a multicast session but MAY use RTCP feedback information as a hint that a mode change is needed.
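The encoder-side rules in the paragraph above can be sketched as a small decision function. This is only an illustration under stated assumptions: modes are represented as integer indices ordered by bit rate, `supported` stands for whatever mode set the transport allows, and `max_allowed` is a hypothetical cap that a congestion controller might impose. None of these names come from the codec or payload specifications.

```python
def choose_mode(requested, current, supported, multicast=False, max_allowed=None):
    """Pick the mode index for the next frame (hypothetical helper).

    requested   -- mode index asked for by the receiver, or None
    current     -- mode index the encoder is using now
    supported   -- set of mode indices the session allows
    multicast   -- True when sending to a multicast session
    max_allowed -- optional cap from local congestion control
    """
    if multicast or requested is None:
        # Mode requests are ignored for multicast sessions.
        candidate = current
    elif requested in supported:
        # Following the request is the recommended behaviour.
        candidate = requested
    else:
        candidate = current

    # The encoder may deviate from the request, e.g. for congestion control.
    if max_allowed is not None and candidate > max_allowed:
        lower = [m for m in supported if m <= max_allowed]
        candidate = max(lower) if lower else min(supported)
    return candidate
```

With a session limited to modes `{0, 3, 7}`, a request for mode 7 is followed in the unicast case, ignored in the multicast case, and replaced by mode 3 if the assumed congestion cap is 4.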
Both codecs include voice activity detection (VAD) and generation of comfort noise (CN) parameters during silence periods. Hence, the codecs have the option to reduce the number of transmitted bits and packets during silence periods to a minimum. Sending CN parameters at regular intervals during silence periods is usually called discontinuous transmission (DTX) or source controlled rate (SCR) operation. The frames containing CN parameters are called Silence Indicator (SID) frames.
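The idea behind DTX can be illustrated with a simplified scheduler that turns per-frame VAD decisions into frame types. This is a sketch, not the procedure defined in the codec specifications, which is more involved. The `sid_interval` parameter and the frame type labels are assumptions made for the example.

```python
def dtx_schedule(vad_flags, sid_interval=8):
    """Map per-frame VAD decisions to 'SPEECH', 'SID' or 'NO_DATA'.

    vad_flags    -- iterable of booleans, True when speech is detected
    sid_interval -- assumed spacing, in frames, between SID frames
                    during a silence period (illustrative value)
    """
    schedule = []
    silent_run = 0  # frames elapsed in the current silence period
    for active in vad_flags:
        if active:
            silent_run = 0
            schedule.append("SPEECH")
        else:
            # Send CN parameters at the start of silence and then at
            # regular intervals; transmit nothing in between.
            if silent_run % sid_interval == 0:
                schedule.append("SID")
            else:
                schedule.append("NO_DATA")
            silent_run += 1
    return schedule
```

Calling `dtx_schedule([True, False, False, False, True], sid_interval=2)` gives one speech frame, then a SID frame, a frame with nothing to send, another SID frame, and a final speech frame.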
references:
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt