Friday, April 28, 2017

Testing Opencore decoder

Once compiled, the test executable amrnb-dec is in the test/ folder.
The program takes an AMR file as input and writes a WAV file as output.


./amrnb-dec outamr8khz.amr outamr8khz.wav

Looking at the code: after decoding each frame, the 16-bit samples need to be converted to
little-endian byte order for the WAV format.

/* Excerpt from test/amrnb-dec.c: amr, wav, in, n and the sizes[] table
   are set up earlier in the program. */
while (1) {
    uint8_t buffer[500], littleendian[320], *ptr;
    int size, i;
    int16_t outbuffer[160];

    /* Read the mode byte */
    n = fread(buffer, 1, 1, in);
    if (n <= 0)
        break;
    /* Find the packet size */
    size = sizes[(buffer[0] >> 3) & 0x0f];
    n = fread(buffer + 1, 1, size, in);
    if (n != size)
        break;

    /* Decode the packet */
    Decoder_Interface_Decode(amr, buffer, outbuffer, 0);

    /* Convert to little endian and write to wav */
    ptr = littleendian;
    for (i = 0; i < 160; i++) {
        *ptr++ = (outbuffer[i] >> 0) & 0xff;
        *ptr++ = (outbuffer[i] >> 8) & 0xff;
    }
    wav_write_data(wav, littleendian, 320);
}

Opencore AMR - How to build, test and verify encode and decode functionality

The build system is based on GNU Autotools. In the downloaded opencore-amr folder, run the following two
commands:

./configure
make

Output of ./configure:
My-MacBook-Pro:opencore-amr retheeshravi$ ./configure 
checking for a BSD-compatible install... /usr/local/bin/ginstall -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/local/bin/gmkdir -p
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking whether make sets $(MAKE)... yes
checking how to create a ustar tar archive... gnutar
checking whether to enable maintainer-specific portions of Makefiles... no
checking build system type... i386-apple-darwin15.5.0
checking host system type... i386-apple-darwin15.5.0
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking for style of include used by make... GNU
checking dependency style of g++... gcc3
checking for gcc... gcc
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking dependency style of gcc... gcc3
checking whether ln -s works... yes

checking whether make sets $(MAKE)... (cached) yes


Output of make
make builds the executables according to the Makefiles generated by the configure script.
my-MacBook-Pro:opencore-amr retheeshravi$ make
/Users/mymacbook/Downloads/Xcode.app/Contents/Developer/usr/bin/make  all-recursive
Making all in amrnb
make[2]: Nothing to be done for `all'.
Making all in amrwb
make[2]: Nothing to be done for `all'.
Making all in test
  CC     amrwb-dec.o
  CC     wavwriter.o
  CCLD   amrwb-dec
  CC     amrnb-dec.o
  CCLD   amrnb-dec
  CC     amrnb-enc.o
  CC     wavreader.o
  CCLD   amrnb-enc
  CC     linkboth.o
  CCLD   linkboth
  CC     amrnb-enc-sine.o
  CCLD   amrnb-enc-sine

Now we can run the amrnb-dec and amrnb-enc programs independently and get the results.


Thursday, April 27, 2017

C Search and Replace

Sample C code to search for and replace a string in a text buffer:

#include <string.h>

/* Note: assumes the final string fits in the 1024-byte buffer and that
   target is large enough to hold it; an empty needle would loop forever. */
void str_replace(char *target, const char *needle, const char *replacement)
{
    char buffer[1024] = { 0 };
    char *insert_point = &buffer[0];
    const char *tmp = target;
    size_t needle_len = strlen(needle);
    size_t repl_len = strlen(replacement);
    
    while (1) {
        const char *p = strstr(tmp, needle);
        
        // walked past last occurrence of needle; copy remaining part
        if (p == NULL) {
            strcpy(insert_point, tmp);
            break;
        }
        
        // copy part before needle
        memcpy(insert_point, tmp, p - tmp);
        insert_point += p - tmp;
        
        // copy replacement string
        memcpy(insert_point, replacement, repl_len);
        insert_point += repl_len;
        
        // adjust pointers, move on
        tmp = p + needle_len;
    }
    
    // write altered string back to target
    strcpy(target, buffer);

}

C++ how to get the absolute file path from iOS Cache directory

With the code below I was able to access the Caches folder in my app. I think the Documents folder is in "/Library/Documents", or somewhere similar.

#include <stdio.h>   /* snprintf */
#include <stdlib.h>  /* getenv */

char *home = getenv("HOME");
const char *subdir = "/Library/Caches/subdir";

char fullpath[200];
/* snprintf avoids overflowing fullpath, unlike strcpy/strcat */
snprintf(fullpath, sizeof(fullpath), "%s%s", home, subdir);

Next, with the full path, you can read and write files as usual in C++.

references:
http://stackoverflow.com/questions/13469342/using-c-to-access-documents-folder-on-ios

A collection of RTP tools

There are a few useful tools here.

rtpplay
rtpplay reads RTP session data, recorded by rtpdump -F dump, from either the file or stdin if file is not specified, and sends it to network address destination and port port with a time-to-live value of ttl.

If the flag -T is given, the timing between packets corresponds to the arrival timing rather than the RTP timestamps. Otherwise, for RTP data packets, the timing given by the RTP timestamps is used, smoothing interarrival jitter and restoring packet sequence. RTCP packets are still sent with their original timing. This may cause the relative order of RTP and RTCP packets to be changed.

The source port (localport) for outgoing packets can be set with the -s flag. A random port is chosen if this flag is not specified.

The whole file is played unless the begin or end times are specified. Times are measured in seconds and fractions from the beginning of the recording.

The RTP clock frequency is read from the profile file if given; the default profile (RFC 1890) is used if not. The profile file contains lines with two fields each: the first is the numeric payload type, the second the clock frequency. The values read from the profile file are silently ignored if the -T flag is used.

If you want to loop a particular file, it is easiest to put the rtpplay command in a shell script.

The -v flag has rtpplay display the packets generated on stdout.

rtpplay uses the hsearch (3C) library, which may not be available on all operating systems.

rtpdump

rtpdump [-F format] [-t duration] [-x bytes] [-f file] [-o outputfile] address/port
rtpdump listens on the address and port pair for RTP and RTCP packets and dumps a processed version to outputfile if specified or stdout otherwise.

If file is specified, the file is used instead of the network address. If no network address is given, file input is expected from stdin. The file must have been recorded using the rtpdump dump format.

The recording duration (the -t flag) is measured in minutes.

From each packet, only the first bytes bytes of the payload (the -x flag) are dumped (only applicable to the "dump" and "hex" formats).

rtpsend

rtpsend sends an RTP packet stream with configurable parameters. This is intended to test RTP features. The RTP or RTCP headers are read from a file, generated by hand, a test program or rtpdump (format "ascii").

rtpsend [-a] [-l] [-s sourceport] [-f file] destination/port[/ttl]

Packets are sent with a time-to-live value ttl.

If data is read from a file instead of stdin, the -l (loop) flag resends the same sequence of packets again and again.

The source port (localport) for outgoing packets can be set with the -s flag. A random port is chosen if this flag is not specified.

If the -a flag is specified, rtpsend includes a router alert IP option in RTCP packets. This is used by the YESSIR resource reservation protocol.

The file file contains the description of the packets to be sent. Within the file, each entry starts with a time value, in seconds, relative to the beginning of the trace. The time value must appear at the beginning of a line, without white space. Within an RTP or RTCP packet description, parameters may appear in any order, without white space around the equal sign. Lines are continued with initial white space on the next line. Comment lines start with #. Strings are enclosed in quotation marks.

rtptrans

rtptrans forwards RTP/RTCP packets arriving from one of the addresses to all other addresses. Addresses can be multicast or unicast. TTL values for unicast addresses are ignored. (It does not actually check whether packets are RTP or not.)

Additionally, the translator can translate VAT packets into RTP packets. VAT control packets are translated into RTCP SDES packets with a CNAME and a NAME entry. However, this is only intended to be used in the following configuration: VAT packets arriving on a multicast connection are translated into RTP and sent over a unicast link. RTP packets are not (yet) translated into VAT packets, and packets arriving on unicast links are not changed at all. Therefore, currently mainly the following topology is supported: multicast VAT -> translator -> unicast RTP; and on the way back, multicast VAT <- translator <- unicast RTP. This means that the audio agent on the unicast link should be able to use both VAT and RTP.
references:
http://www.cs.columbia.edu/irt/software/rtptools/

Tuesday, April 25, 2017

C commands to create Object file, executables

Below are some basic steps

 g++ -c my_lib.cpp => produces my_lib.o
 gcc -c my_lib.c => produces my_lib.o

To link my_source.c with the object file my_lib.o, use:
gcc my_source.c my_lib.o

Saturday, April 22, 2017

Extern Keyword in C

extern allows one module of your program to access a global variable or function declared in another module of your program. You usually have extern variables declared in header files. extern is used to let other C files or external components know this variable is already defined somewhere.

Using extern is only of relevance when the program you're building consists of multiple source files linked together, where some of the variables defined, for example, in source file file1.c need to be referenced in other source files, such as file2.c.

It is important to understand the difference between defining a variable and declaring a variable:

1) A variable is defined when the compiler allocates the storage for the variable.

2) A variable is declared when the compiler is informed that a variable exists (and this is its type); it does not allocate the storage for the variable at that point.

You may declare a variable multiple times (though once is sufficient); you may only define it once within a given scope.

Best way to declare and define global variables

Although there are other ways of doing it, the clean, reliable way to declare and define global variables is to use a header file file3.h to contain an extern declaration of the variable. The header is included by the one source file that defines the variable and by all the source files that reference the variable. For each program, one source file (and only one source file) defines the variable. Similarly, one header file (and only one header file) should declare the variable.

file3.h
extern int global_variable;  /* Declaration of the variable */

file1.c
#include "file3.h"  /* Declaration made available here */
#include "prog1.h"  /* Function declarations */

/* Variable defined here */
int global_variable = 37;    /* Definition checked against declaration */
int increment(void) { return global_variable++; }

file2.c
#include "file3.h"
#include "prog1.h"
#include <stdio.h>

void use_it(void)
{
    printf("Global variable: %d\n", global_variable++);
}

prog1.h
extern void use_it(void);
extern int increment(void);

prog1.c
#include "file3.h"
#include "prog1.h"
#include <stdio.h>

int main(void)
{
    use_it();
    global_variable += 19;
    use_it();
    printf("Increment: %d\n", increment());
    return 0;
}

references:
http://stackoverflow.com/questions/1433204/how-do-i-use-extern-to-share-variables-between-source-files-in-c

Wednesday, April 19, 2017

Why did Wireshark show the RTP packets as UDP instead of RTP?

This happened while doing the AMR codec tests. At first I thought that since AMR is a licensed codec, the packets were not shown as RTP either; however, it turned out to be the configuration setting shown in the screenshot below.

Analyze > Enabled Protocols

What is C++11?

C++11 is a version of the standard for the programming language C++. It was approved by the International Organization for Standardization (ISO) on 12 August 2011, replacing C++03, and was superseded by C++14 on 18 August 2014 and later by C++17, which was still under development at the time of writing.

After the approval of the C++ standard in 1998, two committee members prophesied that the next C++ standard would “certainly” include a built-in garbage collector (GC), and that it probably wouldn’t support multithreading because of the technical complexities involved in defining a portable threading model. Thirteen years later, the new C++ standard, C++11, is almost complete. Guess what? It lacks a GC but it does include a state-of–the-art threading library.


references:
http://blog.smartbear.com/c-plus-plus/the-biggest-changes-in-c11-and-why-you-should-care/

Tuesday, April 18, 2017

Simple payload sorting

If multiple new frames are encapsulated into the payload and robust payload sorting is not used, the payload is formed by concatenating the payload header, the ToC, optional CRC fields and the speech frames in the payload. However, the bits inside a frame are ordered into sensitivity order as defined in [2] for AMR and [4] for AMR-WB.

Simple payload sorting for bandwidth efficient operation

The simple payload sorting algorithm concatenates the payload header, table of contents and payload frames, then pads and maps the bits into octets:

   /* payload header */
   k=0; H=4;
   for (i = 0; i < H; i++){
     b(k++) = h(i);
   }
   /* table of contents */
   T=6;
   for (j = 0; j < N; j++){
     for (i = 0; i < T; i++){
       b(k++) = t(j,i);
     }
   }
   /* payload frames */
   for (j = 0; j < N; j++){
     for (i = 0; i < F(j); i++){
       b(k++) = f(j,i);
     }
   }
   /* padding */
   S = (k%8 == 0) ? 0 : 8 - k%8;
   for (i = 0; i < S; i++){
     b(k++) = 0;
   }
   /* map into octets */
   for (i = 0; i < k; i++){
     o(i/8,i%8)=b(i)
   }
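The final "map into octets" step can be sketched in C. This is an illustrative helper, not code from the draft; b holds one bit per array element, and bit 0 of each octet is the MSB, as the draft defines.

```c
#include <stdint.h>
#include <string.h>

/* Map k bits (one 0/1 value per element of b) into octets, bit 0 of
   each octet being the MSB -- the o(i/8, i%8) = b(i) step above. */
static void map_into_octets(const uint8_t *b, int k, uint8_t *octets)
{
    memset(octets, 0, (size_t)((k + 7) / 8));
    for (int i = 0; i < k; i++)
        octets[i / 8] |= (uint8_t)(b[i] << (7 - (i % 8)));
}
```

For example, the bit sequence 1,0,1,0,0,1,0,1 packs into the single octet 0xA5.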

Simple payload sorting for octet aligned operation

   /* payload header */
   k=0; H=8;
   if (interleaving){
     H+=8;       /* Interleaving extension */
   }
   for (i = 0; i < H; i++){
     b(k++) = h(i);
   }

   /* table of contents */
   T=8;
   for (j = 0; j < N; j++){
     for (i = 0; i < T; i++){
       b(k++) = t(j,i);
     }
   }

   /* CRCs, only if signaled */
   if (crc) {
     for (j = 0; j < N; j++){
       for (i = 0; i < C(j); i++){
         b(k++) = p(j,i);
       }
     }
   }
   /* payload frames */
   for (j = 0; j < N; j++){
     for (i = 0; i < F(j); i++){
       b(k++) = f(j,i);
     }
     /* padding of each speech frame */
     S = (k%8 == 0) ? 0 : 8 - k%8;
     for (i = 0; i < S; i++){
       b(k++) = 0;
     }
   }
   /* map into octets */
   for (i = 0; i < k; i++){
     o(i/8,i%8)=b(i)
   }

references:
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt

AMR Compound Payload

The compound payload consists of one payload header, the table of contents and one or more speech frames.

These elements SHALL be put together to form a payload with either simple or robust sorting. If the bandwidth efficient operation is used, simple sorting MUST be used.

Definitions for describing the compound payload:

b(m)    - bit m of the compound payload, octet aligned
o(n,m)  - bit m of octet n in the octet description of the compound
             payload, bit 0 is MSB
t(n,m)  - bit m in the table of contents entry for speech frame n
p(n,m)  - bit m in the CRC for speech frame n
f(n,m)  - bit m in speech frame n
F(n)    - number of bits in speech frame n, defined by FT
h(m)    - bit m of payload header
C(n)    - number of CRC bits for speech frame n, 0 or 8 bits
P(n)    - number of padding bits for speech frame n
N       - number of payload frames in the payload
S       - number of unused bits

Payload frames f(n,m) are ordered consecutively, with frame n preceding frame n+1. Within one payload with multiple speech frames, the sequence of speech frames MUST contain all speech frames in the sequence. If interleaving is used, the interleaving rules defined in section 2.2 determine which frames are contained in the payload. If speech data is missing for one or more frames in the sequence, e.g. due to DTX, the NO_DATA frame type is sent in the ToC for these frames. This does not mean that all frames must be sent, only that the sequence of frames in one payload MUST indicate missing frames. Payloads containing only NO_DATA frames SHOULD NOT be transmitted.

The compound payload, b, is mapped into octets, o, where bit 0 is MSB.

references
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt

Speech frame of AMR

A speech frame represents one frame encoded with the mode according to the ToC field FT. The length of this field is implicitly defined by the mode in the FT field. The bits SHALL be sorted according to Annex B of [2] for AMR and Annex B of [4] for AMR-WB.

If octet aligned operation is used, the last octet of each speech frame MUST be padded with zeroes at the end if not all bits are used.

references:
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt

The payload table of contents and CRCs

The table of contents (ToC) consists of one entry for each speech frame in the payload. A table of contents entry includes several specified fields as follows:

F (1 bit): Indicates if this frame is followed by further speech frames in this payload or not. F=1 further frames follow, F=0 last frame.

FT (4 bits): Frame type indicator, indicating the AMR or AMR-WB speech coding mode or comfort noise (SID) mode. If FT=14 (speech lost, available only in AMR-WB) or FT=15 (no transmission/no reception), no CRC or payload frame is present.

Q (1 bit): The payload quality bit indicates, if not set, that the payload is severely damaged and the receiver should set the RX_TYPE, see [6], to SPEECH_BAD or SID_BAD depending on the frame type (FT).

P: Is a padding bit that MUST be set to zero.
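As a sketch, assuming the Figure 9 layout for octet aligned operation (F as the most significant bit, then FT, Q and two padding bits), one ToC entry octet could be packed like this (hypothetical helper, not from the draft):

```c
#include <stdint.h>

/* Pack one octet aligned ToC entry: F (1 bit), FT (4 bits), Q (1 bit),
   followed by two P padding bits that MUST be zero. */
static uint8_t toc_entry(int f, int ft, int q)
{
    return (uint8_t)(((f & 1) << 7) | ((ft & 0x0f) << 3) | ((q & 1) << 2));
}
```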


FIGURE 7 - Table of contents entry field for bandwidth efficient operation.



Figure 8 - An example of ToC when using bandwidth efficient operation.


Figure 9 - Table of contents entry field for octet aligned operation.

CRC (8 bits): OPTIONAL field, present if the use of CRC is signaled at session set up; it SHALL only be used in octet aligned operation. The 8 bit CRC is used for error detection. The algorithm to generate these 8 parity bits is defined in section 4.1.4 in [2].


Figure 10: CRC field

The ToC and CRCs are arranged with all table of contents entry fields first, followed by all CRC fields. The ToC starts with the frame data belonging to the oldest speech frame.



Figure 11: The ToC and CRCs for a payload with three speech frames when using octet aligned operation.

references:
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt



RTP payload header usage for AMR

The payload header consists of a 4 bit codec mode request. If octet aligned operation is used, the payload header is padded to fill an octet, and optionally an 8 bit interleaving header may extend the payload header. The bits in the header are specified as follows:

CMR (4 bits): Indicates the Codec Mode Requested for the other communication direction. It is only allowed to request one of the speech modes of the used codec: frame type index 0..7 for AMR, see Table 1a in [2], or frame type index 0..8 for AMR-WB, see Table 1a in [4]. CMR value 15 indicates that no mode request is present; other values are for future use. It is RECOMMENDED that the encoder follows a received mode request, but if the encoder has a reason not to follow the mode request, e.g. congestion control, it MAY use another mode. The codec mode request (CMR) MUST be set to 15 for packets sent to a multicast group. The encoder in the sender SHOULD ignore mode requests when sending to a multicast session but MAY use RTCP feedback information as a hint that a mode change is needed. The codec mode selection MAY be restricted by the mode set definition at session set up. If so, the selected codec mode MUST be in the signaled mode set.

R: Is a reserved bit that MUST be set to zero. All R bits MUST be ignored by the receiver.

If the use of interleaving is signaled out of band at session set up, octet aligned operation MUST be used. When interleaving is used the payload header is extended with two 4 bit fields, ILL and ILP, used  to describe the interleaving scheme.

ILL (4 bits): OPTIONAL field that is present only if interleaving is signaled. The value of this field specifies the interleaving length used for frames in this payload.

ILP (4 bits): OPTIONAL field that is present only if interleaving is signaled. The value of this field indicates the interleaving index for frames in this payload. The value of ILP MUST be smaller than or equal to the value of ILL. Erroneous value of ILP SHOULD cause the payload to be discarded.

The value of the ILL field defines the length of an interleave group: ILL=L implies that frames at (L+1)-frame intervals are picked into the same interleaved payload, and the interleave group consists of L+1 payloads. The size of the interleave group is N*(L+1) frames, if N is the number of frames per payload. The value of ILL MUST only be changed between interleave groups. The value of ILP=p in payloads belonging to the same group runs from 0 to L. Interleaving is meaningful only when the number of frames per payload (N) is greater than or equal to 2. All payloads in an interleave group MUST contain the same number of speech frames. When N frames are transmitted in each payload of a group, the interleave group consists of payloads with sequence numbers s...s+L, and the frames encapsulated into these payloads are f...f+N*(L+1)-1.

To put this in the form of an equation: assume the first frame of an interleave group is n, the first payload of the group is s, the number of frames per payload is N, ILL=L and ILP=p (p in range 0...L); then the frames contained by payload s+p are n + p + k*(L+1), where k runs from 0 to N-1. I.e.

The first packet of an interleave group: ILL=L, ILP=0
   Payload: s
   Frames: n, n+(L+1), n+2*(L+1), ..., n+(N-1)*(L+1)

The second packet of an interleave group: ILL=L, ILP=1
   Payload: s+1
   Frames: n+1, n+1+(L+1), n+1+2*(L+1), ..., n+1+(N-1)*(L+1)


The last packet of an interleave group: ILL=L, ILP=L
    Payload: s+L
    Frames: n+L, n+L+(L+1), n+L+2*(L+1), ..., n+L+(N-1)*(L+1)
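The pattern above follows directly from the n + p + k*(L+1) formula; a small helper (hypothetical name, for illustration) computes the frame numbers carried by one payload of the group:

```c
/* Frame numbers carried by the payload with ILP=p in an interleave group:
   first frame n, ILL=L, N frames per payload => n + p + k*(L+1),
   for k = 0..N-1. */
static void interleave_group_frames(int n, int L, int p, int N, int *frames)
{
    for (int k = 0; k < N; k++)
        frames[k] = n + p + k * (L + 1);
}
```

With n=0, L=2, p=1 and N=3 this yields frames 1, 4 and 7, matching the second-packet example above.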


references
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt

Monday, April 17, 2017

RTP header usage of AMR

The RTP header marker bit (M) is used to mark (M=1) the packets containing as their first frame the first speech frame after a comfort noise period in DTX operation. For all other packets the marker bit is set to zero (M=0).

The timestamp corresponds to the sampling instant of the first sample encoded for the first frame in the packet. A frame can be either encoded speech, comfort noise parameters, NO_DATA, or SPEECH_LOST (only for AMR-WB). The timestamp unit is in samples.

The duration of one speech frame is 20 ms and the sampling frequency is 8 kHz, corresponding to 160 encoded speech samples per frame for AMR, and 16 kHz corresponding to 320 samples per frame in AMR-WB. Thus, the timestamp is increased by 160 for AMR and 320 for AMR-WB for each consecutive frame. All frames in a packet MUST be successive 20 ms frames, except if interleaving is employed; then the frames encapsulated into a payload MUST be picked according to the interleaving rules.
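The timestamp arithmetic can be sketched as follows (illustrative helper, the name is mine):

```c
#include <stdint.h>

/* RTP timestamp of the i-th consecutive frame in a packet, given the
   timestamp ts0 of the first frame: +160 samples per 20 ms frame for
   AMR (8 kHz), +320 for AMR-WB (16 kHz). */
static uint32_t amr_frame_timestamp(uint32_t ts0, int i, int wideband)
{
    return ts0 + (uint32_t)i * (wideband ? 320u : 160u);
}
```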

The payload MAY be padded using P bit in the RTP header.

references:
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt

AMR Payload format - Part 1

The AMR and AMR-WB payload format supports transmission of multiple frames per payload, the use of fast codec mode adaptation, and  robustness against packet loss and bit errors.

The payload format consists of one payload header with an optional interleaving extension, a table of contents, optionally one CRC per payload frame and zero or more payload frames.

The payload format is either bandwidth efficient or octet aligned, the mode of operation to use has to be signalled at session establishment. Only the octet aligned format has the possibility to use the robust sorting, interleaving and CRC to make it robust to packet loss and bit errors. In the octet aligned format the payload header, table of contents entries and the payload frames are individually octet aligned to make implementations efficient, but in the bandwidth efficient format only the full payload is octet aligned. If the option to transmit a robust sorted payload is  signalled the full payload SHALL finally be ordered in descending bit  error sensitivity order to be prepared for unequal error protection or unequal error detection schemes.

Robustness against packet loss can be accomplished by retransmitting previously transmitted frames together with the current frame or frames. This is done by using a sliding window to group the speech frames to send in each payload, see figure 1. A packet containing redundant frames will not look different from a packet with only new frames. The receiver may receive multiple copies or versions (encoded with different modes) of a frame for a certain timestamp if no packet losses are experienced. If multiple versions of a speech frame are received, it is RECOMMENDED that the mode with the highest rate is used by the speech decoder.

references:
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt

Sunday, April 16, 2017

AMRNB and AMRWB codecs as described by IETF

The Adaptive Multi-Rate speech codec

The AMR codec is a multi-mode codec with 8 narrow band speech modes with bit rates between 4.75 and 12.2 kbps. The sampling frequency is 8000 Hz and processing is done on 20 ms frames, i.e. 160 samples per frame. The AMR modes are closely related to each other and use the same coding framework.

Three of the AMR modes have already been adopted as standards of their own:

PDC-EFR : 6.7 kbps
IS-641 : 7.4 kbps
GSM-EFR : 12.2 kbps

The Adaptive Multi-Rate Wideband speech codec

The AMR-WB codec is a multi-mode speech codec with 9 wideband speech  coding modes with bit-rates between 6.6 and 23.85 kbps. The sampling frequency is 16000 Hz and processing is performed on 20 ms frames, i.e. 320 speech samples per frame. The AMR-WB modes are closely related to each other and employ the same coding framework.

Common Characteristics for AMR and AMR-WB

The multi-mode feature is used to preserve high speech quality under a wide range of transmission conditions. In mobile radio systems (e.g. GSM), mode adaptation allows the system to adapt the balance between speech coding and error protection to enable the best possible speech quality in prevailing transmission conditions. Mode adaptation can also be utilized to adapt to the varying available transmission bandwidth. Every codec implementation MUST support all specified speech coding modes. The codecs can handle mode switching to any mode at any time, but some transport systems have limitations in the number of supported modes and on how often the mode can change. The mode information must therefore be transmitted together with the speech encoded bits, to indicate the mode. To realize rate adaptation, the decoder needs to signal the mode it prefers to receive to the encoder. It is RECOMMENDED that the encoder follows a received mode request, but if the encoder has a reason not to follow the mode request, e.g. congestion control, it may use another mode. No codec mode request MUST be sent for packets sent to a multicast group, and the encoder in the sender SHOULD ignore mode requests when sending to a multicast session but MAY use RTCP feedback information as a hint that a mode change is needed.

Both codecs include voice activity detection (VAD) and generation of comfort noise (CN) parameters during silence periods. Hence, the  codecs have the option to reduce the number of transmitted bits and  packets during silence periods to a minimum. The operation to send CN parameters at regular intervals during silence periods is usually called discontinuous transmission (DTX) or source controlled rate (SCR) operation. The frames containing CN parameters are called Silence Indicator (SID) frames.

references:
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt

Codec Latency vs. Bandwidth Optimization

The low-bandwidth codecs are quite efficient. For example, G.729 will compress 10 milliseconds of audio to 10 bytes and G.723.1 encodes 30ms frames to 24 or 20 bytes.

However, since we send compressed audio frames as payload in RTP packets which are in turn sent over UDP, we need to consider the overhead for IP, UDP, and RTP headers. The overhead is 40 bytes per packet. This is significant when compared with the size of a compressed audio frame if we are not on a local area network and the bandwidth is limited. The table below shows the overhead for several low-bandwidth codecs.

If you want to improve bandwidth utilization, the obvious way to go is to send more frames in one RTP packet. However, as you do this, you also increase latency. If you decide to send, say, 100 milliseconds of audio in one packet, this means that you have added a latency of the same 100 milliseconds. Simply put, the first sample of the first frame arrives together with the last sample of the last frame in the packet, so the total delay is equal to the length of audio carried in the packet.

There is a recommendation that round-trip latency should not exceed approximately 300 milliseconds, otherwise people will start noticing.

When calculating the latency, you need to consider the time it takes to send a packet from one end to the other (your mileage may vary; try "traceroute" to get a clue) and the size of the jitter buffer at the receiving end (which can be 50-60 milliseconds worth of audio). Considering all this, I would say the reasonable maximum is to send 60 milliseconds of audio in one packet. This will result in the following bitrates:
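The resulting bitrates are easy to compute; here is a minimal sketch assuming the 40-byte IP/UDP/RTP overhead stated above (the function name is mine):

```c
/* One-way bitrate in bit/s when packing n_frames codec frames (each
   frame_bytes long, covering frame_ms of audio) into one RTP packet,
   with 40 bytes of IP/UDP/RTP overhead per packet. */
static double effective_bps(int frame_bytes, int frame_ms, int n_frames)
{
    int packet_bytes = 40 + frame_bytes * n_frames;
    double duration_s = frame_ms * n_frames / 1000.0;
    return packet_bytes * 8 / duration_s;
}
```

For G.729 (10 bytes per 10 ms), one frame per packet costs about 40 kbit/s on the wire, while six frames per packet bring it down to roughly 13.3 kbit/s.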

In addition to latency, there are two more things you should consider when increasing the number of audio frames per RTP packet:

If a packet with a larger number of frames gets lost, the loss is more noticeable to the user.
With greater end-to-end delay, possible echos become more noticeable.

references:
http://toncar.cz/Tutorials/VoIP/VoIP_Basics_Codec_Latency_vs_Bandwidth.html

Overview of Voice Codecs

Once we have the audio signal represented as a sequence of samples, the next step is to compress it to reduce the consumption of network bandwidth required to transmit the speech to the receiving party.

The compression and decompression is handled by special algorithms we call codecs (COder-DECoder)

All the codecs listed here expect the input to be audio sampled at 8 kHz with 16-bit samples.

G.711
G.711 is a codec that was introduced by the ITU in 1972 for use in digital telephony, i.e. in ISDN, T1 and E1 links. The codec has two variants: A-law is used in Europe and on international telephone links, u-law is used in the U.S.A. and Japan.

G.711 uses a logarithmic compression. It squeezes each 16-bit sample to 8 bits, thus it achieves a compression ratio of 1:2. The resulting bitrate is 64 kbit/s for one direction, so a call consumes 128 kbit/s (plus some overhead for packet headers). This is quite a lot when compared with other codecs.

This codec can be used freely in VoIP applications as there are no licensing fees. It works best in local area networks where we have a lot of bandwidth available. Its benefits include a simple implementation which does not need much CPU power (it can be implemented using a relatively simple table lookup) and very good perceived audio quality - the MOS value is 4.2.
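As an illustration of that logarithmic compression, here is a sketch of the widely used CCITT-style u-law encoder; the bias and clip constants are the conventional ones, and this is not code from the article:

```c
#include <stdint.h>

/* Sketch of the conventional CCITT-style u-law encoder: clip, add bias,
   find the segment (exponent), keep 4 mantissa bits, invert all bits. */
static uint8_t mulaw_encode(int16_t pcm)
{
    const int BIAS = 0x84, CLIP = 32635;
    int sample = pcm, sign = 0;

    if (sample < 0) { sign = 0x80; sample = -sample; }
    if (sample > CLIP) sample = CLIP;
    sample += BIAS;

    int exponent = 7;
    for (int mask = 0x4000; (sample & mask) == 0 && exponent > 0; mask >>= 1)
        exponent--;

    int mantissa = (sample >> (exponent + 3)) & 0x0f;
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}
```

Silence (sample 0) encodes to 0xFF, which matches the standard tables.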

G.729

G.729 is a codec that has low bandwidth requirements but provides good audio quality (MOS = 4.0). The codec encodes audio in frames, each frame is 10 milliseconds long. Given the sampling frequency of 8 kHz, the 10 ms frame contains 80 audio samples. The codec algorithm encodes each frame to 10 bytes, so the resulting bitrate is 8 kbit/s for one direction.

When used in VoIP, we usually send 3-6 G.729 frames in each packet. We do this because the overhead of packet headers (IP, UDP, and RTP together) is 40 bytes and we want to improve the ratio of "useful" information.
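To see why bundling frames helps, here is a tiny sketch (mine, using the 10-byte G.729 frames and 40 header bytes from the text) computing what fraction of each packet is actual voice:

```cpp
#include <cassert>

// Percentage of each packet that is codec payload rather than headers,
// assuming 10-byte G.729 frames and 40 bytes of IP/UDP/RTP headers.
int g729_payload_percent(int frames_per_packet) {
    const int payload = 10 * frames_per_packet;
    const int headers = 40; // IP (20) + UDP (8) + RTP (12)
    return payload * 100 / (payload + headers);
}
```

With one frame per packet only 20% of the packet is voice; with six frames the ratio improves to 60%.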


G.729 is a licensed codec. As far as end users are concerned, the easiest path to using it is to buy hardware that implements it (be it a VoIP phone or a gateway). In that case, the licensing fee has already been paid by the producer of the chip used in the device.

A frequently used variant of G.729 is G.729a. It is wire-compatible with the original codec but has lower CPU requirements.


G.723.1

G.723.1 is a result of a competition that ITU announced with the aim to design a codec that would allow calls over 28.8 and 33 kbit/s modem links. There were two very good solutions and ITU decided to use them both. Because of that, we have two variants of G.723.1. They both operate on audio frames of 30 milliseconds (i.e. 240 samples), but the algorithms differ. The bitrate of the first variant is 6.4 kbit/s and the MOS is 3.9. The bitrate of the second variant is 5.3 kbit/s with MOS=3.7. The encoded frames for the two variants are 24 and 20 bytes long, respectively.

G.723.1 is a licensed codec; the last patent covering it was expected to expire in 2014.

GSM 06.10
(also known as GSM Full Rate) is a codec designed by the European Telecommunications Standards Institute for use in GSM mobile networks. This variant of the GSM codec can be used freely, so you will often find it in open source VoIP applications. The codec operates on audio frames 20 milliseconds long (i.e. 160 samples) and compresses each frame to 33 bytes, so the resulting bitrate is 13 kbit/s (to be precise, the encoded frame is exactly 32 and a half bytes, so 4 bits are unused in each frame). The codec's Mean Opinion Score is 3.7.
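The 13 kbit/s figure follows directly from the frame numbers above; a quick sketch to verify:

```cpp
#include <cassert>

// GSM 06.10 packs each 20 ms frame (160 samples) into 260 bits of
// useful data, stored as 33 bytes with 4 bits unused.
int gsm_fr_bitrate_bps() {
    const int bits_per_frame = 260;       // 32.5 bytes of actual data
    const int frames_per_sec = 1000 / 20; // 20 ms frames -> 50 per second
    return bits_per_frame * frames_per_sec;
}
```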

Speex
Speex is an open source patent-free codec designed by the Xiph.org Foundation. It is designed to work with sampling rates of 8 kHz, 16 kHz, and 32 kHz and can compress audio signal to bitrates between 2 and 44 kbit/s. For use in VoIP telephony, the most usual choice is the 8 kHz (narrow band) variant.

iLBC
iLBC (internet Low Bitrate Codec) is a free codec developed by Global IP Solutions (later acquired by Google). The codec is defined in RFC 3951. With iLBC, you can choose either 20 ms or 30 ms frames; the resulting bitrates are 15.2 kbit/s and 13.33 kbit/s, respectively. Much like Speex and GSM 06.10, you will find iLBC in many open source VoIP applications.
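The two iLBC bitrates also fall out of the frame sizes: per RFC 3951, the 20 ms frame is 304 bits (38 bytes) and the 30 ms frame is 400 bits (50 bytes). A quick check:

```cpp
#include <cassert>

// Bitrate implied by iLBC's frame sizes (RFC 3951):
// 304 bits per 20 ms frame, 400 bits per 30 ms frame.
int ilbc_bitrate_bps(int frame_ms) {
    const int frame_bits = (frame_ms == 20) ? 304 : 400;
    return frame_bits * 1000 / frame_ms;
}
```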


references:
http://toncar.cz/Tutorials/VoIP/VoIP_Basics_Overview_of_Audio_Codecs.html

MOS (Mean Opinion Score) value of a Codec

In the previous article, we discussed sampling of audio. Now that we have the audio signal represented as a sequence of samples, the next step is to compress it to reduce the network bandwidth required to transmit the speech to the receiving party.

The compression and decompression is handled by special algorithms we call codecs (COder-DECoder). Let's have a look at some popular codecs that are being used in Voice over IP. All the codecs we list here expect the input to be audio sampled at 8 kHz with 16-bit samples.

In the text below, we will mention the MOS of the codecs. MOS stands for "Mean Opinion Score". MOS measures the perceived quality of audio after it has been compressed by the particular codec, transmitted, and decompressed. The score is assigned by a group of listeners using the procedure specified in ITU-T standards P.800 and P.830. The interpretation of individual MOS values is as follows:

5 => Excellent
4 => Good
3 => Fair
2 => Poor
1 => Bad

G.711 MOS value is 4.2
G.729 MOS value is 4.0
G.723.1 variant 1 has MOS 3.9 and variant 2 has MOS 3.7
GSM 06.10 MOS is 3.7

references:
http://toncar.cz/Tutorials/VoIP/VoIP_Basics_Overview_of_Audio_Codecs.html 

VoIP Sampling and Quantization

When converting an analog signal (be it speech or any other sound) to digital, you need to consider two important factors: sampling and quantization. Together, they determine the quality of the digitized sound.

Sampling is about the sampling rate — i.e. how many samples per second you use to encode the sound.

Quantization is about how many bits you use to represent each sample. The number of bits determines the number of different values you can represent with each sample.

Quantization

Quantization is about how many bits you use to represent individual sound samples. In practice, we want to work with whole bytes, so let's consider 8 or 16 bits.

With 8-bit samples, each sample can represent 256 different values, so we can work with whole numbers between -128 and +127. Because we are limited to whole numbers, it is inevitable that we introduce some noise into the signal as we convert it to digital samples. For example, if the exact analog value is "7.44125", we will represent it as "7". As we do this with each sample in the sequence, we slightly distort the signal, injecting noise, in other words.

It turns out that 8-bit samples do not give good quality. With only 256 sample values, the analog-to-digital conversion adds too much noise. The situation improves a lot if we switch to 16-bit samples, as 16 bits give us 65536 different values (from -32768 to +32767). 16-bit samples are what you will find on a CD and what VoIP codecs use as their input.
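The rounding and clipping described above can be sketched in a few lines; the `quantize` helper here is just an illustration of mine, not part of any codec:

```cpp
#include <cassert>
#include <cmath>

// Round an analog value to the nearest whole sample value and clamp it
// to the range of the given sample width. The difference between input
// and output is the quantization noise the text describes.
int quantize(double analog, int bits) {
    const int max = (1 << (bits - 1)) - 1; // +127 for 8 bits, +32767 for 16
    const int min = -(1 << (bits - 1));    // -128 for 8 bits, -32768 for 16
    int q = static_cast<int>(std::lround(analog));
    if (q > max) q = max;
    if (q < min) q = min;
    return q;
}
```

So "7.44125" becomes 7, and a value that does not fit the 8-bit range gets clipped, while 16 bits represent it exactly.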


Sampling

With VoIP, you will most frequently encounter the sampling rate of 8 kilohertz. The frequency of 16 kHz can be used now and then in situations when a higher quality audio is required (with proportionally higher Internet bandwidth consumption).

The choice of sampling frequencies for the individual types of audio is not random. There is a rule (based on the work of Nyquist and Shannon) that the sampling frequency must be equal to or greater than twice the transmitted bandwidth.

In short, it's good to remember that VoIP most frequently works with a sampling frequency of 8 kilohertz and each sample is stored in 16 bits.
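The Nyquist rule above is a one-liner to check. For example, the classic telephone band tops out around 3.4 kHz, which is why 8 kHz sampling is enough:

```cpp
#include <cassert>

// Nyquist-Shannon: to capture frequencies up to max_signal_hz,
// the sampling rate must be at least twice that frequency.
bool sampling_rate_sufficient(int sample_rate_hz, int max_signal_hz) {
    return sample_rate_hz >= 2 * max_signal_hz;
}
```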

references:
http://toncar.cz/Tutorials/VoIP/VoIP_Basics_Converting_Voice_to_Digital.html

Saturday, April 15, 2017

WebRTC Architecture - A recap

It's been a long time since I looked into the WebRTC stack. I never noticed earlier that NetEQ was part of the stack for jitter buffer optimization. Here is the full stack.


WebRTC Native C++ API
An API layer that enables browser makers to easily implement the Web API proposal.


Transport / Session
The session components are built by re-using components from libjingle, without using or requiring the xmpp/jingle protocol.

RTP Stack
A network stack for RTP, the Real-time Transport Protocol.

STUN/ICE
A component allowing calls to use the STUN and ICE mechanisms to establish connections across various types of networks.

Session Management
An abstracted session layer, allowing for call setup and management. This leaves the protocol implementation decision to the application developer.

VoiceEngine
VoiceEngine is a framework for the audio media chain, from sound card to the network.

iSAC / iLBC / Opus

iSAC: A wideband and super wideband audio codec for VoIP and streaming audio. iSAC uses 16 kHz or 32 kHz sampling frequency with an adaptive and variable bit rate of 12 to 52 kbps.

iLBC: A narrowband speech codec for VoIP and streaming audio. Uses 8 kHz sampling frequency with a bitrate of 15.2 kbps for 20ms frames and 13.33 kbps for 30ms frames. Defined by IETF RFCs 3951 and 3952.

Opus: Supports constant and variable bitrate encoding from 6 kbit/s to 510 kbit/s, frame sizes from 2.5 ms to 60 ms, and various sampling rates from 8 kHz (with 4 kHz bandwidth) to 48 kHz (with 20 kHz bandwidth, where the entire hearing range of the human auditory system can be reproduced). Defined by IETF RFC 6716.

NetEQ for Voice

A dynamic jitter buffer and error concealment algorithm used for concealing the negative effects of network jitter and packet loss. Keeps latency as low as possible while maintaining the highest voice quality.

Acoustic Echo Canceler (AEC)
The Acoustic Echo Canceler is a software-based signal processing component that removes, in real time, the acoustic echo that results when the voice being played out is picked up by the active microphone.

Noise Reduction (NR)
The Noise Reduction component is a software based signal processing component that removes certain types of background noise usually associated with VoIP. (Hiss, fan noise, etc…)

VideoEngine
VideoEngine is a framework for the video media chain, from camera to the network, and from network to the screen.

VP8
Video codec from the WebM Project. Well suited for RTC as it is designed for low latency.

Video Jitter Buffer
Dynamic Jitter Buffer for video. Helps conceal the effects of jitter and packet loss on overall video quality.

Image enhancements
For example, it removes video noise from the image captured by the webcam.

references:
https://webrtc.org/architecture/

What is WebRTC NetEQ

A dynamic jitter buffer and error concealment algorithm used for concealing the negative effects of network jitter and packet loss. Keeps latency as low as possible while maintaining the highest voice quality.

NetEQ is an implementation of a Jitter Buffer optimised for voice.

references:
https://webrtcglossary.com/neteq/

Friday, April 14, 2017

Packet Loss, Reordering, Jitter simple image explanation

In the case below, packet #2 is lost




In the case below, #2 is received out of order




In this case, each packet is sent at a 20 ms interval but received at varying intervals. #1 was received after almost exactly 20 ms; #2, #3, and #4 arrived together, 60 ms later; #5 arrived almost within 20 ms.


references:
https://webrtcglossary.com/jitter-buffer/

VoIP bandwidth consumption of various codecs

VoIP Bandwidth consumption naturally depends on the codec used.
When calculating bandwidth, one can't assume that every channel is used all the time. Normal conversation includes a lot of silence, which often means no packets are sent at all. So even if one voice call sets up two 64 kbit/s RTP streams over UDP over IP over Ethernet (which adds overhead), the full bandwidth is not used at all times.

A codec that sends a 64 kbit/s stream results in a much larger IP network stream. The main cause of the extra bandwidth usage is the IP and UDP headers. VoIP sends small packets, so the headers are often much larger than the data part of the packet.

The bandwidth used also depends on the data-link (layer 2) protocols. Several things influence the bandwidth used: payload size, ATM cell headers, VPN headers, use of header compression, etc.

Teracall has a table which shows how each codec's theoretical bandwidth usage expands with UDP/IP headers:

Codec     Bitrate (BR)   Nominal Ethernet Bandwidth (NEB)
G.711     64 Kbps        87.2 Kbps
G.729     8 Kbps         31.2 Kbps
G.723.1   6.4 Kbps       21.9 Kbps
G.723.1   5.3 Kbps       20.8 Kbps
G.726     32 Kbps        55.2 Kbps
G.726     24 Kbps        47.2 Kbps
G.728     16 Kbps        31.5 Kbps
iLBC      15 Kbps        27.7 Kbps
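Several of those NEB figures can be reproduced from first principles. Assuming 20 ms packetization for G.711 and G.729, 30 ms for G.723.1, and headers of 40 bytes (IP/UDP/RTP) plus 18 bytes (Ethernet), a quick sketch:

```cpp
#include <cassert>

// Nominal Ethernet bandwidth for one direction of a call:
// (payload + 40 bytes IP/UDP/RTP + 18 bytes Ethernet) per packet interval.
int ethernet_bandwidth_bps(int payload_bytes, int packet_ms) {
    const int headers = 40 + 18;
    return (payload_bytes + headers) * 8 * 1000 / packet_ms;
}
```

G.711 sends 160 payload bytes every 20 ms, which yields 87.2 kbit/s; G.729 sends 20 bytes every 20 ms, yielding 31.2 kbit/s, matching the table.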

Constructor initialization order

class C {
   int a;
   int b;
public:
   C() : b(1), a(2) {} // warning: should be C() : a(2), b(1)
};

In this case the initializer list should be written a(2), b(1), because members are always initialized in the order they are declared in the class (a first, then b), regardless of the order in the list. GCC warns about the mismatch with -Wreorder; the warning can be silenced with -Wno-reorder.
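A small demo of my own showing why declaration order matters: b can safely be initialized from a only because a, declared first, is constructed first.

```cpp
#include <cassert>

class C {
    int a;
    int b;
public:
    // a is declared before b, so a is initialized first; b may use a.
    // Writing the list as b(a + 1), a(2) would behave identically
    // (declaration order wins), which is exactly why gcc's -Wreorder
    // warns when the list order disagrees with the declaration order.
    C() : a(2), b(a + 1) {}
    int sum() const { return a + b; }
};
```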

references:
http://stackoverflow.com/questions/1564937/gcc-warning-will-be-initialized-after

A good explanation of VoIP packet size

The best explanation is in chapter 5 of the Authorized Self-Study Guide for Cisco Voice over IP (CVoice) Second Edition by Kevin Wallace.  Basically, the formula is Bytes_per_sample = Sample_Size * CODEC_Bandwidth / 8 plus overhead.
So, if your sample size is 20 ms (.02 seconds)  and you are using the G.711 CODEC, then your basic voice information requires
.02 * 64000 / 8 = 160 bytes per sample.  To that, you must add the overhead, which would be 20 bytes for the IP header,
8 bytes for the UDP header, and 12 bytes for the RTP header (to make sure your packets are in the correct order at the receiving end).
So, each voice packet will require 200 bytes.  Then, you need to add your Layer 2 overhead (at least 18 bytes for Ethernet).
So, each frame will require at least 218 bytes.  And you may also have trunk or tunneling overhead to consider.

Yes, voice packets are small, but you need a lot of them to carry voice.  That's why we use compression techniques.
cRTP compresses the IP/UDP/RTP header to either 2 or 4 bytes (4 if you implement checksums).  This significantly reduces the overhead.

In addition, you can use a CODEC that requires lower bandwidth.  The G.729 CODEC only requires 8000 bits per second, so if you used
G.729 and RTP header compression you would get
.02 * 8000 / 8 = 20 bytes of voice information plus 4 bytes for IP/UDP/RTP for an IP packet size of 24 bytes.  Then, you can add your
Layer 2 overhead, whatever it is.
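The arithmetic above can be sketched directly from the CVoice formula; the function name and parameters here are my own shorthand:

```cpp
#include <cassert>

// Per-packet size on the wire, following the CVoice formula:
// payload = (sample_ms / 1000) * codec_bps / 8, plus transport
// headers (40 for IP/UDP/RTP, or 2-4 with cRTP) and layer-2 overhead.
int voice_packet_bytes(int sample_ms, int codec_bps,
                       int transport_hdr_bytes, int l2_hdr_bytes) {
    int payload = sample_ms * codec_bps / 8 / 1000;
    return payload + transport_hdr_bytes + l2_hdr_bytes;
}
```

This reproduces the worked examples: 200 bytes for G.711 at the IP layer, 218 bytes with Ethernet framing, and 24 bytes for G.729 with cRTP.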

references:
https://learningnetwork.cisco.com/thread/11162

Thursday, April 13, 2017

VoIP: why are multiple channels required?

Any client assessment of VoIP should start with bandwidth capacity planning. This is particularly important because VoIP voice quality degrades quickly with contention from other applications. The goal is to gauge the total bandwidth on the client's network, estimate the current bandwidth utilization of applications, decide if there is enough remaining (unused) bandwidth to sustain the maximum number of planned voice channels (roughly 64 kbps per channel), and try to predict the amount of bandwidth needed by applications or users into the foreseeable future.

"Network capacity becomes more a measure of how many simultaneous calls the network can process," Zuk said. "This concept of peak load -- the maximum assumed volume that the network should be able to handle -- will be the basis of VoIP capacity planning." If you determine that your client's network has adequate bandwidth now and into the future, you can plan and implement VoIP. If there isn't enough available network bandwidth (or you suspect a near-term bandwidth shortage), you'll need to recommend suitable network upgrades for the client before VoIP can be deployed.

Essentially, each channel can only carry so much data: roughly 64 Kbps per channel.

references:
http://searchitchannel.techtarget.com/feature/Channel-Explained-Voice-over-Internet-Protocol-VoIP

What are various AMR Encoder modes?

MR122 : 12.2 kbit/s
MR102 : 10.2 kbit/s
MR795 : 7.95 kbit/s
MR74  : 7.4 kbit/s
MR67  : 6.7 kbit/s
MR59  : 5.9 kbit/s
MR515 : 5.15 kbit/s
MR475 : 4.75 kbit/s
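These modes map directly to the sizes[] lookup used in the opencore-amr decoder test earlier in this blog. A sketch relating mode index to frame payload size and (approximate) bitrate, assuming 20 ms frames:

```cpp
#include <cassert>

// Payload sizes in bytes (after the 1-byte mode header) for AMR-NB
// modes 0-7 (MR475..MR122) plus mode 8 (SID comfort noise), matching
// the sizes[] lookup in the opencore-amr decoder test.
static const int amr_payload_bytes[9] = {12, 13, 15, 17, 19, 20, 26, 31, 5};

// Rough bitrate implied by the padded payload: frames are padded to
// whole bytes, so this slightly overstates the nominal mode rate.
int amr_padded_bps(int mode) {
    return amr_payload_bytes[mode] * 8 * 1000 / 20;
}
```

For MR122 this gives 12.4 kbit/s against the nominal 12.2 (244 bits padded to 31 bytes), and for MR475 it gives 4.8 kbit/s against the nominal 4.75.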

references:
https://books.google.co.in/books?id=767W1gWhj_oC&pg=PT363&lpg=PT363&dq=what+is+mr122+in+amr+encoder&source=bl&ots=sZEUJbMCW9&sig=_xVA4FxKQuEakBsTes_ZnxnIx0A&hl=en&sa=X&ved=0ahUKEwiK2IiG9qDTAhUFqo8KHRh8AFEQ6AEILjAC#v=onepage&q=what%20is%20mr122%20in%20amr%20encoder&f=false

Tuesday, April 11, 2017

Mac - iOS Universal Clipboard

This is a cool feature if you want to copy content on your Mac and paste it on another device. For this, both devices should be signed in to the same Apple ID, and Wi-Fi and Bluetooth should be turned on.

Sign into iCloud with the same Apple ID on all your devices.
Make sure Bluetooth is turned on on all your devices.
Make sure Wi-Fi is turned on on all your devices.
Make sure your devices are near each other. Universal Clipboard is proximity-dependent.
Copy your text, photo, or video on one device.
Paste your text, photo, or video on your other device.




references:
http://www.imore.com/how-use-universal-clipboard-macos-sierra

Thursday, April 6, 2017

Accessibility: what is a decorative element?

Decorative elements need not have any accessibility labels, since they convey no information to the user.

references:
https://developer.apple.com/library/content/technotes/TestingAccessibilityOfiOSApps/TestAccessibilityonYourDevicewithVoiceOver/TestAccessibilityonYourDevicewithVoiceOver.html