Monday, June 19, 2017

iOS UIAutomation Brief overview


UI Automation provides a way to test the UI in an automated way. The approach relies on accessibility labels. The tests are run using the Automation instrument within the Instruments tool, and they can be run on the simulator or on a physical device.

Intending to start with this and searching around the Automation framework, it appears that Xcode 8.x has made major changes here. Now that the newer UI testing framework has come into place, Automation no longer appears under Product -> Profile.

Now, when creating a new project in Xcode, it asks whether tests and UI tests need to be included or not. If you opt to include them, running the tests later through Build > Test will also run the automated UI tests.

references:

Wednesday, May 31, 2017

What does the AMR bandwidth-efficient mode payload look like?

In the payload, no specific mode is requested (CMR=15), the speech frame is not damaged at the IP origin (Q=1), and the coding mode is AMR 7.4 kbps (FT=4). The encoded speech bits, d(0) to d(147), are arranged in descending sensitivity order according to [2]. Finally, two padding bits (P) are added at the end to make the payload octet aligned.
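As an illustration of this layout, here is a minimal sketch (not from the RFC; the helper name and the example bytes are made up) that pulls the CMR and the first ToC entry (F, FT, Q) out of a bandwidth-efficient AMR payload as described above:

#include <cstdint>
#include <cstdio>

// Bandwidth-efficient AMR payload: 4-bit CMR, then one 6-bit ToC entry per
// frame (F, FT, Q), followed by the speech bits and final padding (RFC 4867).
void dump_amr_be_header(const uint8_t* payload) {
    uint8_t cmr = (payload[0] >> 4) & 0x0F;                       // codec mode request
    uint8_t f   = (payload[0] >> 3) & 0x01;                       // 1 = more frames follow
    uint8_t ft  = ((payload[0] & 0x07) << 1) | (payload[1] >> 7); // frame type (coding mode)
    uint8_t q   = (payload[1] >> 6) & 0x01;                       // 1 = frame not damaged
    printf("CMR=%u F=%u FT=%u Q=%u\n", cmr, f, ft, q);
}

int main() {
    // First two octets of the example above: CMR=15, F=0, FT=4, Q=1,
    // with the first two speech bits assumed to be zero.
    uint8_t example[2] = { 0xF2, 0x40 };
    dump_amr_be_header(example);   // prints CMR=15 F=0 FT=4 Q=1
    return 0;
}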




references:
https://tools.ietf.org/html/rfc4867

Friday, May 26, 2017

How to install Tomcat on a Mac?

Tomcat can be installed easily through Homebrew.

$brew install tomcat 

This will take care of downloading, installing and configuring Tomcat, and it manages the dependencies as well.

Brew keeps packages (known as kegs) in the Cellar, where one can check the config and data files. It is located at:

$ls /usr/local/Cellar

The Tomcat installation can be verified using Homebrew's services utility:

$brew services list 

To run the Tomcat server, just execute the catalina command:

$ls /usr/local/Cellar/tomcat 
$ /usr/local/Cellar/tomcat/8.5.3/bin/catalina run

After it is running, the default page can be visited at http://localhost:8080.

references

C++ - Calling a method on a const object

Why does the error below happen? 

member function 'SetMethodX' not viable: 'this' argument has type 'const webrtc::ClassY', but function is not marked const
In my case, the ClassY object was const, and through it the code was trying to call SetMethodX, which was not declared const.

To be able to call a function on a const object, you need to promise the compiler that the function will not modify the object. To do that, you mark the function with the keyword const after its argument list. For example, to make getDimension a const member function, you would change it to:

const ULL getDimension() const { return dimension; }
(Note that the const in the return type will have absolutely no effect, so you should get rid of it)
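Here is a minimal sketch of the same idea (the class, members and values are made up for illustration): marking the getter const lets it be called through a const object, while a non-const member remains uncallable.

#include <cstdio>

class Box {
public:
    explicit Box(int d) : dimension(d) {}
    int getDimension() const { return dimension; }  // const: may be called on const objects
    void setDimension(int d) { dimension = d; }     // non-const: mutates the object
private:
    int dimension;
};

int main() {
    const Box b(42);
    printf("%d\n", b.getDimension());  // OK: getDimension() is marked const
    // b.setDimension(7);              // error: 'this' argument has type 'const Box'
    return 0;
}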

references:

C++ what is a non-static member function?

It is a function that is declared in a class's member specification without a static or friend specifier.
The snippet below gives a good overview of the various kinds of member functions:

class S
{
    int mf1();                        // non-static member function declaration
    void mf2() volatile, mf3() &&;    // can be cv-qualified and reference-qualified
    int mf4() const { return data; }  // can be defined inline
    virtual void mf5() final;         // can be virtual, can use virtual/final
    S() : data(12) {}                 // constructors are member functions as well
    int data;
};

int S::mf1() { return 6; }            // if not defined inline, it has to be defined at namespace scope

references:

Wednesday, May 24, 2017

How to access Google Team drives from Desktop offline

This can basically be done using the Google Drive desktop client. The new version has an Early Access Program (EAP) which will allow one to utilise new features such as Team Drives.


references:
https://gsuite.google.com/campaigns/index__drive-fs-eap.html

Differences between VoIP and VoLTE

They both essentially use the same mechanism: voice is encoded into data and sent as data packets over an IP network. However, they differ in the way the data is carried and in the network infrastructure used for carrying it.

The differences between two include 

1. Using 3G or 4G network 
2. QoS component 
3. Radio Frequency or SR-VCC (Single Radio Voice Call Continuity) IMS requirement
4. HD Voice requirements

VoLTE has three times more voice and data carrying capacity on the 4G network. VoLTE also has built-in QoS components.

4G boasts speeds of about 100 megabits per second. VoIP can use either 3G or 4G, while VoLTE can only use 4G.

One of the main differences is in the QoS. VoLTE uses the IMS network and a separate radio frequency to help maintain the quality of the VoLTE transmission, something like an expressway.

When the user leaves an LTE coverage area, SR-VCC functions by connecting to the LTE and legacy networks at the same time. The radio frequency uses the IMS framework to make this happen.

Since VoLTE uses fewer resources and compacts the data packets more simply, networks and handsets consume less energy with VoLTE, which provides greater battery life for each. VoLTE calls also take less time to connect than 3G calls.


references:

Monday, May 22, 2017

C++ vTable and vPtr

What is a virtual table?

Virtual Table is a lookup table of function pointers used to dynamically bind the virtual functions to objects at run time. It is not intended to be used directly by the program, and as such there is no standardised way to access it.

Every class that uses virtual functions (or is derived from a class that uses virtual functions) is given its own virtual table as a secret data member. This table is set up by the compiler at compile time. A virtual table contains one entry, a function pointer, for each virtual function that can be called by objects of the class. For an abstract base class, it stores NULL pointers for the pure virtual functions. A virtual table is created even for classes that have virtual base classes. In this case, the vTable has a pointer to the shared instance of the base class along with the pointers to the class's virtual functions, if any.

What is a vptr?

This vtable pointer, or _vptr, is a hidden pointer added by the compiler to the base class, and it points to the virtual table of that particular class. The _vptr is inherited by all the derived classes. Each object of a class with virtual functions transparently stores this _vptr. A call to a virtual function through an object is resolved by following this hidden _vptr.
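A minimal sketch of what this machinery buys you in practice (the class names are made up for illustration): the call through the base pointer is resolved at run time via the object's _vptr and the derived class's vtable.

#include <cstdio>

struct Shape {
    virtual void draw() const { printf("Shape::draw\n"); }   // gets a slot in Shape's vtable
    virtual ~Shape() {}
};

struct Circle : Shape {
    void draw() const override { printf("Circle::draw\n"); } // overrides that vtable slot
};

int main() {
    Circle c;
    Shape* p = &c;  // static type Shape*, but c's hidden _vptr points to Circle's vtable
    p->draw();      // resolved through the vtable at run time: prints "Circle::draw"
    return 0;
}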

references
https://www.quora.com/What-are-vTable-and-VPTR-in-C++



Tuesday, May 16, 2017

iOS Swift Make a CustomView using Xib file - Swift learning

These are the steps to make a custom UIView that is designable within Xcode:

1. Create a View .xib file (e.g.: CustomView.xib)
2. Design the user interface in Xcode
3. Set up Auto Layout constraints
4. Create a Swift code file (CustomView.swift)
5. Set .xib file’s “File’s Owner” custom class to CustomView (it must match the class name)
6. Implement both CustomView initializers: init(coder:) and init(frame:)
7. Load the UIView from the .xib file using the NSBundle and UINib classes
8. Add the view as a subview and property for the class (future modifications)
9. Add autoresizing masks for the view to match the size of the CustomView itself
10. Make the view’s frame leverage the design time size of the embedded CustomView’s bounds

The extremely important thing to remember is step 5

5. Set .xib file’s “File’s Owner” custom class to CustomView (it must match the class name)

After clicking on File's Owner and before giving the class name as CustomView, do not click anywhere other than the class name text field. If you click anywhere in the view, the class name will be set on the UIView that is in the Xib file instead, and loading the view will get into a recursive loop.

references:


Monday, May 15, 2017

Capture and playback AMR packets - wireshark

This is an extremely useful tool that came in handy while integrating an AMR codec into a system.
To give a basic usage: the application sends the encoded frames to the other end, and in the middle the AMR frames can be captured and played back. Below are the basic steps as described on the git page.

Filter packets of interest (e.g. by 'rtp')
Telephony -> RTP -> Stream analysis
Save -> Forward/Reverse stream audio, Save as type (*.raw)
Guess information about AMR format: AMR vs AMR-WB, octet-align vs bandwidth efficient, number of channels (e.g. from SIP signalization, RTP layer) and provide them to amr.py.

python amr.py -a -n 1 -v -w wb_amr_3.raw

The above command decodes the stream from the Wireshark raw file and outputs a wav file. If the file plays correctly, then you are done.

amr.py -a -n 1 -v wb_amr_3.raw

references:
https://github.com/suma12/amr

What is EFR in AMR?

Enhanced Full Rate or EFR or GSM-EFR or GSM 06.60 is a speech coding standard that was developed in order to improve the quite poor quality of the GSM Full Rate (FR) codec. Working at 12.2 kbit/s, the EFR provides wireline-like quality in noise-free and background-noise conditions. The EFR 12.2 kbit/s speech coding standard is compatible with the highest AMR mode (both are ACELP). Although Enhanced Full Rate helps to improve call quality, this codec has higher computational complexity, which in a mobile device can potentially result in an increase in energy consumption as high as 5% compared to the 'old' FR codec.

The sampling rate is 8000 sample/s leading to a bit rate for the encoded bit stream of 12.2 kbit/s. The coding scheme is the so-called Algebraic Code Excited Linear Prediction Coder (ACELP). The encoder is fed with data consisting of samples with a resolution of 13 bits left justified in a 16-bit word. The three least significant bits are set to 0. The decoder outputs data in the same format.[2]

The Enhanced Full Rate (GSM 06.60) technical specification describes the detailed mapping between input blocks of 160 speech samples in 13-bit uniform PCM format to encoded blocks of 244 bits and from encoded blocks of 244 bits to output blocks of 160 reconstructed speech samples. It also specifies the conversion between A-law or μ-law (PCS 1900) 8-bit PCM and 13-bit uniform PCM. This part of specification also describes the codec down to the bit level, thus enabling the verification of compliance to the part to a high degree of confidence by use of a set of digital test sequences. These test sequences are described in GSM 06.54 and are available on disks.
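As a quick sanity check of these numbers: each 20 ms frame encodes to 244 bits, and 244 bits / 0.020 s = 12,200 bit/s, i.e. the 12.2 kbit/s rate; likewise 160 samples per 20 ms frame corresponds to the 8000 sample/s rate.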

references
https://en.wikipedia.org/wiki/Enhanced_full_rate

Wednesday, May 10, 2017

Git - how to remove a file that was added in the last commit?

If this is your last commit and you want to completely delete the file from your local and the remote repository, you can:
remove the file: git rm <file>
commit with the amend flag: git commit --amend


references

Tuesday, May 9, 2017

What is Conda ?

Conda is a package manager application that quickly installs, runs, and updates packages and their dependencies. The conda command is the primary interface for managing installations of various packages. It can query and search the package index and current installation, create new environments, and install and update packages into existing conda environments. See our Using conda section for more information.

Conda is also an environment manager application. A conda environment is a directory that contains a specific collection of conda packages that you have installed. For example, you may have one environment with NumPy 1.7 and its dependencies, and another environment with NumPy 1.6 for legacy testing. If you change one environment, your other environments are not affected. You can easily activate or deactivate (switch between) these environments. You can also share your environment with someone by giving them a copy of your environment.yaml file.

SEE ALSO: Managing environments.

A conda package is a compressed tarball file that contains system-level libraries, Python or other modules, executable programs, or other components. Conda keeps track of the dependencies between packages and platforms.

Conda packages are downloaded from remote channels, which are simply URLs to directories containing conda packages. The conda command searches a default set of channels, and packages are automatically downloaded and updated from http://repo.continuum.io/pkgs/.

SEE ALSO: Managing packages.

Users may modify what remote channels are automatically searched, for example, if they wish to maintain a private or internal channel (see Configuration for details).

The conda build command creates new packages that can be optionally uploaded to a repository such as PyPI, GitHub, or Anaconda.org.



references:
https://conda.io/docs/intro.html

Monday, May 8, 2017

Android how to simulate Service being killed



http://stackoverflow.com/questions/11365301/how-to-simulate-android-killing-my-process

Friday, May 5, 2017

Wireshark - How to decrypt SSL streams

The SSL dissector is fully functional and even supports advanced features such as decryption of SSL if the encryption key can be provided and Wireshark is compiled against GnuTLS (rather than OpenSSL or bsafe). This works for RSA private keys.

If Wireshark is compiled with SSL decryption support there will be a new option in the preferences for SSL. If the key entry option is absent, verify that your Wireshark is linked against the required GnuTLS library. This can be done with wireshark -v. The output should include GnuTLS and Gcrypt. If you see "without GnuTLS, without Gcrypt", then you will need to reconfigure with --with-gnutls, recompile and reinstall.

To configure an RSA private key, go to the SSL dissector preference in the Protocols tree. Then press the RSA keys list button. A new dialog window appears showing the currently configured RSA private keys. Press the New button to configure a new RSA private key. In the new window you have to configure the following fields:

The RSA key file can either be a PEM format private key or a PKCS#12 keystore. If the file is a PKCS#12 keystore (typically a file with a .pfx or .p12 extension), the password for the keystore must be specified in the Password field.

Starting with Wireshark 2.0, the RSA key file is automatically matched against the public key as found in the Certificate handshake message. Before Wireshark 2.0, it relied on the user to enter a valid Address and Port value. Note that only RSA key exchanges can be decrypted using this RSA private key, Diffie-Hellman key exchanges cannot be decrypted using a RSA key file! (See "SSLKEYLOGFILE" if you have such a capture.)

The fileformat needed is 'PEM'. Note that it is common practice on webservers to combine the public key (or certificate) and the private key in a single PEM file.

references:
https://ask.wireshark.org/questions/34393/how-to-decrypt-ssl-traffic-using-wireshark

Wednesday, May 3, 2017

WAV Storage format

The data bits for each sample should be left-justified and padded with 0s. For example, consider the case of a 10-bit sample (as sample sizes must be multiples of 8 bits, we need to represent it in 16 bits). The 10 bits should be left-justified so that they become bits 6 to 15 inclusive, and bits 0 to 5 should be set to zero.



As an example, here is a 10-bit sample with a value of 0100001111 left-justified as a 16-bit word.



Given the fact that the WAVE format uses Intel's little endian byte order, the LSB is stored first, as shown here:
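Since the original illustrations are not reproduced here, the example written out: the 10-bit value 0100001111 left-justified into a 16-bit word gives 0100001111000000 (0x43C0), and because of the little-endian byte order it is stored as the byte 0xC0 followed by the byte 0x43.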

The discussion so far is for mono audio, meaning that you have just one "channel." When you deal with stereo audio, 3D audio, and so forth, you are in effect dealing with multiple channels, meaning you have multiple samples describing the audio at any given moment in time. For example, for stereo audio, at any given point in time you need to know what the audio signal was for the left channel as well as the right channel. So, you will have to read and write two samples at a time.


Say you sample at 44 kHz for stereo audio; then effectively, you will have 44 K * 2 samples per second. If you are using 16 bits per sample, then given the duration of audio, you can calculate the total size of the wave file as:

Size in bytes = sampling rate * number of channels * (bits per sample / 8) * duration in seconds

Number of samples per second = sampling rate * number of channels
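For example, one minute of 16-bit stereo audio sampled at 44,100 Hz works out to 44,100 * 2 * (16 / 8) * 60 = 10,584,000 bytes, roughly 10 MB.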

When you are dealing with such multi-channel sounds, single sample points from each channel are interleaved. Instead of storing all of the sample points for the left channel first, and then storing all of the sample points for the right channel next, you "interleave" the two channels' samples together. You would store the first sample of the left channel. Then, you would store the first sample of the right channel, and so on.

references:
http://www.codeguru.com/cpp/g-m/multimedia/audio/article.php/c8935/PCM-Audio-and-Wave-Files.htm

What is PCM ?


In the digital domain, PCM (Pulse Code Modulation) is the most straightforward mechanism to store audio. The analog audio is sampled in accordance with the Nyquist theorem and the individual samples are stored sequentially in binary format. The wave file is the most common format for storing PCM data.

Interchange Format Files (IFF)
It is a "Meta" file format developed by a company named Electronic Arts. The full name of this format is ElectronicArts Interchange File Format 1985 (EA IFF 85). IFF lays down a top-level protocol on what the structure of IFF compliant files should look like. It targets issues such as versioning, compatibility, portability, and so forth. It helps specify standardized file formats that aren't tied to a particular product. The wave file format is based on the generic IFF format.

The WAVE File Format supports a variety of bit resolutions, sample rates, and channels of audio.

The WAVE file format is based on Microsoft's version of the Electronic Arts Interchange File Format method for storing data. In keeping with the dictums of IFF, data in a Wave file is stored in many different "chunks." So, if a vendor wants to store additional information in a Wave file, he just adds info to new chunks instead of trying to tweak the base file format or come up with his own proprietary file format.


There are three chunks that are required to be present in a valid wave file:

'RIFF', 'WAVE' chunk
"fmt" chunk
'data' chunk

All other chunks are optional. The Riff wave chunk is the identifier chunk that tells us that this is a wave file. The "fmt" chunk contains important parameters describing the waveform, such as its sample rate, bits per sample, and so forth. The Data chunk contains the actual waveform data.

An application that uses a WAVE file must be able to read the three required chunks although it can ignore the optional chunks. But, all applications that perform a copy operation on wave files should copy all of the chunks in the WAVE.

The Riff chunk is always the first chunk. The fmt chunk should be present before the data chunk. Apart from this, there are no restrictions upon the order of the chunks within a WAVE file.
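As a rough sketch of how these required chunks are laid out for plain PCM data (assuming the canonical 44-byte header with the fmt chunk directly followed by the data chunk and no optional chunks; the struct and field names are my own):

#include <cstdint>

// Canonical 44-byte header of a plain PCM wave file (packed, little-endian fields).
#pragma pack(push, 1)
struct WavHeader {
    char     riff_id[4];       // "RIFF"
    uint32_t riff_size;        // total file size minus 8
    char     wave_id[4];       // "WAVE"

    char     fmt_id[4];        // "fmt "
    uint32_t fmt_size;         // 16 for PCM
    uint16_t audio_format;     // 1 = uncompressed PCM
    uint16_t num_channels;     // 1 = mono, 2 = stereo, ...
    uint32_t sample_rate;      // e.g. 44100
    uint32_t byte_rate;        // sample_rate * num_channels * bits_per_sample / 8
    uint16_t block_align;      // num_channels * bits_per_sample / 8
    uint16_t bits_per_sample;  // 8, 16, 24 or 32

    char     data_id[4];       // "data"
    uint32_t data_size;        // number of bytes of sample data that follow
};
#pragma pack(pop)

static_assert(sizeof(WavHeader) == 44, "canonical PCM wave header is 44 bytes");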

While interpreting WAVE files, the unit of measurement used is a "sample." Literally, it is what it says. A sample represents data captured during a single sampling cycle. So, if you are sampling at 44 KHz, you will have 44 K samples. Each sample could be represented as 8 bits, 16 bits, 24 bits, or 32 bits. (There is no restriction on how many bits you use for a sample except that it has to be a multiple of 8.) To some extent, the more the number of bits in a sample, the better the quality of the audio.

One detail to note is that 8-bit samples are represented as "unsigned" values whereas 16-bit and higher are represented by "signed" values.

references:
http://www.codeguru.com/cpp/g-m/multimedia/audio/article.php/c8935/PCM-Audio-and-Wave-Files.htm

Tuesday, May 2, 2017

WebRTC Basics of FineAudioBuffer

FineAudioBuffer takes an AudioDeviceBuffer (ADB) which deals with audio data corresponding to 10ms of data. It then allows for this data to be pulled in a finer or coarser granularity. I.e. interacting with this class instead of directly with the AudioDeviceBuffer one can ask for any number of audio data samples. This class also ensures that audio data can be delivered to the ADB in 10ms chunks when the size of the provided audio buffers differs from 10ms. As an example: calling DeliverRecordedData() with 5ms buffers will deliver accumulated 10ms worth of data to the ADB every second call.

Constructor is like below.

FineAudioBuffer(AudioDeviceBuffer* device_buffer,
                  size_t desired_frame_size_bytes,
                  int sample_rate);

device_buffer => Is the buffer that provides 10 ms of audio data.
desired_frame_size_bytes => is the number of bytes of audio data GetPlayoutData() should return on success. It is also the required size of each recorded buffer used in DeliverRecordedData() calls.
sample_rate => is the sample rate of audio data. This is needed because |device_buffer| delivers 10ms of data. Given the sample rate the number of samples can be calculated.

The two main functions in this class are the ones below.

1. void FineAudioBuffer::GetPlayoutData(int8_t* buffer)
This method asks WebRTC for playout data in 10 ms chunks.

2. void FineAudioBuffer::DeliverRecordedData(const int8_t* buffer,
                                          size_t size_in_bytes,
                                          int playout_delay_ms,
                                          int record_delay_ms)
Delivers the recorded data in 10 ms chunks to the observer. It consumes samples from the buffer in chunks of 10 ms until there is not enough data left. The number of remaining bytes in the cache is given by the new size of the buffer.

references:
https://chromium.googlesource.com/external/webrtc/+/master/webrtc/modules/audio_device/fine_audio_buffer.cc

Monday, May 1, 2017

Assertions in C - Pre, Post, Invariants.

In C, assertions are implemented with the standard assert macro. The argument to assert must be true when the macro is executed, otherwise the program aborts and prints an error message. For example, the assertion

    assert( size <= LIMIT );
will abort the program and print an error message like this:

    Assertion violation: file tripe.c, line 34: size <= LIMIT
if size is greater than LIMIT.

There are 3 types of assertions

1. Preconditions : Specify conditions at the start of a function.
2. Postconditions : Specify conditions at the end of a function.
3. Invariants : Specify conditions over a defined region of a program.

An assertion violation indicates a bug in the program. Thus, assertions are an effective means of improving the reliability of programs-in other words, they are a systematic debugging tool.
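Preconditions and postconditions are illustrated in the sections below; invariants are not, so here is a minimal sketch of a loop invariant (the function, data and limit are made up for illustration):

#include <assert.h>

#define TOTAL_LIMIT 1000

int sum_readings(const int *readings, int count)
{
    int total = 0;
    int i;

    for (i = 0; i < count; i++) {
        total += readings[i];
        /* Invariant: over the whole loop the running total never exceeds
           TOTAL_LIMIT. A violation points at bad input data or a bug in
           the accumulation logic. */
        assert( total <= TOTAL_LIMIT );
    }
    return total;
}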

Preconditions

Preconditions specify the input conditions to a function. Here is an example of a function with preconditions:

int
magic( int size, char *format )
{
    int maximum;

    assert( size <= LIMIT );
    assert( format != NULL );
    ...
These pre-conditions have two consequences:

magic is only required to perform its task if the pre- conditions are satisfied. Thus, as the writer of magic, you are not required to make magic do anything sensible if size or format are not as stated in the assertions.
The caller is certain of the conditions under which magic will perform its task correctly. Thus, if your code is calling magic, you must ensure that the size or format arguments to the call are as specified by the assertions.


Postconditions

Postconditions specify the output conditions of a function. They are used much less frequently than preconditions, partly because implementing them in C can be a little awkward. Here is an example of a postcondition in magic:

    ...
    assert( result <= LIMIT );
    return result;
}
The postcondition also has two consequences:

magic guarantees that the stated condition will hold when it completes execution. As the writer of magic, you must make certain that your code never produces a value of result that is greater than LIMIT.
The caller is certain of the task that magic will perform (provided its preconditions are satisfied). If your program is calling magic, then you know that the result returned by magic can be no greater than LIMIT.
Compare this with the apple-picker analogy. Another part of your contract states that you will not bruise the apples. It is therefore your responsibility to ensure that you do not (and if you do, you have failed.) Your employer is thus relieved of the need to check that the apples are not bruised before shipping them.

Writing preconditions

The simplest and most effective use of assertions is as preconditions-that is, to specify and check input conditions to functions. Two very common uses are to assert that:

Pointers are not NULL.
Indexes and size values are non-negative and less than a known limit.
Each assertion must be listed in the Asserts section of the function description comment in the corresponding header file. For example, the comment describing magic will include:

 *  Asserts:
 *      'size' is no greater than LIMIT.
 *      'format' is not NULL.
 *      The function result is no greater than LIMIT.
 */
If there are no assertions, write ``Nothing'':

 *  Asserts:
 *      Nothing
 */

Assertion violations

If a precondition is violated during program testing and debugging, then there is a bug in the code that called the function containing the precondition. The bug must be found and fixed.

If a postcondition is violated during program testing and debugging, then there is a bug in the function containing the postcondition. The bug must be found and fixed.


Turning assertions off

By default, ANSI C compilers generate code to check assertions at run-time. Assertion-checking can be turned off by defining the NDEBUG flag to your compiler, either by inserting

    #define NDEBUG
in a header file such as stdhdr.h, or by calling your compiler with the -DNDEBUG option:

    cc -DNDEBUG ...
This should be done only when you are confident that your program is operating correctly, and only if program run-time is a pressing concern.

If you face an error like the one below while compiling the code, make sure you have #include <assert.h> in the file.


error: implicit declaration of function 'assert' is invalid in C99 [-Werror,-Wimplicit-function-declaration]

references:
http://ptolemy.eecs.berkeley.edu/~johnr/tutorials/assertions.html

How to convert WAV to PCM format?

Wav files are often just raw PCM data with some RIFF and Wav header information. Whether or not you know exactly what is in there will determine the method you use. Stereo PCM data in CD-quality sound is interleaved, 16 bits per channel, left channel first. Often with wav files you can discard the first 44 bytes of the header and write the rest of the PCM data to your card to play or convert, but wav files can contain almost any kind of sound data: mu-law, PCM, ADPCM, even MP3 wrapped in a wav file header. Another portable library is libao if you only want to open the file for playing; it is very easy to use. libsndfile does pretty much the same thing, and OpenAL will give you a threaded write to your sound device if you are making a game of some kind.
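A minimal sketch of the "discard the first 44 bytes" approach mentioned above (this assumes a canonical PCM wav file with no extra chunks; real files should be parsed chunk by chunk, and the function and file names are made up):

#include <cstdio>

// Copies everything after the canonical 44-byte wav header into a raw PCM file.
int wav_to_pcm(const char* wav_path, const char* pcm_path) {
    FILE* in = fopen(wav_path, "rb");
    if (!in) return -1;
    FILE* out = fopen(pcm_path, "wb");
    if (!out) { fclose(in); return -1; }

    fseek(in, 44, SEEK_SET);            // skip the RIFF/fmt/data headers (assumed 44 bytes)

    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
        fwrite(buf, 1, n, out);         // the remaining bytes are the interleaved samples

    fclose(in);
    fclose(out);
    return 0;
}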

references:
http://computer-programming-forum.com/47-c-language/7c98914c3e47bd5b.htm

Friday, April 28, 2017

Testing Opencore decoder

Once compiled, the test executable is at test/amrnb-dec.
The program's input is an AMR file and the output is a wav file.


./amrnb-dec outamr8khz.amr outamr8khz.wav

Looking at the code, after decoding the stream, it needs to be converted to little endian for
wav format.

while (1) {
    uint8_t buffer[500], littleendian[320], *ptr;
    int size, i;
    int16_t outbuffer[160];

    /* Read the mode byte */
    n = fread(buffer, 1, 1, in);
    if (n <= 0)
        break;
    /* Find the packet size */
    size = sizes[(buffer[0] >> 3) & 0x0f];
    n = fread(buffer + 1, 1, size, in);
    if (n != size)
        break;

    /* Decode the packet */
    Decoder_Interface_Decode(amr, buffer, outbuffer, 0);

    /* Convert to little endian and write to wav */
    ptr = littleendian;
    for (i = 0; i < 160; i++) {
        *ptr++ = (outbuffer[i] >> 0) & 0xff;
        *ptr++ = (outbuffer[i] >> 8) & 0xff;
    }
    wav_write_data(wav, littleendian, 320);
}

Opencore AMR - How to build and test and Verify encode and decode functionality

The build system is based on GNU autotools. In the downloaded opencore-amr folder, run the following two commands:

./configure
make

output of ./configure
My-MacBook-Pro:opencore-amr retheeshravi$ ./configure 
checking for a BSD-compatible install... /usr/local/bin/ginstall -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/local/bin/gmkdir -p
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking whether make sets $(MAKE)... yes
checking how to create a ustar tar archive... gnutar
checking whether to enable maintainer-specific portions of Makefiles... no
checking build system type... i386-apple-darwin15.5.0
checking host system type... i386-apple-darwin15.5.0
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking for style of include used by make... GNU
checking dependency style of g++... gcc3
checking for gcc... gcc
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking dependency style of gcc... gcc3
checking whether ln -s works... yes

checking whether make sets $(MAKE)... (cached) yes


Output of make
Make will build the executables based on the output from the configure script.
my-MacBook-Pro:opencore-amr retheeshravi$ make
/Users/mymacbook/Downloads/Xcode.app/Contents/Developer/usr/bin/make  all-recursive
Making all in amrnb
make[2]: Nothing to be done for `all'.
Making all in amrwb
make[2]: Nothing to be done for `all'.
Making all in test
  CC     amrwb-dec.o
  CC     wavwriter.o
  CCLD   amrwb-dec
  CC     amrnb-dec.o
  CCLD   amrnb-dec
  CC     amrnb-enc.o
  CC     wavreader.o
  CCLD   amrnb-enc
  CC     linkboth.o
  CCLD   linkboth
  CC     amrnb-enc-sine.o
  CCLD   amrnb-enc-sine

Now we can run the amrnb-dec and amrnb-enc programs independently and get the results.

references:

Thursday, April 27, 2017

C Search and Replace

A sample C function to search and replace a string in a text buffer:

#include <string.h>

void str_replace(char *target, const char *needle, const char *replacement)
{
    char buffer[1024] = { 0 };
    char *insert_point = &buffer[0];
    const char *tmp = target;
    size_t needle_len = strlen(needle);
    size_t repl_len = strlen(replacement);
    
    while (1) {
        const char *p = strstr(tmp, needle);
        
        // walked past last occurrence of needle; copy remaining part
        if (p == NULL) {
            strcpy(insert_point, tmp);
            break;
        }
        
        // copy part before needle
        memcpy(insert_point, tmp, p - tmp);
        insert_point += p - tmp;
        
        // copy replacement string
        memcpy(insert_point, replacement, repl_len);
        insert_point += repl_len;
        
        // adjust pointers, move on
        tmp = p + needle_len;
    }
    
    // write altered string back to target
    strcpy(target, buffer);

}
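A quick usage sketch for the function above (the strings are arbitrary; note that the fixed 1024-byte scratch buffer inside str_replace limits the size of the result):

#include <stdio.h>

int main(void)
{
    char text[128] = "the cat sat on the cat mat";

    str_replace(text, "cat", "dog");
    printf("%s\n", text);   /* prints: the dog sat on the dog mat */
    return 0;
}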

C++ how to get the absolute file path from iOS Cache directory

With the code below I am able to access the cache folder in my app. I think the Documents folder is in "/Library/Documents", or somewhere else.

char *home = getenv("HOME");
char *subdir = "/Library/Caches/subdir";

char fullpath[200];
strcpy(fullpath,home);

strcat(fullpath, subdir);

Next, with the full path, you can do the usual things to read/write in C++.

references:
http://stackoverflow.com/questions/13469342/using-c-to-access-documents-folder-on-ios

A collection of RTP tools

There are a few useful tools here.

rtpplay
rtpplay reads RTP session data, recorded by rtpdump -F dump, from either the file or from stdin if file is not specified, sending it to network address destination and port port with a time-to-live value of ttl.

If the flag -T is given, the timing between packets corresponds to the arrival timing rather than the RTP timestamps. Otherwise, for RTP data packets, the timing given by the RTP timestamps is used, smoothing interarrival jitter and restoring packet sequence. RTCP packets are still sent with their original timing. This may cause the relative order of RTP and RTCP packets to be changed.

The source port(localport) for outgoing packets can be set with the -s flag. A random port is chosen if this flag is not specified.

The whole file is played unless the begin or end times are specified. Times are measured in seconds and fractions from the beginning of the recording.

The RTP clock frequency is read from the profile file if given; the default profile (RFC 1890) is used if not. The profile file contains lines with two fields each: the first is the numeric payload type, the second the clock frequency. The values read from the profile file are silently ignored if the -T flag is used.

If you want to loop a particular file, it is easiest to put the rtpplay command in a shell script.

The -v flag has rtpplay display the packets generated on stdout.

rtpplay uses the hsearch (3C) library, which may not be available on all operating systems.

rtpdump

rtpdump [-F format] [-t duration] [-x bytes] [-f file] [-o outputfile] address/port
rtpdump listens on the address and port pair for RTP and RTCP packets and dumps a processed version to outputfile if specified or stdout otherwise.

If file is specified, the file is used instead of the network address. If no network address is given, file input is expected from stdin. The file must have been recorded using the rtpdump dump format.

The recording duration is measured in minutes.

From each packet, only the first bytes of the payload are dumped (only applicable for "dump" and "hex" formats).

rtpsend

rtpsend sends an RTP packet stream with configurable parameters. This is intended to test RTP features. The RTP or RTCP headers are read from a file, generated by hand, a test program or rtpdump (format "ascii").

rtpsend [-a] [-l] [-s sourceport] [-f file] destination/port[/ttl]

Packets are sent with a time-to-live value ttl.

If data is read from a file instead of stdin, the -l(loop) flag resends the same sequence of packets again and again.

The source port(localport) for outgoing packets can be set with the -s flag. A random port is chosen if this flag is not specified.

If the -a flag is specified, rtpsend includes a router alert IP option in RTCP packets. This is used by the YESSIR resource reservation protoccol.

The file file contains the description of the packets to be sent. Within the file, each entry starts with a time value, in seconds, relative to the beginning of the trace. The time value must appear at the beginning of a line, without white space. Within an RTP or RTCP packet description, parameters may appear in any order, without white space around the equal sign. Lines are continued with initial white space on the next line. Comment lines start with #. Strings are enclosed in quotation marks.

rtptrans

rtptrans forwards RTP/RTCP packets arriving from one of the addresses to all other addresses. Addresses can be multicast or unicast. TTL values for unicast addresses are ignored. (Actually, it doesn't check whether packets are RTP or not.)

Additionally, the translator can translate VAT packets into RTP packets. VAT control packets are translated into RTCP SDES packets with a CNAME and a NAME entry. However, this is only intended to be used in the following configuration: VAT packets arriving on a multicast connection are translated into RTP and sent over a unicast link. RTP packets are not (yet) translated into VAT packets, and all packets arriving on unicast links are not changed at all. Therefore, currently mainly the following topology is supported: multicast VAT -> translator -> unicast RTP, and on the way back unicast RTP -> translator -> multicast VAT. This means that the agent on the unicast link should use RTP and that the multicast agent should be able to use both VAT and RTP audio.
references:
http://www.cs.columbia.edu/irt/software/rtptools/

Tuesday, April 25, 2017

C commands to create Object file, executables

Below are some basic steps

 g++ -c my_lib.cpp => gives output as my_lib.o
 gcc -c my_lib.c => gives output as  my_lib.o

to link my_source.c with object my_lib.o file, below to be used.
gcc my_source.c my_lib.o


Saturday, April 22, 2017

Extern Keyword in C

extern allows one module of your program to access a global variable or function declared in another module of your program. You usually have extern variables declared in header files. extern is used to let other C files or external components know this variable is already defined somewhere.

Using extern is only of relevance when the program you're building consists of multiple source files linked together, where some of the variables defined, for example, in source file file1.c need to be referenced in other source files, such as file2.c.

It is important to understand the difference between defining a variable and declaring a variable:

1) A variable is defined when the compiler allocates the storage for the variable.

2) A variable is declared when the compiler is informed that a variable exists (and this is its type); it does not allocate the storage for the variable at that point.

You may declare a variable multiple times (though once is sufficient); you may only define it once within a given scope.

Best way to declare and define global variables

Although there are other ways of doing it, the clean, reliable way to declare and define global variables is to use a header file file3.h to contain an extern declaration of the variable. The header is included by the one source file that defines the variable and by all the source files that reference the variable. For each program, one source file (and only one source file) defines the variable. Similarly, one header file (and only one header file) should declare the variable.

file3.h
extern int global_variable;  /* Declaration of the variable */

file1.c
#include "file3.h"  /* Declaration made available here */
#include "prog1.h"  /* Function declarations */

/* Variable defined here */
int global_variable = 37;    /* Definition checked against declaration */
int increment(void) { return global_variable++; }

file2.c
#include "file3.h"
#include "prog1.h"
#include <stdio.h>

void use_it(void)
{
    printf("Global variable: %d\n", global_variable++);
}

prog1.h
extern void use_it(void);
extern int increment(void);

prog1.c
#include "file3.h"
#include "prog1.h"
#include <stdio.h>

int main(void)
{
    use_it();
    global_variable += 19;
    use_it();
    printf("Increment: %d\n", increment());
    return 0;
}

references:
http://stackoverflow.com/questions/1433204/how-do-i-use-extern-to-share-variables-between-source-files-in-c

Wednesday, April 19, 2017

Why did Wireshark not show the RTP packets, showing them as UDP instead?

This was while doing the AMR codec tests. I thought that, since AMR is a licensed codec, the RTP was also not being shown as RTP packets; however, it turned out to be the protocol configuration reachable via the menu below.

Analyze > Enabled Protocols

What is C++11 ?

C++11 is a version of the standard for the programming language C++. It was approved by the International Organization for Standardization (ISO) on 12 August 2011, replacing C++03. It was superseded by C++14 on 18 August 2014 and later by C++17, which is still under development.

After the approval of the C++ standard in 1998, two committee members prophesied that the next C++ standard would “certainly” include a built-in garbage collector (GC), and that it probably wouldn’t support multithreading because of the technical complexities involved in defining a portable threading model. Thirteen years later, the new C++ standard, C++11, is almost complete. Guess what? It lacks a GC but it does include a state-of–the-art threading library.


references:
http://blog.smartbear.com/c-plus-plus/the-biggest-changes-in-c11-and-why-you-should-care/

Tuesday, April 18, 2017

Simple payload sorting

If multiple new frames are encapsulated into the payload and robust payload sorting is not used, the payload is formed by concatenating the payload header, the ToC, optional CRC fields and the speech frames in the payload. However, the bits inside a frame are ordered into sensitivity order as defined in [2] for AMR and [4] for AMR-WB.

Simple payload sorting for bandwidth efficient operation

The simple payload sorting algorithm concatenates the payload header, the table of contents and the payload frames, then adds padding and maps the bits into octets:

/* payload header */
   k=0; H=4;
   for (i = 0; i < H; i++){
     b(k++) = h(i);
   }
   /* table of contents */
   T=6;
   for (j = 0; j < N; j++){
     for (i = 0; i < T; i++){
       b(k++) = t(j,i);
     }
   }
   /* payload frames */
   for (j = 0; j < N; j++){
     for (i = 0; i < F(j); i++){
       b(k++) = f(j,i);
     }
   }
   /* padding */
   S = (k%8 == 0) ? 0 : 8 - k%8;
   for (i = 0; i < S; i++){
     b(k++) = 0;
   }
   /* map into octets */
   for (i = 0; i < k; i++){
     o(i/8,i%8)=b(i)
   }

Simple payload sorting for octet aligned operation
/* payload header */
   k=0; H=8;
   if (interleaving){
     H+=8;       /* Interleaving extension */
   }
   for (i = 0; i < H; i++){
     b(k++) = h(i);
   }

 /* table of contents */
   T=8;
   for (j = 0; j < N; j++){
     for (i = 0; i < T; i++){
       b(k++) = t(j,i);
     }
   }

   /* CRCs, only if signaled */
   if (crc) {
     for (j = 0; j < N; j++){
       for (i = 0; i < C(j); i++){
         b(k++) = p(j,i);
       }
     }
   }
/* payload frames */
   for (j = 0; j < N; j++){
     for (i = 0; i < F(j); i++){
       b(k++) = f(j,i);
     }
     /* padding of each speech frame */
     S = (k%8 == 0) ? 0 : 8 - k%8;
     for (i = 0; i < S; i++){
       b(k++) = 0;
     }
   }
   /* map into octets */
   for (i = 0; i < k; i++){
     o(i/8,i%8)=b(i)
   }

references:
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt

AMR Compound Payload

The compound payload consists of one payload header, the table of contents and one or more speech frames

These elements SHALL be put together to form a payload with either simple or robust sorting. If the bandwidth efficient operation is used, simple sorting MUST be used.

Definitions for describing the compound payload:

b(m)    - bit m of the compound payload, octet aligned
o(n,m)  - bit m of octet n in the octet description of the compound
             payload, bit 0 is MSB
t(n,m)  - bit m in the table of contents entry for speech frame n
p(n,m)  - bit m in the CRC for speech frame n
f(n,m)  - bit m in speech frame n
F(n)    - number of bits in speech frame n, defined by FT
h(m)    - bit m of payload header
C(n)    - number of CRC bits for speech frame n, 0 or 8 bits
P(n)    - number of padding bits for speech frame n
N       - number of payload frames in the payload
S       - number of unused bits

Payload frames f(n,m) are ordered in consecutive order, where frame n is preceding frame n+1. Within one payload with multiple speech frames the sequence of speech frames MUST contain all speech frames in the sequence. If interleaving is used  the interleaving rules defined in section 2.2 applies for which frames that are contained in the payload. If speech data is missing for one or more frames in the sequence of frames in the payload, due to e.g. DTX, send the NO_DATA frame type in the ToC for these frames. This does not mean that all frames must be sent, only that the sequence of frames in one payload MUST indicate missing frames. Payloads containing only NO_DATA frames SHOULD NOT be transmitted.

The compound payload, b, is mapped into octets, o, where bit 0 is MSB.

references
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt

Speech frame of AMR

A speech frame represents one frame encoded with the mode according to the ToC field FT. The length of this field is implicitly defined by the mode in the FT field. The bits SHALL be sorted according to Annex B of [2] for AMR and Annex B of [4] for AMR-WB.

If octet aligned operation is used, the last octet of each speech frame MUST be padded with zeroes at the end if not all bits are used.

references:
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt

The payload table of contents and CRCs

The table of contents (ToC) consists of one entry for each speech frame in the payload. A table of contents entry includes several specified fields as follows:

F (1 bit): Indicates if this frame is followed by further speech frames in this payload or not. F=1 further frames follow, F=0 last frame.

FT (4 bits): Frame type indicator, indicating the AMR or AMR-WB speech coding mode or comfort noise (SID) mode. If FT=14 (speech lost, available only in AMR-WB) or FT=15 (No transmission/no reception) no CRC or payload frame is present.

Q (1 bit): The payload quality bit indicates, if not set, that the payload is severely damaged and the receiver should set the RX_TYPE, see [6], to SPEECH_BAD or SID_BAD depending on the frame type (FT).

P: Is a padding bit, MUST be set to zero.


FIGURE 7 - Table of contents entry field for bandwidth efficient operation.



Figure 8 - An example of ToC when using bandwidth efficient operation.


Figure 9 - Table of contents entry field for octet aligned operation.

CRC (8 bits): OPTIONAL field, exists if the use of CRC is signaled at session set up and SHALL only be used in octet aligned operation. The 8 bit CRC is used for error detection. The algorithm to generate these 8 parity bits are defined in section 4.1.4 in [2].


Figure 10: CRC field

The ToC and CRCs are arranged with all table of contents entries fields first followed by all CRC fields. The ToC starts with the frame data belonging to the oldest speech frame.



Figure 11: The ToC and CRCs for a payload with three speech frames when using octet aligned operation.

references:
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt



RTP payload header usage for AMR

The payload header consists of a 4 bit codec mode request. If octet aligned operation is used, the payload header is padded to fill an octet, and optionally an 8 bit interleaving header may extend the payload header. The bits in the header are specified as follows:

CMR (4 bits): Indicates Codec Mode Requested for the other communication direction. It is only allowed to request one of the speech modes of the used codec, frame type index 0..7 for AMR, see Table 1a in [2], or frame type index 0..8 for AMR-WB, see Table 1a in [4]. CMR value 15 indicates that no mode request is present, other values are for future use. It is RECOMMENDED that the encoder follows a received mode request, but if the encoder has reason for not following the mode request, e.g. congestion control, it MAY use another mode. The codec mode request (CMR) MUST be set to 15 for packets sent to a multicast group. The encoder in the sender SHOULD ignore mode requests when sending to a multicast session but MAY use RTCP feedback information as a hint that a mode change is needed. The codec mode selection MAY be restricted by the mode set definition at session set up. If so, the selected codec mode MUST be in the signaled mode set.

R: Is a reserved bit that MUST be set to zero. All R bits MUST be ignored by the receiver.

If the use of interleaving is signaled out of band at session set up, octet aligned operation MUST be used. When interleaving is used the payload header is extended with two 4 bit fields, ILL and ILP, used  to describe the interleaving scheme.

ILL (4 bits): OPTIONAL field that is present only if interleaving is signaled. The value of this field specifies the interleaving length used for frames in this payload.

ILP (4 bits): OPTIONAL field that is present only if interleaving is signaled. The value of this field indicates the interleaving index for frames in this payload. The value of ILP MUST be smaller than or equal to the value of ILL. Erroneous value of ILP SHOULD cause the payload to be discarded.

The value of the ILL field defines the length of an interleave group: ILL=L implies that frames in (L+1)-frame intervals are picked into the same interleaved payload, and the interleave group consists of L+1 payloads. The size of the interleaving group is N*(L+1), where N is the number of frames per payload. The value of ILL MUST only be changed between interleave groups. The value of ILP=p in payloads belonging to the same group runs from 0 to L. The interleaving is meaningful only when the number of frames per payload (N) is greater than or equal to 2. All payloads in an interleave group MUST contain equally many speech frames. When N frames are transmitted in each payload of a group, the interleave group consists of payloads with sequence numbers s...s+L, and the frames encapsulated into these payloads are f...f+N*(L+1)-1.

To put this in a form of an equation, assume that the first frame of an interleave group is n, the first payload of the group is s, number of frames per payload is N, ILL=L and ILP=p (p in range 0...L), the frames contained by the payload s+p are n + p + k*(L+1), where k runs from 0 to N-1. I.e.

The first packet of an interleave group: ILL=L, ILP=0
   Payload: s
   Frames: n, n+(L+1), n+2*(L+1), ..., n+(N-1)*(L+1)

The second packet of an interleave group: ILL=L, ILP=1
   Payload: s+1
   Frames: n+1, n+1+(L+1), n+1+2*(L+1), ..., n+1+(N-1)*(L+1)


The last packet of an interleave group: ILL=L, ILP=L
    Payload: s+L
    Frames: n+L, n+L+(L+1), n+L+2*(L+1), ..., n+L+(N-1)*(L+1)


references
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt

Monday, April 17, 2017

RTP header usage of AMR

The RTP header marker bit (M) is used to mark (M=1) the packets containing as their first frame the first speech frame after a comfort noise period in DTX operation. For all other packets the marker bit is set to zero (M=0).

The timestamp corresponds to the sampling instant of the first sample encoded for the first frame in the packet. A frame can be either encoded speech, comfort noise parameters, NO_DATA, or SPEECH_LOST (only for AMR-WB). The timestamp unit is in samples.

The duration of one speech frame is 20 ms and the sampling frequency is 8 kHz, corresponding to 160 encoded speech samples per frame for AMR, and 16 kHz corresponding to 320 samples per frame in AMR-WB. Thus, the timestamp is increased by 160 for AMR and 320 for AMR-WB for each consecutive frame. All frames in a packet MUST be successive 20 ms frames, except if interleaving is employed; then the frames encapsulated into a payload MUST be picked according to the interleaving rules.

The payload MAY be padded using P bit in the RTP header.

references:
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt

AMR Payload format - Part 1

The AMR and AMR-WB payload format supports transmission of multiple frames per payload, the use of fast codec mode adaptation, and  robustness against packet loss and bit errors.

The payload format consists of one payload header with an optional interleaving extension, a table of contents, optionally one CRC per payload frame and zero or more payload frames.

The payload format is either bandwidth efficient or octet aligned, the mode of operation to use has to be signalled at session establishment. Only the octet aligned format has the possibility to use the robust sorting, interleaving and CRC to make it robust to packet loss and bit errors. In the octet aligned format the payload header, table of contents entries and the payload frames are individually octet aligned to make implementations efficient, but in the bandwidth efficient format only the full payload is octet aligned. If the option to transmit a robust sorted payload is  signalled the full payload SHALL finally be ordered in descending bit  error sensitivity order to be prepared for unequal error protection or unequal error detection schemes.

Robustness against packet loss can be accomplished by using the possibility to retransmit previously transmitted frames together with the current frame or frames. This is done by using a sliding window to group the speech frames to send in each payload, see figure 1. A packet containing redundant frames will not look different from a packet with only new frames. The receiver may receive multiple copies or versions (encoded with different modes) of a frame for a certain timestamp if no packet losses are experienced. If multiple versions of a speech frame are received, it is RECOMMENDED that the mode with the highest rate is used by the speech decoder.

references:
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt

Sunday, April 16, 2017

AMRNB and AMRWB codecs as described by IETF

The Adaptive Multi-Rate speech codec

The AMR codec is a multi-mode codec with 8 narrow band speech modes with bit rates between 4.75 and 12.2 kbps. The sampling frequency is 8000 Hz and processing is done on 20 ms frames, i.e. 160 samples per frame. The AMR modes are closely related to each other and use the same coding framework

Three of the AMR modes are already adopted as existing standards:

PDC-EFR : 6.7 kbps
IS-641 : 7.4 kbps
GSM-EFR : 12.2 kbps

The Adaptive Multi-Rate Wideband speech codec

The AMR-WB codec is a multi-mode speech codec with 9 wideband speech  coding modes with bit-rates between 6.6 and 23.85 kbps. The sampling frequency is 16000 Hz and processing is performed on 20 ms frames, i.e. 320 speech samples per frame. The AMR-WB modes are closely related to each other and employ the same coding framework.

Common Characteristics for AMR and AMR-WB

The multi-mode feature is used to preserve high speech quality under a wide range of transmission conditions. In mobile radio systems (e.g. GSM) mode adaptation allows the system to adapt the balance between speech coding and error protection to enable the best possible speech quality in prevailing transmission conditions. Mode adaptation can also be utilized to adapt to the varying available transmission bandwidth. Every codec implementation MUST support all specified speech coding modes. The codecs can handle mode switching to any mode at any time, but some transport systems have limitations in the number of supported modes and on how often the mode can change. The mode information must therefore be transmitted together with the speech encoded bits, to indicate the mode. To realize rate adaptation the decoder needs to signal the mode it prefers to receive to the encoder. It is RECOMMENDED that the encoder follows a received mode request, but if the encoder has reason for not following the mode request, e.g. congestion control, it may use another mode. No codec mode request MUST be sent for packets sent to a multicast group, and the encoder in the sender SHOULD ignore mode requests when sending to a multicast session but MAY use RTCP feedback information as a hint that a mode change is needed.

Both codecs include voice activity detection (VAD) and generation of comfort noise (CN) parameters during silence periods. Hence, the  codecs have the option to reduce the number of transmitted bits and  packets during silence periods to a minimum. The operation to send CN parameters at regular intervals during silence periods is usually called discontinuous transmission (DTX) or source controlled rate (SCR) operation. The frames containing CN parameters are called Silence Indicator (SID) frames.

references:
https://www.ietf.org/proceedings/51/I-D/draft-ietf-avt-rtp-amr-10.txt

Codec Latency vs. Bandwidth Optimization

The low-bandwidth codecs are quite efficient. For example, G.729 will compress 10 milliseconds of audio to 10 bytes and G.723.1 encodes 30ms frames to 24 or 20 bytes.

However, since we send compressed audio frames as payload in RTP packets which are in turn sent over UDP, we need to consider the overhead for IP, UDP, and RTP headers. The overhead is 40 bytes per packet. This is significant when compared with the size of a compressed audio frame if we are not on a local area network and the bandwidth is limited. The table below shows the overhead for several low-bandwidth codecs.

If you want to improve bandwidth utilization, the obvious way to go is to send more frames in one RTP packet. However, as you do this, you also increase latency. If you decide to send, say, 100 milliseconds of audio in one packet, this means that you have added a latency of the same 100 milliseconds. Simply put, the first sample of the first frame arrives together with the last sample of the last frame in the packet, so the total delay is equal to the length of audio carried in the packet.

There is a recommendation that round-trip latency should not exceed approximately 300 milliseconds, otherwise people will start noticing.

When calculating the latency, you need to consider the time it takes to send a packet from one end to the other (your mileage may vary, try to use "traceroute" to get a clue) and the size of the jitter buffer at the receiving end (which can be 50-60 milliseconds worth of audio). Considering all this, I would say the reasonable maximum is to send 60 milliseconds of audio in one packet; a rough calculation of the resulting bitrates is sketched below.
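As a sanity check of these numbers, here is a small Python sketch (assuming the usual 40 bytes of IPv4/UDP/RTP headers and the frame sizes quoted above) that estimates the on-the-wire bitrate for a given packetization interval:

IP_UDP_RTP_OVERHEAD = 40  # bytes of headers per packet (IPv4 + UDP + RTP)

def wire_bitrate_kbps(frame_ms, frame_bytes, packet_ms):
    # Payload is a whole number of codec frames plus the fixed header overhead.
    frames_per_packet = packet_ms // frame_ms
    payload = frames_per_packet * frame_bytes
    packets_per_second = 1000 / packet_ms
    return (payload + IP_UDP_RTP_OVERHEAD) * 8 * packets_per_second / 1000

# G.729: 10 ms frames of 10 bytes; G.723.1: 30 ms frames of 24 bytes (6.4 kbit/s variant)
for packet_ms in (20, 60):
    print("G.729 @", packet_ms, "ms per packet:",
          round(wire_bitrate_kbps(10, 10, packet_ms), 1), "kbit/s")
print("G.723.1 @ 60 ms per packet:", round(wire_bitrate_kbps(30, 24, 60), 1), "kbit/s")

With 60 ms per packet, G.729 comes out at roughly 13.3 kbit/s on the wire instead of the 24 kbit/s you get with 20 ms packets, which is exactly the bandwidth-versus-latency trade-off described above.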

In addition to latency, there are two more things you should consider when increasing the number of audio frames per RTP packet:

If a packet with a larger number of frames gets lost, the loss is more noticeable to the user.
With greater end-to-end delay, possible echoes become more noticeable.

references:
http://toncar.cz/Tutorials/VoIP/VoIP_Basics_Codec_Latency_vs_Bandwidth.html

Overview of Voice Codecs

Once we have the audio signal represented as a sequence of samples, the next step is to compress it to reduce the consumption of network bandwidth required to transmit the speech to the receiving party.

The compression and decompression is handled by special algorithms we call codecs (COder-DECoder)

All the codecs listed below expect the input to be audio sampled at 8 kHz with 16-bit samples.

G.711
G.711 is a codec that was introduced by ITU in 1972 for use in digital telephony, i.e. in ISDN, T.1 and E.1 links. The codec has two variants: A-law is used in Europe and on international telephone links, while u-law is used in the U.S.A. and Japan.

G.711 uses a logarithmic compression. It squeezes each 16-bit sample to 8 bits, thus it achieves a compression ratio of 1:2. The resulting bitrate is 64 kbit/s for one direction, so a call consumes 128 kbit/s (plus some overhead for packet headers). This is quite a lot when compared with other codecs.

This codec can be used freely in VoIP applications as there are no licensing fees. It works best in local area networks where we have a lot of bandwidth available. Its benefits include a simple implementation which does not need much CPU power (it can be implemented using a relatively simple table lookup) and a very good perceived audio quality - the MOS value is 4.2.
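To illustrate what logarithmic compression means here, below is a minimal Python sketch of the continuous u-law companding curve (u = 255). This is only an approximation for illustration; the real G.711 encoder uses a segmented, piecewise-linear version of this curve, typically implemented as a table lookup:

import math

MU = 255.0  # compression parameter of the u-law variant

def ulaw_compress(sample_16bit: int) -> int:
    # Compress one signed 16-bit sample to an 8-bit code (illustrative only).
    x = max(-1.0, min(1.0, sample_16bit / 32768.0))       # normalize to [-1, 1]
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round(y * 127))                            # 8-bit signed code

def ulaw_expand(code_8bit: int) -> int:
    # Expand an 8-bit code back to an approximate 16-bit sample.
    y = code_8bit / 127.0
    x = math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)
    return int(round(x * 32768.0))

for s in (100, 1000, 10000, 30000):
    print(s, "->", ulaw_compress(s), "->", ulaw_expand(ulaw_compress(s)))

Small samples keep most of their resolution after the round trip, while large samples are quantized much more coarsely, which matches how human hearing perceives loudness.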

G.729

G.729 is a codec that has low bandwidth requirements but provides good audio quality (MOS = 4.0). The codec encodes audio in frames, each frame is 10 milliseconds long. Given the sampling frequency of 8 kHz, the 10 ms frame contains 80 audio samples. The codec algorithm encodes each frame to 10 bytes, so the resulting bitrate is 8 kbit/s for one direction.

When used in VoIP, we usually send 3-6 G.729 frames in each packet. We do this because the overhead of the packet headers (IP, UDP, and RTP together) is 40 bytes, and we want to improve the ratio of useful payload to header overhead.
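A quick Python sketch of that trade-off, assuming 10-byte G.729 frames and 40 bytes of IP/UDP/RTP headers per packet:

HEADERS = 40      # assumed bytes of IP + UDP + RTP headers per packet
FRAME_BYTES = 10  # one 10 ms G.729 frame

for frames in range(1, 7):
    payload = frames * FRAME_BYTES
    share = payload / (payload + HEADERS)
    print(f"{frames} frame(s): {payload:2d} B payload, {share:.0%} of the packet is speech")

Going from one to six frames per packet raises the speech share from 20% to 60%, at the cost of an extra 50 milliseconds of packetization delay.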


G.729 is a licensed codec. As far as end users are concerned, the easiest path to using it is to buy hardware that implements it (be it a VoIP phone or gateway). In that case, the licensing fee has already been paid by the producer of the chip used in the device.

A frequently used variant of G.729 is G.729a. It is wire-compatible with the original codec but has lower CPU requirements.


G.723.1

G.723.1 is the result of a competition that ITU announced with the aim of designing a codec that would allow calls over 28.8 and 33 kbit/s modem links. There were two very good solutions and ITU decided to use them both, which is why we have two variants of G.723.1. They both operate on audio frames of 30 milliseconds (i.e. 240 samples), but the algorithms differ. The bitrate of the first variant is 6.4 kbit/s and the MOS is 3.9. The bitrate of the second variant is 5.3 kbit/s with MOS = 3.7. The encoded frames for the two variants are 24 and 20 bytes long, respectively.

G.723.1 is a licensed codec; the last patent that covers it was expected to expire in 2014.

GSM 06.10
GSM 06.10 (also known as GSM Full Rate) is a codec designed by the European Telecommunications Standards Institute for use in GSM mobile networks. This variant of the GSM codec can be used freely, so you will often find it in open source VoIP applications. The codec operates on audio frames 20 milliseconds long (i.e. 160 samples) and compresses each frame to 33 bytes, so the resulting bitrate is 13 kbit/s (to be precise, the encoded frame is exactly 32 and 1/2 bytes, so 4 bits are unused in each frame). The codec's Mean Opinion Score is 3.7.
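The "32 and 1/2 byte" remark is easy to verify with a couple of lines of Python (260 encoded bits per 20 ms frame):

FRAME_MS = 20
ENCODED_BITS = 260                               # GSM Full Rate bits per frame

bitrate_kbps = ENCODED_BITS / FRAME_MS           # bits per ms == kbit/s -> 13.0
frame_bytes = -(-ENCODED_BITS // 8)              # ceiling division -> 33 bytes
unused_bits = frame_bytes * 8 - ENCODED_BITS     # -> 4 padding bits per frame
print(bitrate_kbps, "kbit/s,", frame_bytes, "bytes per frame,", unused_bits, "unused bits")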

Speex
Speex is an open source patent-free codec designed by the Xiph.org Foundation. It is designed to work with sampling rates of 8 kHz, 16 kHz, and 32 kHz and can compress audio signal to bitrates between 2 and 44 kbit/s. For use in VoIP telephony, the most usual choice is the 8 kHz (narrow band) variant.

iLBC
iLBC (internet Low Bit Rate Codec) is a free codec developed by Global IP Solutions (later acquired by Google). The codec is defined in RFC3951. With iLBC, you can choose to use either 20 ms or 30 ms frames and the resulting bitrate is 15.2 kbit/s and 13.33 kbit/s, respectively. Much like Speex and GSM 06.10, you will find iLBC in many open source VoIP applications.


references:
http://toncar.cz/Tutorials/VoIP/VoIP_Basics_Overview_of_Audio_Codecs.html

MOS (Mean Opinion Score) value of a Codec

In the previous article, we discussed sampling of audio. Now that we have the audio signal represented as a sequence of samples, the next step is to compress it to reduce the consumption of network bandwidth required to transmit the speech to the receiving party.

The compression and decompression is handled by special algorithms we call codecs (COder-DECoder). Let's have a look at some popular codecs that are being used in Voice over IP. All the codecs we list here expect the input to be audio sampled at 8 kHz with 16-bit samples.

In the text below, we will mention the MOS of the codecs. MOS stands for "Mean Opinion Score". MOS measures the perceived quality of audio after it has been compressed by the particular codec, transmitted, and decompressed. The score is assigned by a group of listeners using the procedure specified in ITU-T standards P.800 and P.830. The interpretation of individual MOS values is as follows:

5 => Excellent
4 => Good
3 => Fair
2 => Poor
1 => Bad

G.711: MOS 4.2
G.729: MOS 4.0
G.723.1: MOS 3.9 (6.4 kbit/s variant) and 3.7 (5.3 kbit/s variant)
GSM 06.10: MOS 3.7

references:
http://toncar.cz/Tutorials/VoIP/VoIP_Basics_Overview_of_Audio_Codecs.html 

VoIP Sampling and Quantization

When converting an analog signal (be it speech or another noise), you need to consider two important factors: sampling and quantization. Together, they determine the quality of the digitized sound.

Sampling is about the sampling rate — i.e. how many samples per second you use to encode the sound.

Quantization is about how many bits you use to represent each sample. The number of bits determines the number of different values you can represent with each sample.

Quantization

Quantization is about how many bits you use to represent individual sound samples. In practice, we want to work with whole bytes, so let's consider 8 or 16 bits.

With 8-bit samples, each sample can represent 256 different values, so we can work with whole numbers between -128 and +127. Because of the whole numbers, it is inevitable that we introduce some noise into the signal as we convert it to digital samples. For example, if the exact analog value is "7.44125", we will represent it as "7". As we do this with each sample in the sequence, we slightly distort the signal — inject noise, in other words.

It turns out that 8-bit samples do not result in good quality. With only 256 sample values, the analog-to-digital conversion adds too much noise. The situation improves a lot if we switch to 16-bit samples, as 16 bits give us 65536 different representations (from -32768 to +32767). 16-bit samples are what you will find on a CD and what VoIP codecs use as their input.
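A quick way to see the difference is to quantize a test tone at both bit depths and compare the signal-to-quantization-noise ratio. Here is a minimal Python sketch (a full-scale 440 Hz sine, plain rounding, no dithering):

import math

def quantize(x, bits):
    # Round a value in [-1, 1) to the nearest level of a signed integer grid.
    levels = 2 ** (bits - 1)
    q = max(-levels, min(levels - 1, round(x * levels)))
    return q / levels

signal = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]

for bits in (8, 16):
    noise = sum((s - quantize(s, bits)) ** 2 for s in signal)
    power = sum(s * s for s in signal)
    print(bits, "bits -> SNR of roughly", round(10 * math.log10(power / noise), 1), "dB")

The output shows roughly 6 dB of signal-to-noise ratio per bit, which is why 16-bit samples are the practical baseline for telephony codecs.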


Sampling

With VoIP, you will most frequently encounter a sampling rate of 8 kilohertz. A frequency of 16 kHz is used now and then in situations where higher quality audio is required (with proportionally higher Internet bandwidth consumption).

The choice of sampling frequencies for the individual types of audio is not random. There is a rule (based on the work of Nyquist and Shannon) that the sampling frequency needs to be equal to or greater than twice the transmitted bandwidth. For example, an 8 kHz sampling rate can carry frequencies up to 4 kHz, which covers the traditional 300-3400 Hz telephone band.
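The Python sketch below shows what happens when the rule is violated: sampled at 8 kHz, a 5 kHz tone produces exactly the same samples as a 3 kHz tone, so the two are indistinguishable after sampling. This effect is known as aliasing:

import math

FS = 8000  # sampling rate in Hz, so the Nyquist limit is 4 kHz

def sampled_cosine(freq_hz, n_samples=8):
    # Sample a cosine of the given frequency at FS, rounded for easy comparison.
    return [round(math.cos(2 * math.pi * freq_hz * n / FS), 6) for n in range(n_samples)]

print(sampled_cosine(5000))                            # above the Nyquist limit
print(sampled_cosine(3000))                            # its alias below the limit
print(sampled_cosine(5000) == sampled_cosine(3000))    # True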

In short, it's good to remember that VoIP most frequently works with a sampling frequency of 8 kilohertz and each sample is stored in 16 bits.

references:
http://toncar.cz/Tutorials/VoIP/VoIP_Basics_Converting_Voice_to_Digital.html

Saturday, April 15, 2017

WebRTC Architecture - A recap

It's been a long time since I looked into the WebRTC stack. I never noticed earlier that NetEQ was part of the stack for jitter buffer optimization. Here is the full stack.


WebRTC Native C++ API
An API layer that enables browser makers to easily implement the Web API proposal.


Transport / Session
The session components are built by re-using components from libjingle, without using or requiring the xmpp/jingle protocol.

RTP Stack
A network stack for RTP, the Real-time Transport Protocol.

STUN/ICE
A component allowing calls to use the STUN and ICE mechanisms to establish connections across various types of networks.

Session Management
An abstracted session layer, allowing for call setup and management. This leaves the protocol implementation decision to the application developer.

VoiceEngine
VoiceEngine is a framework for the audio media chain, from sound card to the network.

iSAC / iLBC / Opus

iSAC: A wideband and super wideband audio codec for VoIP and streaming audio. iSAC uses 16 kHz or 32 kHz sampling frequency with an adaptive and variable bit rate of 12 to 52 kbps.

iLBC: A narrowband speech codec for VoIP and streaming audio. Uses 8 kHz sampling frequency with a bitrate of 15.2 kbps for 20ms frames and 13.33 kbps for 30ms frames. Defined by IETF RFCs 3951 and 3952.

Opus: Supports constant and variable bitrate encoding from 6 kbit/s to 510 kbit/s, frame sizes from 2.5 ms to 60 ms, and various sampling rates from 8 kHz (with 4 kHz bandwidth) to 48 kHz (with 20 kHz bandwidth, where the entire hearing range of the human auditory system can be reproduced). Defined by IETF RFC 6716.

NetEQ for Voice
A dynamic jitter buffer and error concealment algorithm used for concealing the negative effects of network jitter and packet loss. Keeps latency as low as possible while maintaining the highest voice quality.

Acoustic Echo Canceler (AEC)
The Acoustic Echo Canceler is a software-based signal processing component that removes, in real time, the acoustic echo resulting from the voice being played out coming back into the active microphone.

Noise Reduction (NR)
The Noise Reduction component is a software based signal processing component that removes certain types of background noise usually associated with VoIP. (Hiss, fan noise, etc…)

VideoEngine
VideoEngine is a framework for the video media chain, from camera to the network, and from network to the screen.

VP8
Video codec from the WebM Project. Well suited for RTC as it is designed for low latency.

Video Jitter Buffer
Dynamic Jitter Buffer for video. Helps conceal the effects of jitter and packet loss on overall video quality.

Image enhancements
For example, removes video noise from the image captured by the webcam.

references:
https://webrtc.org/architecture/

What is WebRTC NetEQ

A dynamic jitter buffer and error concealment algorithm used for concealing the negative effects of network jitter and packet loss. Keeps latency as low as possible while maintaining the highest voice quality.

NetEQ is an implementation of a Jitter Buffer optimised for voice.

references:
https://webrtcglossary.com/neteq/