Friday, October 9, 2020

WebSocket : Exchanging data frames - Part II

Either the client or the server can choose to send a message at any time; that's the magic of WebSockets. However, extracting information from these so-called "frames" of data is a not-so-magical experience. Although all frames follow the same specific format, data going from the client to the server is masked by XORing it with a 32-bit key. Masking obscures the bytes on the wire, but it is not encryption in the security sense. Section 5 of the specification (RFC 6455) describes this in detail.


Frame format:

      0                   1                   2                   3
      0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-------+-+-------------+-------------------------------+
     |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
     |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
     |N|V|V|V|       |S|             |   (if payload len==126/127)   |
     | |1|2|3|       |K|             |                               |
     +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
     |     Extended payload length continued, if payload len == 127  |
     + - - - - - - - - - - - - - - - +-------------------------------+
     |                               |Masking-key, if MASK set to 1  |
     +-------------------------------+-------------------------------+
     | Masking-key (continued)       |          Payload Data         |
     +-------------------------------- - - - - - - - - - - - - - - - +
     :                     Payload Data continued ...                :
     + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
     |                     Payload Data continued ...                |
     +---------------------------------------------------------------+



The MASK bit tells whether the payload is masked. Messages from the client must be masked, so your server must expect this to be 1. (In fact, section 5.1 of the spec says that your server must disconnect from a client if that client sends an unmasked message.) When sending a frame back to the client, do not mask it and do not set the mask bit. We'll explain masking later. Note: Clients must mask messages even when using a secure socket. RSV1-3 can be ignored; they are for extensions.
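As a rough illustration of the server-to-client direction, the sketch below builds a single unmasked text frame. The function name encodeTextFrame is made up for this example, and the sketch assumes the whole message goes out in one frame (FIN=1). The length bytes follow the scheme described under "Decoding Payload Length".

```javascript
// Hypothetical helper: build one unfragmented, unmasked text frame
// (server -> client). Not part of any library.
function encodeTextFrame(text) {
  const payload = new TextEncoder().encode(text); // text frames carry UTF-8
  const len = payload.length;

  let header;
  if (len <= 125) {
    header = new Uint8Array(2);
    header[1] = len; // MASK bit (0x80) stays 0
  } else if (len <= 0xffff) {
    header = new Uint8Array(4);
    header[1] = 126; // marker: a 16-bit length follows
    new DataView(header.buffer).setUint16(2, len); // big-endian by default
  } else {
    header = new Uint8Array(10);
    header[1] = 127; // marker: a 64-bit length follows
    new DataView(header.buffer).setBigUint64(2, BigInt(len));
  }
  header[0] = 0x80 | 0x1; // FIN=1, RSV1-3=0, opcode=0x1 (text)

  const frame = new Uint8Array(header.length + len);
  frame.set(header, 0);
  frame.set(payload, header.length);
  return frame;
}
```

For "Hello" this yields the bytes 0x81 0x05 0x48 0x65 0x6c 0x6c 0x6f, which matches the unmasked example in section 5.7 of the spec.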


The opcode field defines how to interpret the payload data: 0x0 for continuation, 0x1 for text (which is always encoded in UTF-8), 0x2 for binary, and the so-called "control codes": 0x8 (close), 0x9 (ping), and 0xA (pong). In this version of WebSockets, 0x3 to 0x7 and 0xB to 0xF are reserved and have no meaning.


The FIN bit tells whether this is the last frame of a message. If it's 0, then the server keeps listening for more parts of the message; otherwise, the server should consider the message delivered. More on this later.
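The fields described above all live in the first two bytes of a frame, so they can be pulled out with a few bit masks. The following is only a sketch; parseFrameHeader and its property names are invented for this example.

```javascript
// Hypothetical helper: split the first two bytes of a frame into its fields.
function parseFrameHeader(byte0, byte1) {
  return {
    fin: (byte0 & 0x80) !== 0,    // bit 0: last frame of the message?
    rsv1: (byte0 & 0x40) !== 0,   // bits 1-3: reserved for extensions
    rsv2: (byte0 & 0x20) !== 0,
    rsv3: (byte0 & 0x10) !== 0,
    opcode: byte0 & 0x0f,         // bits 4-7: how to interpret the payload
    masked: (byte1 & 0x80) !== 0, // bit 8: MASK
    payloadLen: byte1 & 0x7f,     // bits 9-15: 7-bit length or 126/127 marker
  };
}

// 0x81 0x85 starts a masked, final text frame with a 5-byte payload.
const header = parseFrameHeader(0x81, 0x85);
console.log(header.fin, header.opcode, header.masked, header.payloadLen);
```

A server would typically close the connection if masked is false on a frame received from a client.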


Decoding Payload Length


To read the payload data, you must know when to stop reading. That's why the payload length is important to know. Unfortunately, this is somewhat complicated. To read it, follow these steps:


1. Read bits 9-15 (inclusive) and interpret that as an unsigned integer. If it's 125 or less, then that's the length; you're done. If it's 126, go to step 2. If it's 127, go to step 3.

2. Read the next 16 bits and interpret those as an unsigned integer. You're done.

3. Read the next 64 bits and interpret those as an unsigned integer. (The most significant bit must be 0.) You're done.
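The three steps above can be sketched as follows. The name readPayloadLength and the returned offset property are assumptions made for this example. The sketch also assumes the header bytes are already buffered in a Uint8Array, and it refuses lengths that a JavaScript number cannot represent exactly.

```javascript
// Hypothetical helper: decode the payload length from a buffered frame.
// Returns the length and the offset of the first byte after the length field.
function readPayloadLength(frame) {
  const view = new DataView(frame.buffer, frame.byteOffset, frame.byteLength);
  const len7 = frame[1] & 0x7f; // step 1: bits 9-15

  if (len7 <= 125) {
    return { length: len7, offset: 2 };
  }
  if (len7 === 126) {
    // step 2: next 16 bits, unsigned, network byte order (big-endian)
    return { length: view.getUint16(2), offset: 4 };
  }
  // step 3: next 64 bits, unsigned, network byte order
  const big = view.getBigUint64(2);
  if (big > BigInt(Number.MAX_SAFE_INTEGER)) {
    throw new RangeError("payload length too large for this sketch");
  }
  return { length: Number(big), offset: 10 };
}
```

The masking key, when present, starts at the returned offset.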


Reading and Unmasking the Data


If the MASK bit was set (and it should be, for client-to-server messages), read the next 4 octets (32 bits); this is the masking key. Once the payload length and masking key are decoded, you can read that number of bytes from the socket. Let's call the data ENCODED, and the key MASK. To get DECODED, loop through the octets (bytes) of ENCODED and XOR each octet with the (i modulo 4)th octet of MASK. In pseudo-code (that happens to be valid JavaScript):


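// ENCODED and MASK are byte arrays (e.g. Uint8Array); MASK holds 4 bytes.
// A string would not work here: JavaScript strings are immutable.
var DECODED = new Uint8Array(ENCODED.length);
for (var i = 0; i < ENCODED.length; i++) {
    // XOR each payload byte with the matching byte of the 4-byte key.
    DECODED[i] = ENCODED[i] ^ MASK[i % 4];
}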
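To see the unmasking loop applied to real bytes, here is a small sketch that decodes the masked "Hello" frame given as an example in section 5.7 of the spec. The unmask function name is made up for this illustration, and the sketch assumes a short frame whose 7-bit length is the actual length.

```javascript
// Hypothetical helper: XOR every payload byte with mask[i % 4].
function unmask(encoded, mask) {
  return Uint8Array.from(encoded, (byte, i) => byte ^ mask[i % 4]);
}

// Masked text frame carrying "Hello" (example bytes from RFC 6455, section 5.7).
const frame = Uint8Array.of(
  0x81, 0x85,                   // FIN=1, opcode=0x1, MASK=1, length=5
  0x37, 0xfa, 0x21, 0x3d,       // masking key
  0x7f, 0x9f, 0x4d, 0x51, 0x58  // masked payload
);

const length = frame[1] & 0x7f;          // 5, so no extended length field
const mask = frame.subarray(2, 6);       // 4-byte key follows the length
const encoded = frame.subarray(6, 6 + length);
const text = new TextDecoder().decode(unmask(encoded, mask));
console.log(text); // "Hello"
```

Because XOR is its own inverse, the same function would also mask a payload on the client side.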


Message Fragmentation


The FIN and opcode fields work together to send a message split up into separate frames. This is called message fragmentation. Fragmentation is only available on opcodes 0x0 to 0x2.


Recall that the opcode tells what a frame is meant to do. If it's 0x1, the payload is text. If it's 0x2, the payload is binary data. However, if it's 0x0, the frame is a continuation frame; this means the server should concatenate the frame's payload to the last frame it received from that client. Here is a rough sketch, in which a server reacts to a client sending text messages. The first message is sent in a single frame, while the second message is sent across three frames. FIN and opcode details are shown only for the client:


Client: FIN=1, opcode=0x1, msg="hello"

Server: (process complete message immediately) Hi.

Client: FIN=0, opcode=0x1, msg="and a"

Server: (listening, new message containing text started)

Client: FIN=0, opcode=0x0, msg="happy new"

Server: (listening, payload concatenated to previous message)

Client: FIN=1, opcode=0x0, msg="year!"

Server: (process complete message) Happy new year to you too!


Notice the first frame contains an entire message (has FIN=1 and opcode!=0x0), so the server can process or respond as it sees fit. The second frame sent by the client has a text payload (opcode=0x1), but the entire message has not arrived yet (FIN=0). All remaining parts of that message are sent with continuation frames (opcode=0x0), and the final frame of the message is marked by FIN=1.
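One possible way to reassemble fragmented messages on the server is sketched below. The createAssembler function and the { fin, opcode, payload } frame shape are assumptions made for this example. The sketch expects already-unmasked data frames (opcodes 0x0 to 0x2) and leaves control frames to the caller. Payloads are joined as bytes and decoded only at the end, since a fragment boundary may fall in the middle of a multi-byte UTF-8 character.

```javascript
// Hypothetical helper: returns a function that accepts frames one at a time
// and yields { opcode, data } once a message is complete, or null otherwise.
function createAssembler() {
  let opcode = null; // opcode of the message in progress, or null if none
  let parts = [];    // payloads received so far for that message

  return function onFrame(frame) {
    if (frame.opcode > 0x2) {
      throw new RangeError("only data frames (0x0-0x2) are handled here");
    }
    if (frame.opcode === 0x0) {
      if (opcode === null) throw new Error("continuation with no message in progress");
    } else {
      if (opcode !== null) throw new Error("new message started before the last one finished");
      opcode = frame.opcode; // the first frame decides text vs. binary
    }
    parts.push(frame.payload);
    if (!frame.fin) return null; // keep listening for more frames

    // FIN=1: concatenate all payloads into one byte array.
    const total = parts.reduce((n, p) => n + p.length, 0);
    const data = new Uint8Array(total);
    let offset = 0;
    for (const p of parts) {
      data.set(p, offset);
      offset += p.length;
    }
    const message = { opcode, data };
    opcode = null;
    parts = [];
    return message;
  };
}
```

Feeding it the four client frames from the exchange above returns a message for the first frame, null for the second and third, and the combined message for the fourth.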




References:

https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API/Writing_WebSocket_servers
