
Shared-Dictionary Compression for HTTP/S

(x-post from LinkedIn Engineering Blog)

The HTTP protocol has been the glue that holds together the Web, mobile apps and servers. In a world of constant innovation, HTTP/1.1 appeared to be the only safe island of consistency until the announcement of HTTP/2. Yet, even with HTTP being as robust and efficient as it is, there is still room for improvement, and that is what this post is about. LinkedIn’s Traffic Infrastructure team is making LinkedIn faster by exploring ways in which HTTP can be improved.

Traditionally, HTTP communications are compressed using either the gzip or deflate algorithms. Of the two, gzip strikes the balance of being aggressive enough to reach a good compression ratio while not having ridiculous CPU requirements. Algorithms such as the Burrows-Wheeler transform (popularized through bzip2) offer higher degrees of compression, but have higher CPU requirements. Until recently it was generally accepted that gzip is the best way to compress HTTP traffic. Yet there is an alternative to using computationally intensive compression algorithms to transfer less data over the wire – the alternative is to send less data in the first place.

Enter SDCH

SDCH (pronounced “Sandwich”) is an HTTP/1.1-compatible extension which reduces the required bandwidth through the use of a dictionary shared between the client and the server. It was proposed by four Google employees over five years ago. It utilizes a relatively simple mechanism that has the potential to drastically reduce transmission sizes. Currently Google Chrome and other browsers based on the Chromium open-source browser project support SDCH, but until recently Google was the only Internet site to support SDCH on the server. In this post, we will be sharing an overview of how it works, our experience implementing it and perhaps shed some light on why its use isn’t very widespread despite its remarkable potential.

SDCH is very simple at first glance. Instead of compressing every page as if it were an independent document, it takes into consideration that many sites have common phrases occurring across multiple pages – navigation bars, copyrights, company names, etc. In addition to phrases, modern pages share a good deal of CSS files, JavaScript code, etc. Once those common elements have been downloaded, there is no need to transfer them over and over again. Instead, the common elements are enumerated in a dictionary. If the browser and the server have the same revision of the dictionary, then the server can start transferring only short keys instead of long values. The browser will use the dictionary to expand the short keys back into the long phrases.
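
To make the idea concrete, here is a toy sketch in Python. This is not the real SDCH encoding (which is based on VCDIFF); it is just an illustration of why a shared dictionary shrinks the payload, and the phrases and keys are made up.

# Toy model of shared-dictionary compression -- NOT the real SDCH/VCDIFF
# encoding, just an illustration of the idea.

# Phrases both sides already agreed on (the "dictionary"), keyed by short ids.
DICTIONARY = {
    "\x01": "<nav class='global-nav'>...long navigation bar markup...</nav>",
    "\x02": "Copyright Example Corp. All rights reserved.",
}

def encode(page):
    """Server side: replace long shared phrases with short keys."""
    for key, phrase in DICTIONARY.items():
        page = page.replace(phrase, key)
    return page

def decode(payload):
    """Client side: expand the short keys back into the shared phrases."""
    for key, phrase in DICTIONARY.items():
        payload = payload.replace(key, phrase)
    return payload

page = ("<nav class='global-nav'>...long navigation bar markup...</nav>"
        "<p>Unique page content</p>"
        "Copyright Example Corp. All rights reserved.")
wire = encode(page)
assert decode(wire) == page
print(len(page), "bytes originally,", len(wire), "bytes on the wire")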

LinkedIn is committed to the privacy of its users, and therefore we rely on browsers advertising support for SDCH over HTTPS. Chromium does that; other browsers will follow suit.

Negotiations

For SDCH to work, several conditions have to be met. First, both the browser and the server must support SDCH. Second, both the browser and the server must share the same revision of the dictionary – using two different dictionaries will result in broken pages due to missing style sheets, invalid JavaScript or unfinished sentences. These conditions are negotiated through HTTP headers as the browser makes a GET request. In plain English, the negotiation can be described as follows:

browser: Hi! I need to GET /page.html. And BTW, I support SDCH

server: Hi! Here is the page. And BTW, here’s the URL to get the SDCH dictionary!

… (browser downloads the dictionary in the background) …

browser: Hi! I need to GET /another.page.html. And BTW, I support SDCH – here is my client-hash (see below)

server: Hi! Here is the page! And BTW, since your dictionary is up to date, the page is SDCH encoded!

In both phases, the client advertises SDCH support with the Accept-Encoding: sdch header. If the client has a dictionary, it advertises that with the Avail-Dictionary: client-hash header. We will talk about the hash generation later.

Once the browser is done fetching the dictionary, it is expected to advertise the availability of this dictionary by including the Avail-Dictionary header with the next request. This header contains a unique identifier for the provided dictionary called the “client hash”. At this point the server can start sending SDCH-compressed traffic (marked with the Content-Encoding: sdch header). The SDCH-compressed payload is prefixed by a “server hash”. Both hashes are derived from the SHA-256 digest of the dictionary, which allows the browser and the server to confirm that they are using the same dictionary.
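
For the curious, here is roughly how those hashes can be computed. Per our reading of the SDCH proposal, the client hash and the server hash are consecutive slices of the URL-safe base64 encoding of the dictionary’s SHA-256 digest; treat the exact slice lengths below as an assumption rather than gospel. A minimal Python sketch:

import base64
import hashlib

def sdch_hashes(dictionary_bytes):
    """Derive (client_hash, server_hash) from the dictionary contents.

    Assumption (from our reading of the SDCH proposal): each hash is 8
    URL-safe base64 characters (48 bits) of the SHA-256 digest -- the
    first 8 characters for the client hash, the next 8 for the server hash.
    """
    digest = hashlib.sha256(dictionary_bytes).digest()
    b64 = base64.urlsafe_b64encode(digest).decode("ascii")
    return b64[0:8], b64[8:16]

dictionary = open("my_dictionary", "rb").read()   # hypothetical local copy
client_hash, server_hash = sdch_hashes(dictionary)
# Browser sends:  Avail-Dictionary: <client_hash>
# Server prefixes the SDCH-encoded body with <server_hash>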

From this point on the server can start SDCH compression: it substitutes the long common strings included in the dictionary with shorter representations, includes sdch in the Content-Encoding header and prepends the body with the server hash.

The client receives the response and decodes the content by doing what the server did, in reverse.

Making Sandwiches

SDCH is so simple at first glance that it makes one wonder why it isn’t more popular. After all, all you need to do is generate a dictionary of long common strings found in your static content and have a web server that supports SDCH. How hard can that be? Let’s find out:

First, you need to generate a dictionary of long common strings from your static content. The open-source Femtozip by Garrick Toubassi seems to be the only available tool at this point. We ran into a performance issue, fixed it, and contributed the patch back. Running SDCH in production requires generating the dictionary fairly often, ideally every time we deploy a front-end application. That comes at the price of extra deployment time – about 7 hours for our production dataset. At LinkedIn we deploy up to 3 times a day, so we opted for “near-line” dictionary generation – we generate the dictionary in the background every two weeks. At this cadence we can benefit from SDCH compression without paying the overhead at every deployment.

The second challenge is presented by the versioning of static content. Every company has a different process for versioning and building their static content, and this step may be challenging for some, since dictionary generation can take a very long time. For websites that have a large collection of static content, generating the dictionary can take days. A map-reduce approach can potentially speed up dictionary generation, but this is beyond the scope of this post. Let’s assume, for the sake of simplicity, that you don’t have that many files and haven’t run into any issues here; now you have a dictionary for your static content and you are serving it from your main site under the “/my_dictionary” partial URL.

The third thing you need is a web server that supports SDCH. You are mostly out of luck on this one, as there are currently very few publicly available solutions for serving SDCH-encoded content. You will need to code your own module/plugin/webserver for this purpose. Since we extensively use Apache Traffic Server at LinkedIn, our experiment involves a custom Traffic Server plugin that uses the atscppapi and open-vcdiff libraries. We are in the process of open-sourcing the plugin.

Challenges

Femtozip outputs a dictionary that can be used for SDCH with minor modifications. You need to prepend it with the SDCH dictionary headers so that the browser knows on which domain and under which paths this dictionary is valid. We discovered that Chrome aggressively blacklists domains and does not tell you why your domain was blacklisted, which turned debugging into an arduous task. Fortunately, the Yandex Browser has a tab under its net-internals page that gives you the error codes, which you can conveniently look up in sdch_manager.h in the Chromium source code.

What we discovered is that if the browser advertises its dictionary using the Avail-Dictionary header, the server is expected to encode the response body with SDCH no matter what. Even if the response doesn’t contain sdch under the Content-Encoding header, the browser will still try to decode the received body, fail, and blacklist your domain from further SDCH use until the browser restarts. If the server doesn’t want to encode with SDCH and the client has advertised an available dictionary, the server has to include the X-Sdch-Encode: 0 header in the response to explicitly indicate that the response body is not SDCH-encoded.
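
To give a sense of what “minor modifications” means, here is a sketch of the wrapping step in Python. The header names (domain, path, max-age) reflect our reading of the SDCH proposal, and the file names are made up for the example.

def wrap_femtozip_dictionary(raw, domain, path="/", max_age=86400):
    """Prepend the SDCH dictionary headers to a femtozip-generated dictionary.

    The header block tells the browser on which domain and under which paths
    the dictionary may be used; a blank line separates it from the payload.
    Header names are taken from our reading of the SDCH proposal.
    """
    headers = (
        f"domain: {domain}\n"
        f"path: {path}\n"
        f"max-age: {max_age}\n"
        "\n"
    ).encode("ascii")
    return headers + raw

# Hypothetical file names, purely for illustration.
with open("static_content.fz", "rb") as f:
    wrapped = wrap_femtozip_dictionary(f.read(), domain="www.example.com")
with open("my_dictionary", "wb") as f:
    f.write(wrapped)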

Results

The preliminary results of our SDCH initiative at LinkedIn have been nothing short of remarkable. We generated two dictionaries, one for 1282 CSS files and one for 6225 JavaScript files. Each dictionary was about 700 kB, and less than 100 kB when compressed and served with gzip. The data we collected compares gzip with sdch+gzip. When sdch and gzip are combined, we have seen additional compression as high as 81% on certain files when compared to gzip alone. The average additional compression across our static content was about 24%. It takes us about 400 microseconds on average to encode a single static content file on the fly. Since our static content is cached through CDNs, it only needs to be encoded every once in a while. To make sure CDNs could distinguish a page encoded with one dictionary from a page encoded with another, we also needed to add Avail-Dictionary to our Vary response header. We think SDCH has a lot of potential and we expect to see a lot of new developments very soon.

Credits and Final Notes

Dzmitry Markovich championed the idea of using SDCH at LinkedIn. The plugin was developed by Can Selcik, during his internship at LinkedIn. It is worth mentioning that the specs have changed quite a bit since the initial announcement of SDCH. Randy Smith, Chris Bentzel and the Chromium developers have been extremely helpful and none of this would be possible without them.

Singletrack as a scene for growth

TL;DR: another piece of introspection.

Looking back, it has been an interesting year from the racing and training perspective. Lots of introspection, illusion-breaking and reframing the question of why racing is important to me.

I would like to highlight this question.

Do I race because of boredom? Is it my inherent competitive drive that pushes me? Is it my way to find an outlet for stress and negative emotions?

To a degree, all of the explanations above are true. Yet, all of them miss the point.

I race because I want to build a system where I can constantly control my personal growth. And I need this control so that I can constantly improve. And I need to constantly improve because I ultimately believe there is no stable point. Either we are improving or we are losing our edge. There is no such thing as “reaching a stable point and relaxing.” Life is a dynamic balance.

As I type this, I hear a voice in my head: “Wait a second, Rhynolite. At any given second there is somebody you know who has hit a plateau and is trying hard to push through it. Whiskey Tango Foxtrot?”

To answer this, I would like to first recall the wonderful world of the Planiverse, which I read about 30 years ago in the amazing book “The New Mathematical Diversions” by Martin Gardner. In the Planiverse, a world created by A.K. Dewdney, everything is flat. The earth is flat (yet round), and the people are flat. More than that, men always face right, and women always face left. That makes dating really handy, but what if a teenage boy wants to have a serious talk with his father? We can only hope the dad did enough waiter’s walks to be able to hold the boy upside down for the duration of the talk… or maybe they should use a flat version of Skype… I digress.

Well, if we were living in the Planiverse, and we were single-dimensional, and everything were measured on a single plane, then of course the model of “Dynamic Balance” would ensure we would never see any plateau.

However, we are multidimensional (or at least meant to be such), and we can happily improve in one direction while losing in another… projected onto our favorite activity, that would mean no progress at all.

On the other hand, by projecting my progress onto a simple system – such as bicycle racing – I can see it clearly and develop the basic skills I need to keep myself from stagnating and eventually becoming irrelevant.

Tearing down connections

Summary

Recently a colleague was debugging server software and was looking for a way to simulate a dropped TCP connection. The easiest way to drop a TCP connection is to terminate either the server or the client process – once the process is terminated, the operating system will “clean up” all resources used by the process, including open TCP connections.

In my colleague’s case, though, killing either the client or the server was not desirable. What she was looking for was a way to terminate a connection and have the client establish a new one.

The approach we took was crafting a “TCP reset” packet and sending it to the server. The server was led to believe that the client had just hung up on it, and a new connection was then established, which was exactly what we were looking for.

The gritty details

The TCP protocol provides two ways of closing connections. The normal way requires each party (the client and the server) to send a FIN packet, which tells the other party (the server and the client, respectively) that no more data is going to come. By sending a FIN packet each way, the parties agree that the conversation is done and all that was to be said has been said.

This rule, called “the 4-way teardown”, exists to ensure that the nice abstraction of a TCP connection is maintained over the messy world of retransmits, delays and dropped bits that is common in networking.

Every rule has its exception, though. In the case of “send a FIN after you’re done talking” rule, the exception is RST, which is the equivalent of “shut the flock up“. An RST means that someone received data but was not ready for that data to be received.

An RST packet was exactly what we were looking for. So the first question was: where do we send this packet?

Each TCP connection has two endpoints, described by the IP address (that is the machine, or rather the network card on the machine) and the TCP port number (this is a channel on the local machine).

The TCP port number on a server is well known and is advertised as part of the server’s URL. A client, on the other hand, picks a random (“ephemeral” in TCP jargon) TCP port number.

A non-obvious design choice is to send the RST packet to a server, and not to a client. This allows us to use the well-known port number on the server, rather than the one on the client.

Once the endpoint is identified, the task is to send a TCP RST segment to the endpoint identified by the IP address and the TCP port. While historically crafting TCP segments has been done by opening a raw socket and building the bits and checksums of the segment by hand, we used the hping utility for that.

The first version of the script did not work, as we missed an important caveat: the sequence numbers had to line up.

The TCP protocol ensures the integrity of transmitted data by associating a position in the stream with every chunk of data it transmits. If a chunk of data has to be re-sent, the stream position is re-sent along with the data, so that the receiving end knows which part of the data is being resent.

The TCP protocol calls these position identifiers “sequence numbers”. By default, a TCP endpoint will only accept packets that are in the expected range of sequence numbers, and will reject anything that would fall outside of the range.

These numbers do not have to be guessed, however. If the packets are injected from one of the endpoints, one can sniff the connection there and inject an RST segment that falls inside the valid TCP window.

The following scriptlet does the work, provided you have tcpdump and hping installed on your system:

#!/usr/bin/env perl

use Getopt::Long ;

$DEV="" ;
$PORT=1899 ;
$ADDR="";
$DEBUG = 0 ;

$result = GetOptions( "port=i" => \$PORT,
		      "dev=s" => \$DEV,
		      "addr=s" => \$ADDR,
		      "dbg=i" => \$DEBUG) ;

chomp($DEV);

if ( $DEV eq "" && $ADDR eq "" ) {
	print "Usage: must specify either device or server address. Exiting\n " ;
	exit (1) ;
} elsif ( $DEV eq "" && $ADDR ne "") {
	open (NETSTAT_NR, "netstat -nr|");

	while (<NETSTAT_NR>) {
		my $nline = $_ ;
		if ( $nline =~ /(\w+\.\w+\.\w+\.\w+)\W+([\w\.\:\#]+)\W+(\w+)\W+(\d+)\W+(\d+)\W+(\w+).*/ ) {
			my ($route_dest,$route_gw, $route_flags,$route_refs,$route_use,$route_netif) =
				($1,$2,$3,$4,$5,$6) ;
			if ($DEBUG) {
				print "route dst: $route_dest\n";
				print "route gw: $route_gw\n";
				print "route flags: $route_flags\n";
				print "route refs: $route_refs\n";
				print "route use: $route_use\n";
				print "route netif: $route_netif\n";
				print " ----- \n\n";
			}
			chomp ($route_netif);
			if ( $ADDR eq $route_dest) {
				$DEV = $route_netif ;
				last ;
			}
		}
	}

	close(NETSTAT_NR);
}

print "DEV == $DEV\n" ;

$pkt_count=3; # number of RST packets to send

open (TCPDUMP, " tcpdump -l  -i ${DEV} -A -S -n tcp port ${PORT}  |") ;

print "Waiting for traffic on ${PORT}..." ;

while (<TCPDUMP>) {
	my ($line) = ($_);

	if ( $line =~ /\d+:\d+:\d+.\d+\WIP\W\d+\.\d+\.\d+\.\d+.(\d+) > (\d+\.\d+\.\d+\.\d+)\.(\d+): P (\d+):(\d+)\(\d+\) ack (\d+) win (\d+).*/) {

		my ($src_port,$dst_ip,$dst_port,$seq_num,$next_seq_num,$ack_num,$win) = ($1,$2,$3,$4,$5,$6,$7);

		if ( $DEBUG ) {
			print "src port: $src_portn";
			print "target ip: $dst_ip\n";
			print "dst port: $dst_port\n";
			print "seq num: $seq_num\n";
			print "next seq num: $next_seq_num\n";
			print "ack num: $ack_num\n";
			print "window: $win\n" ;
		}

		my $cmd = "hping -c $pkt_count -s $src_port -p $dst_port -M $next_seq_num -L $ack_num -R  $dst_ip" ;

		print "$cmd\n" ;

		system($cmd);

		last ; # exit the listen loop, we found our target
	}
}
END { close (TCPDUMP) ; }
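
For those who prefer Python, the same trick can be sketched with scapy (which also appears later in this blog). This is a rough equivalent of the scriptlet above, not a drop-in replacement: it sniffs one client-to-server data segment on the target port and forges an RST whose sequence number lands inside the window the server expects. The port number is a placeholder and, like the Perl version, it needs root privileges.

#!/usr/bin/env python
# Rough scapy equivalent of the Perl/hping scriptlet above (run as root).
from scapy.all import IP, TCP, send, sniff

PORT = 1899   # placeholder: the server port whose connection we want to reset

def reset_from(pkt):
    ip, tcp = pkt[IP], pkt[TCP]
    payload_len = len(tcp.payload)      # bytes of data carried by this segment
    next_seq = tcp.seq + payload_len    # sequence number the server expects next
    rst = IP(src=ip.src, dst=ip.dst) / TCP(sport=tcp.sport, dport=tcp.dport,
                                           flags="R", seq=next_seq)
    send(rst, count=3)                  # a few copies, like the Perl version

# Grab the first client-to-server segment that actually carries data.
pkts = sniff(filter=f"tcp dst port {PORT}", count=1,
             lfilter=lambda p: IP in p and TCP in p and len(p[TCP].payload) > 0)
reset_from(pkts[0])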

Art of Leadership

In the Santa Cruz mountains, they would say: “Ben Zander is full of awesome”.

Ben Zander is a famous conductor. I have to admit, I did not know he also writes books. Not until we went to a bookstore, where my wife found a little yellow book called “The Art of Possibility” by Ben and Rosamund Zander.

“I have to read this. Ben Zander is a great writer”, she said.

Ben Zander is a great conductor. And a great writer. And a great teacher. The little yellow book has changed the way I tell people what has to be done.

Ben starts by telling us that “everything” is “invented.” Our perception is subjective and, in some form, “invented.” By changing our perception, we can change our reactions and our actions – that is what he preaches.

As an experienced teacher, Ben assigns homework – exercises for changing our perception. These are not easy, not at all. These exercises work, though. Slowly, more and more possibilities unfold.

As the rule of 10,000 tells us, it would take 10,000 exercises to master this kind of emotional control. Ben does not leave us there, but rather goes on to explain how a leader can unfold possibilities for those whom he leads.

I am really glad my wife gave me this book.

Reflecting on the poor performance on Saturday

The reason behind my poor performance was more mental than physical. I crashed by pushing the envelope too much, which was caused by anxiety, which was caused by lack of confidence.

Next race I will stick to the game plan.

In his article on mental abilities for motocross racers, Dr. Cohn writes about the mental tune-up before a race. He highlights the following points to cover:

  • confidence
  • focus
  • ability to perform naturally and instinctively
  • relax and try less when you want to win badly
  • handle errors
  • mental prep

While these suggestions are more for a motocross racer, I find all of these very important for XC racing.

The Grand Unified Theorem of Reverse Engineering

The Grand Unified Theorem of Software Engineering is not a theorem, actually. It’s a joke, with a grain of truth in it. It states that “You can solve every problem with another level of abstraction, except for the problem of too many levels of abstraction.” 

The Grand Unified Theorem of Reverse Engineering is not a theorem either. This joke refers to the above mock theorem, stating that “You can find the answer to any question by breaking another level of abstraction, except for the problem of having no usable abstraction to start with.”

Some would say that the former refers to Math, while the latter refers to Science. Others would say exactly the opposite.

Yosemite Half Dome at sunset

My daily job includes a fair share of researching implementations of ad-hoc defined network protocols. This process is tedious and sometimes frustrating, yet it occasionally brings its “Aha!” moments. Sometimes I like to compare reverse engineering with hiking in the mountains: lots of striding at an even pace, intermixed with moments of excitement. Just like natural wonders protect themselves from visitors with steep trails and annoying weather, so do software designs hide behind mundane details. And just like in hiking, there is the excitement of seeing the monolith of the architecture when it finally reveals itself.

Modern computer architectures make learning protocols both easier and harder. Java, Python or .NET programs have a very rich runtime environment, which provides structure and meta-information to start with. On the other hand, “native” or “unmanaged” code does not have the same structure – or does it?

While working with native code, it’s very important to understand the assumptions that were made when the code was written. Is the code multi-platform? If so, chances are the software was written in a “stratified” way: the specifics of different platforms are abstracted out into common APIs. Is the code network-specific? If so, it will probably use the Berkeley sockets API or its equivalent.

Short list of tools I have to use routinely:

  • Wireshark, the protocol analyzer
  • IDA Pro, the interactive disassembler 
  • dumpbin, which dumps the contents and structure of binary files
  • VMWare, which is a great way to set up an environment

Some other tools I use infrequently:

  • SCAPY, the interactive network protocol client
  • OllyDbg, the debugger

In general, the process of reversing an application consists of trial and error in understanding the processes and the implementation. This is where VMware snapshots become a very useful feature.

Ilfak Guilfanov, the author of IDA interactive disassembler, has a blog: www.hexblog.com.

Unicode Byte Order Detection

Once in a while I run into the following problem: given a stream of bytes which is known to represent Unicode characters, what is the correct byte order?

Assuming the byte stream is a valid Unicode stream, we can clearly see that if character Ci is in a certain byte order, then the next character, Ci+1, will be in the same encoding as well, unless a byte order mark (BOM) specifies a new byte order for the following bytes. Moreover, assuming the byte stream represents a character stream, there is a good chance that adjacent characters belong to the same word.

Adjacent words are usually written in the same language, which implies the same Unicode code page. In other words: in human text, the number of different code pages will be low.

This is an important property, which we are going to exploit.

The number of different code pages clearly depends on the document: in a math text written in English, some words will be in English, some in Greek, and some will be special symbols. A biology text in Romanian, on the other hand, will include Romanian words intermixed with biology terms in Latin (probably in italics).

The location of the code-page bytes within each code unit is governed by the byte order used by the stream. By determining which byte positions carry the code-page information, we can detect the byte order.

The algorithm is easy:

  1. Take a sampling buffer, _buf, from the stream. The size of _buf should be sufficiently large for detection, but not so large as to slow the algorithm down. 64 bytes sounds like a good guess.
  2. For the first pass:
    1. Allocate 2 hash sets: LO_BYTE and HI_BYTE
    2. Iterate over _buf in 2-byte increments (the position stored in offset). Insert _buf[offset] into HI_BYTE and _buf[offset+1] into LO_BYTE
  3. For the second pass:
    1. Allocate 2 hash sets: LO_NIBBLE and HI_NIBBLE
    2. Iterate over _buf in 4-byte increments (offset). Insert (_buf[offset]&0xFF)<<8 | (_buf[offset+1]&0xFF) into HI_NIBBLE and (_buf[offset+2]&0xFF)<<8 | (_buf[offset+3]&0xFF) into LO_NIBBLE
  4. Choose the hash set best with the lowest number of distinct entries, nbest.
    1. if nbest > EXPECTED_NUMBER_OF_LANGUAGES, the encoding is probably UTF-8
    2. otherwise, if best is LO_BYTE, the encoding is UTF-16LE 
    3. otherwise, if best is HI_BYTE, the encoding is UTF-16BE
    4. otherwise, if best is HI_NIBBLE, the encoding is UTF-32BE
    5. otherwise, if best is LO_NIBBLE, the encoding is UTF-32LE
    6. otherwise, check your implementation, it probably contains a bug.
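
Here is a minimal Python sketch of the heuristic above. The 64-byte sample and the EXPECTED_NUMBER_OF_LANGUAGES threshold are the same guesses as in the text, not tuned values.

def guess_byte_order(stream, sample_size=64, expected_languages=3):
    """Guess the encoding of a BOM-less Unicode byte stream.

    The byte (or 16-bit half) position with the fewest distinct values is
    assumed to carry the code page, which reveals the byte order.
    """
    _buf = stream[:sample_size]

    # First pass: 2-byte units (UTF-16 candidates).
    hi_byte = {_buf[off] for off in range(0, len(_buf) - 1, 2)}
    lo_byte = {_buf[off + 1] for off in range(0, len(_buf) - 1, 2)}

    # Second pass: 4-byte units (UTF-32 candidates).
    hi_half = {(_buf[off] << 8) | _buf[off + 1]
               for off in range(0, len(_buf) - 3, 4)}
    lo_half = {(_buf[off + 2] << 8) | _buf[off + 3]
               for off in range(0, len(_buf) - 3, 4)}

    candidates = {"UTF-16BE": hi_byte, "UTF-16LE": lo_byte,
                  "UTF-32BE": hi_half, "UTF-32LE": lo_half}
    best = min(candidates, key=lambda name: len(candidates[name]))
    if len(candidates[best]) > expected_languages:
        return "UTF-8"   # no byte position is stable enough; probably UTF-8
    return best

print(guess_byte_order("hello, world".encode("utf-16-le")))   # UTF-16LE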

Of course, if the BOM (byte order mark) is present, the whole issue of detecting the encoding becomes moot.