Keep-Alive in http requests in golang

The three-way handshake in a TCP connection setup is heavy wrt performance and re-using the already created connections is a considerable optimization. Keep alive is a method to allow the same tcp connection for HTTP conversation instead of opening a new one with each new request. Most servers and clients allow for configurations and options to utilise this optimisation.

I had a strange encounter with the way this is handled in go.

I’ve a go application - an nsq (a distributed message queue) consumer - which reads some data off nsq and send that data over a HTTP POST to a remote server over SSL. This involves a TLS handshake which is even more heavy than non-ssl handshake.

We found that nsq queue depth for the topics was high and increasing, additionally we kept seeing errors like in nsqd logs:

[nsqd] 2016/12/16 17:48:34.563958 ERROR: [19.68.8.7:55239] - E_FIN_FAILED FIN 0b3caf88 failed ID not in flight - ID not in flight
[nsqd] 2016/12/16 17:48:34.567675 ERROR: [19.68.8.7:55239] - E_FIN_FAILED FIN 0b3ca9f4 failed ID not in flight - ID not in flight
[nsqd] 2016/12/16 17:48:34.571008 ERROR: [19.68.8.7:55239] - E_FIN_FAILED FIN 0b3c955c failed ID not in flight - ID not in flight
[nsqd] 2016/12/16 17:48:34.573917 ERROR: [19.68.8.7:55239] - E_FIN_FAILED FIN 0b3c86f8 failed ID not in flight - ID not in flight

Based on the limited number of documents we got over the internet it seemed like this was due to a slow consumer (hints like need for heartbeat from consumer pointed in this direction). Since consumer was very simple - just doing some http posts so it was the only suspect.

We did a strace on the running process to see the read,connect system calls. It indeed was doing some certificate interchange (read) for each connect or POST:

[pid 18043] read(90, "g\201\f\1\2\0020|\6\10+\6\1\5\5\7\1\1\4p0n0$\6\10+\6\1\5\5\0070\1\206\30http://ocsp.digicert.com0F\6\10+\6\1\5\5\0070\2\206:http://cacerts.digicert.com/DigiCertSHA2SecureServerCA.crt0\f\6\3U\35\23\1\1\377\4\0020\0000\r\6\t*\206H\206\367\r\1\1\v\5\0\3\202\1\1\0\222\327\361\374\211T\330\363\305~8Z\270\210\313<Z\336o\274?\366\34\252\206Q\332\0044\257'\3751\205\273sa\200\251\325\224\267\\*\221\1/Ws\246\351bl=\330q?\200\256f\21\257\331\246\242w\313\202\245\361\241a?\2\345<a\35\313n\276\220\217k\377C\335}\235\2000+%?… ", 3166) = 1948

From the code, we noticed that:

Since keep-alive is already there so we thought #1 is the problem which is causing new connections to be created for every request which in turn leads to the TLS handshake (and eventually to slow consumer). We fixed this by creating a global client:

 14 var client *http.Client
 15 
 16 func init() {
 17     tr := &http.Transport{
 18         MaxIdleConnsPerHost: 1024,
 19         TLSHandshakeTimeout: 0 * time.Second,
 20     }
 21     client = &http.Client{Transport: tr}
 22 }

and using this to do the Post:

 31     resp, err := client.Post("https://api.some-web.com/v2/events", "application/json", bytes.NewBuffer(eventJson))
 32     if err != nil {
 33         log.Println("err", err)
 34         return err
 35     }
 36  
 37     defer resp.Body.Close()

But we observed that problem persisted still (slowness as evident from errors in nsqd logs). We confirmed by doing strace again:

$ sudo strace -s 2000 -f -p 18120 -e 'read,connect' 
[pid 18120] read(90, "g\201\f\1\2\0020|\6\10+\6\1\5\5\7\1\1\4p0n0$\6\10+\6\1\5\5\0070\1\206\30http://ocsp.digicert.com0F\6\10+\6\1\5\5\0070\2\206:http://cacerts.digicert.com/DigiCertSHA2SecureServerCA.crt0\f\6\3U\35\23\1\1\377\4\0020\0000\r\6\t*\206H\206\367\r\1\1\v\5\0\3\202\1\1\0\222\327\361\374\211T\330\363\305~8Z\270\210\313<Z\336o\274?\366\34\252\206Q\332\0044\257'\3751\205\273sa\200\251\325\224\267\\*\221\1/Ws\246\351bl=\330q?\200\256f\21\257\331\246\242w\313\202\245\361\241a?\2\345<a\35\313n\276\220\217k\377C\335}\235\2000+%?… ", 3166) = 1948

This appeared for each connection request. This meant that golang was somehow not honouring the keep-alive. This is when this thread on stack overflow helped us -

You should ensure that you read until the response is complete before calling Close().

res, _ := client.Do(req)
io.Copy(ioutil.Discard, res.Body)
res.Body.Close()

To ensure http.Client connection reuse be sure to do two things:

So we followed the suggestion:

 31     resp, err := client.Post("https://api.some-web.com/v2/events", "application/json", bytes.NewBuffer(eventJson))
 32     if err != nil {
 33         log.Println("err", err)
 34         return defaultErrStatus, err
 35     }
 36 
 37     io.Copy(ioutil.Discard, resp.Body)   // <= NOTE 
 38 
 39     defer resp.Body.Close()

While, as a I see the golang implementation, it does include a subtle comment in the code but I don’t understand yet why this idiosyncrasy exists. But would be something to be aware of and keep on the back of mind.

 // Post issues a POST to the specified URL.
 //
 // Caller should close resp.Body when done reading from it.
 //
 // If the provided body is an io.Closer, it is closed after the
 // request.
 //

Take Away