Improving API Performance with HTTP Keepalive | State Farm Engineering

Table of Contents

Introduction

Performance of business functionality is paramount. To keep it at the forefront, modern customer functionality is backed by one or more customer-facing APIs ( Application Programming Interfaces ), which are often backed by a series of microservices. Any amount of unnecessary response time in a deeply-nested service can cause slow performance for customers, potentially creating inefficiencies and diminishing customer satisfaction.
A well-performing, simple API might look like this:
    %%{init: {"theme": "base", "sequence": {"fontFamily": "monospace, monospace;"}, "themeVariables": {"primaryColor": "#e4e3e3", "primaryBorderColor": "#acabab", "noteBkgColor": "#f2ddbb", "noteBorderColor": "#acabab"}}}%%
    sequenceDiagram
        participant A as API
        participant B as Backend
        A->>+B: Request (0.5ms network latency)
        note over B: Database query time (10ms)
        B->>-A: Response (0.5ms network latency)
In this example, we can expect an 11ms response time. However, this example is far too simple for microservice environments. For example, examine the following customer-facing example:

    %%{init: {"theme": "base", "sequence": {"fontFamily": "monospace, monospace;"}, "themeVariables": {"primaryColor": "#e4e3e3", "primaryBorderColor": "#acabab", "noteBkgColor": "#f2ddbb", "noteBorderColor": "#acabab"}}}%%
    sequenceDiagram
        actor A as Customer
        participant P as Public API
        participant O as Orchestration API
        participant B1 as Backend API 1
        participant B2 as Backend API 2
        A->>+P: Request (15ms network latency)
        P->>+O: Request (0.5ms network latency)
        O->>+B1: Request (0.5ms network latency)
        note over B1: Database query time (10ms)
        B1->>-O: Response (0.5ms network latency)
        O->>+B2: Request (0.5ms network latency)
        note over B2: Database query time (10ms)
        B2->>-O: Response (0.5ms network latency)
        O->>-P: Response (0.5ms network latency)
        P->>-A: Response (15ms network latency)
        note over A: Customer waits 53ms total
These performance issues can be further exacerbated by running cross-data center or cross-cloud. For instance, if your public customer-facing API needs to authorize the user with an external identity provider, that authorization will incur much greater network latency.

Defining Response Time

We've assumed a small amount of network latency for each packet, whereas in reality it is not so simple. When a customer makes a request, the client must negotiate the connection with your API's server. Then, it negotiates Transport Layer Security (TLS). Finally, the request payload can be sent. Each of these packets compounds on top of the regular network one-way trip latency.
    %%{init: {"theme": "base", "sequence": {"fontFamily": "monospace, monospace;"}, "themeVariables": {"primaryColor": "#e4e3e3", "primaryBorderColor": "#acabab", "noteBkgColor": "#f2ddbb", "noteBorderColor": "#acabab"}}}%%
    sequenceDiagram
        actor A as Customer
        participant P as Public API
        A->>P: SYN (Synchronize) (15ms)
        P->>A: ACK (Acknowledge) (15ms / 30ms total)
        A->>P: SYN/ACK (15ms / 45ms total)
        note over A, P: Connection is now established.<br/>Total elapsed time >= 45ms
        A->>P: TLS Client Hello (15ms / 60ms total)
        P->>A: TLS Server Hello / Certificate Exchange (15ms / 75ms total)
        A->>P: Key Exchange / Change Cipher (15ms / 90ms total)
        P->>A: Change Cipher (15ms / 105ms total)
        note over A, P: Connection & TLS are established.<br/>Now communication can begin.<br/>Total elapsed time >= 105ms
        A->>+P: HTTP Application Data (15ms / 120ms total)
        note over P: Destination API processing time not included
        P->>-A: HTTP API Response (15ms+ / 135ms+ total)
        note over A, P: Application data ACKs & FIN/ACK are not blocking in this scenario.<br/>They have been omitted for brevity.
A simple API request where the user has a network latency of 15ms can take over 135ms. Connection establishment can be a performance killer!
In this example, we didn't take into account:

  • Computational inefficiencies (e.g. CPU wait)
  • Network jitter
  • Network congestion or traffic deprioritization
  • Connection establishment from the Public API to the backend APIs
  • Packet loss
  • Other delays
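To make the timeline arithmetic concrete, here is a small sketch that tallies the one-way trips from the diagram, using the same assumed 15ms one-way latency:

```python
ONE_WAY_MS = 15  # assumed one-way network latency from the diagram

# Each phase is a number of one-way trips the client must wait on.
tcp_handshake = 3 * ONE_WAY_MS     # SYN, ACK, SYN/ACK -> 45ms
tls_negotiation = 4 * ONE_WAY_MS   # TLSv1.2: two full round-trips -> 60ms
request_response = 2 * ONE_WAY_MS  # HTTP request + response -> 30ms

total = tcp_handshake + tls_negotiation + request_response
print(total)  # 135 (ms), before any server processing time
```

Nine one-way trips at 15ms each is where the 135ms floor comes from; everything in the list above only adds to it.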

Real world testing indicates that performance is, on average, significantly worse than in this theoretical case.
Additional assumptions in the above example:

  • Using either HTTP/1.1 or HTTP/2, which rely on TCP (Transmission Control Protocol). TCP establishes a connection using SYN/ACK/SYNACK packets. HTTP/3 will eliminate those three packets by using UDP (User Datagram Protocol), which has no connection establishment phase.
  • Using TLSv1.2. TLSv1.3 reduces response time marginally by removing one hop.

Solutions

Now that I've explained why connection between data centers, and over the internet, performs poorly, let's talk about what we can do to improve performance.

Solution Description
Reduce distance between data centers Opportunities exist to use AWS Local Zones or AWS Outposts, but not a solution for most use cases
Reduce the need to jump between data centers by hosting, pre-fetching, or caching data where it is needed Hosting data is tricky. Pre-fetching requires foreknowledge of future incoming requests. Caching requires careful planning and has risks
Reduce number of round-trips needed to establish connections TLSv1.3 reduces one hop. TLSv1.3 pre-shared keys (PSKs) result in zero round-trip time TLS negotiation, but require pre-planning for the client and server. In the future, HTTP/3 will eliminate the use of TCP, further reducing connection establishment and overhead time
Use gRPC (Remote Procedure Call) Requires rearchitecture of your API systems, but also provides a robust feature set
Reuse established connections Easy to do with HTTP Keepalive

Solution Testing

I set up a test rig between AWS Lambda in AWS's us-east-1 North Virginia region and a data center near Dallas, Texas. I ran one test each for HTTP/1.1 / HTTP/2, TLSv1.2 / TLSv1.3, and No Keepalive / Keepalive, and all combinations thereof. See Appendix -> Testing Setup to learn more about the testing setup. Some key findings:

  • Keepalive is crucial to good performance for repeated requests
  • TLSv1.3 negotiation is faster than TLSv1.2 (noticeable on the Without Keepalive bars below)
  • TLS negotiation becomes negligible if you are using keepalive, because it happens only once out of thousands of calls
  • In an unexpected turn of events, HTTP/2 is slower than HTTP/1.1 for single-threaded API calls (more on this later; HTTP/2 can be much faster than HTTP/1.1 depending on usage)
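The third finding is simple amortization. A rough sketch of the math, using the theoretical numbers from the earlier diagram (not my measured results):

```python
SETUP_MS = 105    # TCP + TLSv1.2 establishment from the earlier diagram
REQUEST_MS = 30   # request/response round-trip at 15ms one-way latency

def average_ms(calls: int, keepalive: bool) -> float:
    """Average time per call: setup is paid once with keepalive, on every call without."""
    if keepalive:
        return (SETUP_MS + calls * REQUEST_MS) / calls
    return SETUP_MS + REQUEST_MS  # a fresh connection per call

print(average_ms(1000, keepalive=True))   # 30.105
print(average_ms(1000, keepalive=False))  # 135.0 -> 135
```

Over a thousand calls, the one-time negotiation cost adds roughly 0.1ms per call, which is why it disappears into the noise once keepalive is on.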

[Bar chart: average response times for each HTTP version / TLS version / keepalive combination]

HTTP Keepalive

Reusing connections saves significantly on response time. Check out this graph of average AWS Lambda response times in the real world. The top line is the original Lambda invocation time. This particular Lambda invokes an on-premises API twice consecutively. Enabling Keepalive saved the initialization time twice per invocation. In reality, the results were better than my theoretical measurements. Each keepalive call saved a whopping 206ms ( 412ms total ) when compared to the same call without HTTP Keepalive.
[Chart: average Lambda invocation times before and after enabling keepalive]
Why is that? Let's take a second look at the connection establishment diagram from above, but this time we'll use keepalive:
    %%{init: {"theme": "base", "sequence": {"fontFamily": "monospace, monospace;"}, "themeVariables": {"primaryColor": "#e4e3e3", "primaryBorderColor": "#acabab", "noteBkgColor": "#f2ddbb", "noteBorderColor": "#acabab"}}}%%
    sequenceDiagram
        actor A as Customer
        participant P as Public API
        opt first request
            A->>P: SYN (Synchronize) (15ms)
            P->>A: ACK (Acknowledge) (15ms / 30ms total)
            A->>P: SYN/ACK (15ms / 45ms total)
            note over A, P: Connection is now established.<br/>Total elapsed time >= 45ms
            A->>P: TLS Client Hello (15ms / 60ms total)
            P->>A: TLS Server Hello / Certificate Exchange (15ms / 75ms total)
            A->>P: Key Exchange / Change Cipher (15ms / 90ms total)
            P->>A: Change Cipher (15ms / 105ms total)
            note over A, P: Connection & TLS are established.<br/>Now communication can begin.<br/>Total elapsed time >= 105ms
            A->>+P: Request (15ms latency)
            P->>-A: Response (15ms latency)
            note over A, P: Request 1 done<br/>Total elapsed time >= 135ms<br/>... Connection idle until another request comes in ...
        end
        opt second request
            A->>+P: Request (15ms latency)
            P->>-A: Response (15ms latency)
            note over A, P: Request 2 done in >= 30ms
        end
        opt third request
            A->>+P: Request (15ms latency)
            P->>-A: Response (15ms latency)
            note over A, P: Request 3 done in >= 30ms<br/>... and so on ...
        end
The vast majority of overhead is incurred during initial connection. Here's another look at how amazing keeping your connections alive can be. This data is taken from a real test from a machine near Atlanta, Georgia to a data center near Dallas, Texas.
[Chart: response times with and without connection reuse, Atlanta to Dallas]
Keepalive benefits are realized when you reuse an existing connection. The initial connection is significantly slower, but with keepalive we don't have to re-establish the connection on every request. Reusing the connection is a powerful and simple way to significantly improve performance.

Determining Keepalive Compatibility

Keepalive works differently depending on HTTP version:

Version Enabled by default? Note
HTTP/1.0 No Must set Connection: keep-alive header to enable. HTTP/1.0 is largely not used anymore.
HTTP/1.1 Yes Header is unnecessary. Most connections use HTTP/1.1
HTTP/2 Yes Header is explicitly prohibited. Usually enabling HTTP/2 is a conscious choice
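The table's rules can be expressed as a small helper. This is just a sketch to make the defaults explicit; real HTTP clients handle this for you:

```python
from typing import Optional

def connection_persists(http_version: str, connection_header: Optional[str]) -> bool:
    """Decide whether the connection stays open after a response, per the table above.

    Note: HTTP/2 prohibits the Connection header entirely; connections are
    persistent by design, so it falls through to the default below.
    """
    header = (connection_header or "").lower()
    if http_version == "HTTP/1.0":
        return header == "keep-alive"  # must opt in explicitly
    if header == "close":
        return False                   # either side may opt out in HTTP/1.1
    return True                        # HTTP/1.1 and HTTP/2 default to persistent

assert connection_persists("HTTP/1.0", None) is False
assert connection_persists("HTTP/1.1", None) is True
assert connection_persists("HTTP/1.1", "close") is False
```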

In order to maintain the connection, both the client and server must agree to keep the connection open. To test whether a server keeps connections alive, run curl -v and look at the very last line of output:

  • * Connection #0 to host left intact
  • * Closing connection 0

The length of time and number of connections that a server is willing to keep open at any time may vary significantly based on server congestion, default timeout configurations, and other factors. Even if you intend to keep a connection open, it can still be closed at any time by the server.

Client Keepalive Enablement

In most cases, it is only required to cache the HTTP client in order to keep the connection alive. Initialize your client globally or inside of an initialization block rather than inside the function that is using the client. You should also look at your client's configuration to determine whether you can increase the time to live ( TTL ) of a live connection.
E.g. pseudocode :

client = new client with keepalive enabled

function mainMethod:
  client.get()

Or use lazy initialization:

client = null

function mainMethod:
  if client is null:
    client = new client with keepalive enabled

  client.get()
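In Python, for example, the lazy pattern might look like the following sketch (the host name is a placeholder). Constructing the connection object does not open a socket, and caching it is what lets connection reuse kick in:

```python
import functools
import http.client

@functools.lru_cache(maxsize=None)
def get_client() -> http.client.HTTPSConnection:
    # Created once per process; subsequent calls return the cached instance,
    # so the underlying TCP/TLS connection can be reused across requests.
    return http.client.HTTPSConnection("api.example.com")  # placeholder host

def main_method():
    client = get_client()
    # client.request("GET", "/")  # reuses the kept-alive connection when possible
```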

Caution:

  • Ensure that the downstream service is not maintaining any state, such as session cookies, between requests. Most APIs shouldn’t be doing this, but you should still proactively check. Make a call using curl -v to the API and check the response header for any cookies that are set.
  • Ensure that your client is not persisting authentication or headers between requests. Pass in the appropriate authentication “fresh” each invocation.

See Appendix -> Keepalive Enablement Examples for code snippets in your preferred language to enable HTTP Keepalive.

TLSv1.3

TLSv1.3 can improve performance slightly by reducing the round-trips required to establish connections. It is also possible to pre-share keys between the client and server to completely eliminate TLS negotiation overhead. I haven't explored implementing pre-shared keys ( PSKs ), chiefly because it only provides benefit during connection establishment. If we use TLSv1.3 in conjunction with HTTP Keepalive, we only have to incur TLS negotiation overhead once.
Implementing TLSv1.3 is normally automatic. To test a given server's support for TLSv1.3, run curl -v and look in the output for TLS messages. Here is an example message stating that TLSv1.2 was negotiated:

* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
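You can also check whether your own client stack supports TLSv1.3 before assuming it will be negotiated. In Python, for instance, the ssl module exposes this:

```python
import ssl

# True when the linked OpenSSL build supports TLSv1.3
print(ssl.HAS_TLSv1_3)  # usually True on modern installs

# A default context negotiates the highest version both sides support
ctx = ssl.create_default_context()
print(ctx.maximum_version)  # e.g. TLSVersion.MAXIMUM_SUPPORTED
```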

HTTP/2…?

I'll admit, I expected HTTP/2 to be faster than HTTP/1.1, even for single API calls.
Google's blog post states that HTTP/2 reduces latency and overhead. It has a lot of great features:

Feature Why it didn’t help
Push support to preemptively send content before the client asks for it Not useful for APIs; designed for webpage loads to send related content like images
Connection multiplexing APIs are typically request/response and calls are not usually made to the same server in parallel
Compression of header fields May make a difference for some cases, but didn’t matter for me

Is HTTP/2 faster than HTTP/1.1? Most certainly. Is it faster for request/response API calls? In most cases, no.
To discover why HTTP/2 was slower, I had to run another load test and dig into the network traffic using Wireshark. I used a TLS keylog dump to decrypt my traffic.

Results

I ran a one-minute load test from my machine to a remote server and I've included the first ~70 calls in the chart below.
[Chart: response times of the first ~70 calls, HTTP/1.1 vs HTTP/2]
After the initial connection ( 100+ms ), observe that there are two tiers of response time: ~28-30ms and ~40-45ms. My best guess is that this is caused by network jitter. While that observation is interesting, it doesn't explain the discrepancy between HTTP/1 and HTTP/2. To understand the discrepancy, we need to dive into the packet capture.
Below are packet captures containing initial connection establishment followed by repeated consecutive requests to the server, simulating traffic using keepalive.

HTTP/1.1

HTTP/1.1 is simple as a protocol. Each request packet is followed by a single response packet containing the headers and data.
[Wireshark packet capture: HTTP/1.1]

HTTP/2

In the test I ran, HTTP/2 is 9% slower than HTTP/1.1.
HTTP/2 is a more robust protocol. The first request to the server contains settings packets defining how many streams can be opened for multiplexing and other settings about the connection. Afterward, each request is still a single packet, but the response is broken up into two packets: a header packet and a data packet. There is always a little delay between those response packets, which does not exist in HTTP/1.1. Sometimes, the network jitter can cause one of the response packets to be delayed more than a marginal amount.
[Wireshark packet capture: HTTP/2]

HTTP/2 Summary

I believe, but am not 100% certain, that the network jitter and small amounts of latency between HTTP/2 response packets is what causes HTTP/2 to be slower for single request/response calls. HTTP/2 will be significantly faster when using advanced features like push and multiplexing.

Summary

There are many different ways to improve API performance, but one of the easiest is to use HTTP Keepalive for all your clients when invoking an API. It's easy to enable, has few, if any, downsides, and improves performance significantly. I hope you'll give it a try. Thanks for reading and keep on building!
If you have any thoughts or comments, I'd love to hear from you: contact me.
To learn more about technology careers at State Farm, or to join our team visit, https://www.statefarm.com/careers.

Appendix

Appendix 1 – Keepalive Enablement Examples

To take advantage of keepalive, you need to cache the connection object between invocations. That means always creating the instance of the connection library outside of the handler code ( AWS Lambda ) or setting it at the class/singleton/static level.
Before implementing, please research all the options for keepalive that your HTTP library has. Some notes from my testing:

  • Some HTTP libraries send Keepalive heartbeat / probes to keep the connection alive. That works for containers that are always running, but will not work for Lambdas that are frozen when invocation ends (read more about Lambda lifecycle). Lack of packets may result in a stale connection sooner than anticipated.
  • Some HTTP libraries have timeouts for the maximum time a connection can live. In my testing, I set this to 300 seconds. I recommend you look into the configuration for your particular library.

Node.JS

AWS SDK

Version 3 of the AWS SDK enables keepalive by default.
For version 2, see the AWS SDK documentation. The easiest method: set the environment variable AWS_NODEJS_CONNECTION_REUSE_ENABLED=1.

Axios

Axios is a promise-native library. Enablement of keepalive happens at the core https library and is controlled by the keepAlive flag.

 import { Agent } from 'https'
 import axios from 'axios'

 // It is important to create your instance outside of your application code
 const axiosInstance = axios.create({..., httpsAgent: new Agent({ keepAlive: true })})

 const mainMethod = async () => {
   ...
   const result = await axiosInstance.get(url)
   ...
 }

Python

AWS SDK

My research says that keepalive should be enabled by default. I did not test it.

Requests Library

Keepalive is enabled by default in the requests library when you create a session. Be careful to use stateless requests across invocations.

 import requests

 # It is important to create the request session outside of your application code
 session = requests.session()

 def main():
     session.get(url)

Go

AWS SDK

My research says that keepalive should be enabled by default. I did not test it.

Core net/http Package
 import (
     "io"
     "io/ioutil"
     "net/http"
 )

 // It is important to initialize the client outside of your application code
 var client *http.Client = &http.Client{}

 func main() {
     res, err := client.Get(url) // In golang, you have to read the body to completion and then close the body in order to reuse a connection
     if err != nil {
         ...
     }

     io.Copy(ioutil.Discard, res.Body) // Reading or discarding the body is required for keepalive
     res.Body.Close()                  // It is required to close the body for reuse
 }

Java

AWS SDK

The SDK enables keepalive by default, but you can customize it.

Apache HttpClient 4.x

It's been a while since I've written Java. I used to love and use it daily, but now it is a relic of my past. It may be that there is a more efficient/cleaner way to instantiate the client that I have not discovered. Feel free to e-mail me if my example is suboptimal.

 import java.io.IOException;

 import org.apache.http.client.methods.CloseableHttpResponse;
 import org.apache.http.client.methods.HttpGet;
 import org.apache.http.impl.client.CloseableHttpClient;
 import org.apache.http.impl.client.HttpClientBuilder;
 import org.apache.http.util.EntityUtils;

 public class MyClass {
   // Normally you would want to initialize this as a Spring bean
   private CloseableHttpClient client;

   MyClass() {
     client = HttpClientBuilder.create().build(); // Customize your client if you like
   }

   public void call() throws IOException {
     CloseableHttpResponse response = client.execute(new HttpGet(url));

     EntityUtils.consume(response.getEntity()); // Must consume the response entity, either by reading it or discarding it
     // Must close the response
     response.close();
   }
 }

Appendix 2 – Testing Setup

In order to prove out performance improvements, I needed a solid and scalable testing solution. I wanted to be able to test out multiple approaches quickly, and also have a large sample size so that little "blips" would be minimized in their impact on results.
    %%{init: {"theme": "base", "sequence": {"fontFamily": "monospace, monospace;"}, "themeVariables": {"primaryColor": "#e4e3e3", "primaryBorderColor": "#acabab", "noteBkgColor": "#f2ddbb", "noteBorderColor": "#acabab"}}}%%
    graph LR
        Laptop(My Laptop) -- "Put 1000s of Messages" --> SQS
        SQS --> Lambda
        Lambda -- "(1) hundreds of requests over VPN" --> Linux(Linux Server running Caddy*)
        Lambda -- "(2) Store CSV of results" --> S3
        subgraph Workstation
            Laptop
        end
        subgraph AWS us-east-1 North Virginia
            SQS
            Lambda
            S3
        end
        subgraph Data Center - Dallas
            Linux
        end

* Caddy is a ridiculously simple and highly performant HTTP server.
Each message in the queue contains parameters specifying how many and what kind of requests to make. The Lambda uses a binary called Hey, which is a simple HTTP load testing utility. I contributed a couple of improvements ( 1, 2 ) to Hey which are not merged into the master repository yet.
Finally, I wrote some Bash scripts; one to add a set of messages into the queue and one to analyze the results. Each test runs for 10 minutes to get a sufficient sample size.
