extract_json performance

Jun 4, 2015 at 9:30 PM
Hi Steve,

I was measuring performance for transferring a fairly large amount of text data (~4.3 MB).
Getting the data to the client was relatively fast at 1.3 seconds (in a LAN environment). However, the call
http_response.extract_json().get() takes much longer, around 5 seconds. What I observed is that get() accounts for most of that time. Is there any way to get the result faster (in less than a second)?

Jun 5, 2015 at 12:19 AM
Hi Ganesha,

To give some background information, the task returned from http_client::request(...) is completed once the HTTP headers have arrived. This does NOT mean that the entire HTTP response body has arrived; the body could still be in transit in chunks. The tasks returned from http_response::content_ready() or any of the http_response::extract_* functions will wait until the entire message body has arrived. It is possible that part of the time spent waiting on the task from extract_json() is actually the download still in progress. Parsing the JSON text could also be taking some of the time.

Internally all HTTP message bodies are stored in a stream, by default a stream backed by a producer_consumer_buffer. We read data from the socket in chunks; the default chunk size is 64k. If you know the size of your dataset, you can try adjusting the chunk size to minimize the number of read operations performed. For example, you could try setting it to 1MB to avoid repeated reads. You can set the chunk size with the http_client_config::set_chunksize(...) API.
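
To get a feel for how much the chunk size can matter, here is a rough sketch that counts the minimum number of reads for a body of about 4.3 MB. The helper reads_needed and the constant kBodyBytes are made up for illustration and are not part of the SDK. Real reads can return less than a full chunk, so treat the result as a lower bound.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of the C++ REST SDK): minimum number of
// reads needed to receive total_bytes when each read returns at most
// chunk_bytes.
inline std::size_t reads_needed(std::size_t total_bytes, std::size_t chunk_bytes)
{
    assert(chunk_bytes > 0 && "chunk size must be positive");
    // Ceiling division: a partially filled last chunk still costs one read.
    return (total_bytes + chunk_bytes - 1) / chunk_bytes;
}

// Assumed payload size: roughly 4.3 MB, as in the question above.
const std::size_t kBodyBytes = 4508877;
```

With these assumptions, reads_needed(kBodyBytes, 64 * 1024) is 69 and reads_needed(kBodyBytes, 1024 * 1024) is 5.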

The next improvement you could try is to avoid the unnecessary copying performed when writing to and reading from the internal default producer consumer stream buffer we use. If you know the exact size and content of your incoming data you can perform further optimizations. For example, if you just want to get the HTTP response body as a std::string and you know it is coming across as UTF-8, you can create a stream buffer backed by a std::string. Additionally, you can perform all the heap allocations up front if you know the size. To quickly summarize, this would involve steps like the following:
  1. Create a container_buffer<std::string>
  2. Reserve the capacity to allocate all the memory up front using container_buffer::collection().reserve(size)
  3. Set the container buffer to be used to write the HTTP response body into before sending the HTTP request with http_request::set_response_stream(...)
  4. After sending the HTTP request, you can be signaled that the body has entirely arrived with the task returned from http_response::content_ready()
  5. Access the underlying std::string from the container buffer by moving out of the buffer with something like std::string str = std::move(buffer.collection());
Lots of information here I know, but I thought I'd try to share and you can decide how much effort you want to invest to improve the performance :)
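
To illustrate why reserving up front (step 2) helps, here is a small standard-library-only sketch. It does not use the SDK at all; it just appends fixed-size chunks to a std::string and counts how often the capacity changes. The function name count_reallocations is made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends `chunks` pieces of `chunk_bytes` each to a std::string and returns
// how many times the string's capacity changed (i.e. how often it grew).
// If reserve_up_front is true, the full size is reserved before appending.
inline std::size_t count_reallocations(std::size_t chunks,
                                       std::size_t chunk_bytes,
                                       bool reserve_up_front)
{
    assert(chunk_bytes > 0 && "chunk size must be positive");

    std::string body;
    if (reserve_up_front)
        body.reserve(chunks * chunk_bytes); // one allocation for everything

    const std::string chunk(chunk_bytes, 'x'); // stand-in for received data
    std::size_t reallocations = 0;
    std::size_t last_capacity = body.capacity();

    for (std::size_t i = 0; i < chunks; ++i)
    {
        body.append(chunk);
        if (body.capacity() != last_capacity)
        {
            ++reallocations;
            last_capacity = body.capacity();
        }
    }
    return reallocations;
}
```

With the reservation in place the count is 0. Without it the string has to grow several times, and each growth step copies everything received so far.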

Jun 8, 2015 at 12:26 PM
Hi Steve,

I tried the following 2 ways:
  1. http_request req;
    Concurrency::streams::ostream ostr;

    pplx::task<http_response> response = myhttp_client->request(req);

    response.then([](http_response response) {
        // Completes once the entire body has arrived.
        return response.content_ready();
    })
    .then([&strResult_out](http_response response) {
        strResult_out = response.extract_json().get().serialize().c_str();
    }).wait();
  2.      concurrency::streams::container_buffer<std::string> buffer;
        concurrency::task<size_t> aSize = response.body().read_to_end(buffer);
      // check the content after aSize.is_done()
What I observed is that in both cases the time to get the data completely remains more or less the same (6 sec).
For the test, I used http_listener on the server side. The entire 4.3 MB of data was cached in server memory before the HTTP request came from the client. This was just to get a clear picture of the pure transmission time.

At the moment, we are using ATL Server for communication between client and server. What I observed is that, with the current ATL Server, I could transfer the 'same' data in around 2 seconds!

Do you see anything that I have missed or am doing wrong?

Jun 8, 2015 at 5:34 PM
Hi Steve,

Some more observations:
I defined a 'progress_handler' to see what's happening in the background. Just logged the size in the callback.
The pattern is as below:
resthtttpclient.cpp; 89; Dir : upload, Size = 812 Delta = 812
resthtttpclient.cpp; 89; Dir : Download, Size = 8192 Delta = 7380
resthtttpclient.cpp; 89; Dir : Download, Size = 14600 Delta = 6408
resthtttpclient.cpp; 89; Dir : Download, Size = 22792 Delta = 8192
resthtttpclient.cpp; 89; Dir : Download, Size = 27740 Delta = 4948
resthtttpclient.cpp; 89; Dir : Download, Size = 35040 Delta = 7300
resthtttpclient.cpp; 89; Dir : Download, Size = 43232 Delta = 8192
resthtttpclient.cpp; 89; Dir : Download, Size = 45260 Delta = 2028
resthtttpclient.cpp; 89; Dir : Download, Size = 53452 Delta = 8192
resthtttpclient.cpp; 89; Dir : Download, Size = 61644 Delta = 8192
Total time to get 4.3 MB data is 7.9 sec
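
One detail in the log above: the first download delta (7380) equals 8192 minus the 812 uploaded bytes, so the delta appears to be computed against the previous size regardless of direction. Below is a minimal sketch of tracking the deltas per direction instead. It assumes the callback reports a cumulative byte count for each direction, which is what the log suggests. Direction and ProgressDeltas are made-up names, not SDK types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the direction reported to a progress callback.
enum class Direction { Upload, Download };

// Hypothetical helper (not part of the SDK): remembers the last cumulative
// byte count per direction, so upload bytes are never subtracted from the
// first download delta.
class ProgressDeltas
{
public:
    // Call with the cumulative size reported for `dir`. Returns the number
    // of bytes added since the previous report in the same direction.
    std::uint64_t update(Direction dir, std::uint64_t cumulative)
    {
        std::uint64_t& last =
            (dir == Direction::Upload) ? last_upload_ : last_download_;
        assert(cumulative >= last && "cumulative size should not decrease");
        const std::uint64_t delta = cumulative - last;
        last = cumulative;
        return delta;
    }

private:
    std::uint64_t last_upload_ = 0;
    std::uint64_t last_download_ = 0;
};
```

Fed with the first three log entries (upload 812, download 8192, download 14600), this yields deltas of 812, 8192 and 6408.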

When I defined the chunk size as 64 K (using set_chunksize), the pattern is:
resthtttpclient.cpp; 89; Dir : upload, Size = 812 Delta = 812
resthtttpclient.cpp; 89; Dir : Download, Size = 65536 Delta = 64724
resthtttpclient.cpp; 89; Dir : Download, Size = 131072 Delta = 65536
resthtttpclient.cpp; 89; Dir : Download, Size = 196608 Delta = 65536
resthtttpclient.cpp; 89; Dir : Download, Size = 262144 Delta = 65536
resthtttpclient.cpp; 89; Dir : Download, Size = 327680 Delta = 65536
resthtttpclient.cpp; 89; Dir : Download, Size = 393216 Delta = 65536
resthtttpclient.cpp; 89; Dir : Download, Size = 458752 Delta = 65536
Total time is 8.6 secs

The network usage has been pretty low during this HTTP response (less than 1% on a 1 Gbps network).
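
As a rough cross-check of that utilization figure, here is a small sketch that turns a transfer size and duration into an average throughput and a link utilization. The function names are made up, and the inputs used below (4.3 million bytes, 7.9 seconds, a 1 Gbps link) are taken from the numbers in this thread, with 4.3 MB approximated as 4.3e6 bytes.

```cpp
#include <cassert>

// Average throughput in bits per second for `bytes` moved in `seconds`.
inline double throughput_bps(double bytes, double seconds)
{
    assert(seconds > 0.0 && "duration must be positive");
    return bytes * 8.0 / seconds;
}

// Fraction (0..1) of a link with capacity `link_bps` that the transfer used.
inline double link_utilization(double bytes, double seconds, double link_bps)
{
    assert(link_bps > 0.0 && "link capacity must be positive");
    return throughput_bps(bytes, seconds) / link_bps;
}
```

link_utilization(4.3e6, 7.9, 1e9) comes out at roughly 0.0044, i.e. under half a percent. That matches the observation that the network itself is nowhere near saturated.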

Jun 8, 2015 at 6:58 PM
Hi Ganesha,

The default chunksize is already 64K. Also I assume you are running with the release build, correct?

In your prior examples you don't have everything quite right. If you want to use set_response_stream, try something like the following:
// Requires <cpprest/http_client.h> and <cpprest/containerstream.h>.
web::http::client::http_client_config config;
config.set_chunksize(1024 * 1024); // 1 MB chunks
web::http::client::http_client client(U("http://www.bing.com"), config);

// Write the response directly to a std::string backed buffer.
// Reserve all the memory needed upfront in one heap allocation.
// (5 MB is an assumed upper bound for a ~4.3 MB body; adjust to your data.)
concurrency::streams::container_buffer<std::string> buffer;
buffer.collection().reserve(5 * 1024 * 1024);
web::http::http_request request(web::http::methods::GET);
request.set_response_stream(concurrency::streams::ostream(buffer));

// Send request, please note this is blocking synchronously here.
web::http::http_response response = client.request(request).get();

// The task returned from http_response::content_ready() indicates
// when the entire response body has arrived.
response.content_ready().wait();

// Move the string out of the buffer to avoid a copy.
std::string responseBody = std::move(buffer.collection());
Also what platform are you running on? In some cases we've performed more optimizations for efficiency.

Jun 8, 2015 at 8:06 PM
Hi Steve,

Even with this code, the performance has remained the same (~9 sec).
I have Windows Server 2012 where http_listener is running. http_client is running on Windows 7. (And there is no virus scanner running on these test systems.)

Is there any possibility that the http_listener of the REST SDK has an issue?

Jun 8, 2015 at 8:09 PM
Hi Ganesha,

Potentially. The focus of this library is really on the client-side connecting portions. We have done very little performance work on the http_listener and don't have any current plans to. What kind of performance do you get if you remove the http_listener and just use your existing ATL server with our http_client?

Jun 8, 2015 at 8:18 PM
Hi Steve,

ATL Server is an old technology from Microsoft which uses SOAP for communication (supported until VS2005). This is not compatible with http_client unfortunately.

Have you done any performance analysis of http_client with WCF as the http listener?

Is there a possibility that work on http_listener in the REST SDK will never be picked up again, and that it will be dropped from future plans?

Jun 8, 2015 at 8:22 PM
Hi Ganesha,

Regarding the http_listener - we are not actively doing any work on it right now and don't currently have any plans to. It is missing some important features and hasn't received the same amount of work as other parts of the library. This is the reason it is marked as beta and in an experimental namespace.

Jun 9, 2015 at 4:04 PM
Hi Steve,

I did 2 things today.
  • Used 2.6.0 of the REST SDK instead of 2.5.0. There seems to be some fix in 2.6 for http_listener.
  • Used a different client system
    The download of the data is now very fast (between 1 and 1.5 sec)! So this is certainly good news. :-)
Now could you please elaborate a bit on "It is missing some important features" in your previous message?

Jun 9, 2015 at 5:28 PM
Hi Ganesha,

A few examples of major missing features include authentication, HTTPS (only implemented on Windows), and persistent connections (only implemented on Windows).

Jun 9, 2015 at 6:41 PM
Hi Steve,

I am interested only in Windows deployment. So do you mean the features in http_listener are almost complete for Windows deployment? Is anything missing for Windows?

Jun 9, 2015 at 6:53 PM
On all platforms I consider the http_listener to still be in a beta state. Depending on what features you want or need, it might be complete enough. There is no authentication support, no client or server certificate support, and very few, if any, configuration options. You can take a look at the outstanding issues by clicking the "Issues" tab and selecting the http_listener component. From a quality perspective, we didn't do any scale testing or reliability testing (which matter since it is a server component), and very little performance work has been done.

You could always implement and contribute back any features if you wanted to as well.

Jun 9, 2015 at 7:20 PM
Thanks Steve. Let me check other aspects that we need.