Parallel processing of large http_response body

Aug 18, 2015 at 5:50 AM
Edited Aug 19, 2015 at 2:43 AM
Hi all,
I just started using Casablanca to implement a latency-sensitive REST client running on Linux.
The client has to download a large chunk of CSV data (~1 MB) and parse it very efficiently, ideally in under 1 millisecond.
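For scale, here is a rough back-of-the-envelope check of what the 1 ms target implies. It assumes that ~1 MB means 10^6 bytes and, for the transfer estimate only, a 1 Gbit/s link. Both are assumptions, not measured figures.

```latex
\frac{10^{6}\ \text{B}}{10^{-3}\ \text{s}} = 10^{9}\ \text{B/s} \approx 1\ \text{GB/s}
\qquad\Longleftrightarrow\qquad
\frac{10^{-3}\ \text{s}}{10^{6}\ \text{B}} = 1\ \text{ns per byte}

t_{\text{transfer}} = \frac{8 \times 10^{6}\ \text{bit}}{10^{9}\ \text{bit/s}} = 8\ \text{ms}
```

On a link of that speed, the transfer alone would take about 8 ms. A 1 ms budget is therefore only reachable for the parsing step, which needs roughly 1 ns per byte.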
My first attempt was to send the HTTP GET and then read the whole body into a buffer with response.body().read_to_end():
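#include <cpprest/http_client.h>
#include <cpprest/rawptrstream.h>
#include <cstring>
#include <iostream>

using namespace std;
using namespace web::http;
using namespace web::http::client;
using namespace concurrency::streams;

http_client client( ..... );
http_request loan_request = .... ;

// 10 MB destination buffer, comfortably larger than the ~1 MB response
size_t size = 10 * 1024 * 1024;
char*  buf = new char[size];
memset(buf, 0, size);
rawptr_buffer<char> buffer(buf, size);

client.request(loan_request)
.then([&](web::http::http_response response)
{
    //read response to buffer
    cout << "Response code: " << response.status_code() << endl;
    return response.body().read_to_end(buffer);
}).then([&](size_t data_size)
{
    // data_size is the number of bytes actually received, not the buffer capacity
    cout << " size= " << data_size << endl;
    process(); // process received data line by line
}).wait();

delete[] buf;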
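A minimal sketch of what process() could do with the filled buffer, assuming C++17 (std::string_view). It walks the received bytes line by line without copying, using std::memchr to find each '\n' and dropping a trailing '\r' from CRLF input. The helper name for_each_line is hypothetical and not part of Casablanca.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical helper: calls on_line once per line in data[0, size),
// without copying. Lines are split on '\n'; a trailing '\r' (CRLF input)
// is dropped. A final line without a terminating '\n' is still reported.
// Returns the number of lines visited.
template <typename F>
std::size_t for_each_line(const char* data, std::size_t size, F&& on_line)
{
    assert(data != nullptr || size == 0);
    std::size_t count = 0;
    const char* p = data;
    const char* const end = data + size;
    while (p < end)
    {
        const void* hit = std::memchr(p, '\n', static_cast<std::size_t>(end - p));
        const char* eol = hit ? static_cast<const char*>(hit) : end;
        std::size_t len = static_cast<std::size_t>(eol - p);
        if (len > 0 && p[len - 1] == '\r')
            --len; // strip the CR of a CRLF line ending
        on_line(std::string_view(p, len));
        ++count;
        p = hit ? eol + 1 : end; // skip the '\n', or stop at the end of the data
    }
    return count;
}
```

In the first snippet this could be called inside the second continuation as for_each_line(buf, data_size, [](std::string_view line) { /* parse one CSV row */ });. Because the lines are views into buf, buf must stay alive until parsing is done.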
I spent a couple of days learning about Casablanca, and I'm slowly wrapping my head around continuations and PPL.
I figured I could speed up processing by handling http_response::body() line by line, in parallel, as soon as each full line arrives in the istream buffer, so I tried something like this:
http_client client( ..... );
http_request loan_request = .... ;

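size_t size = 10 * 1024 * 1024;
char*  buf = new char[size];
// Each read_line() writes at the buffer's current write position; the buffer
// is never rewound, so successive lines accumulate one after another.
rawptr_buffer<char> line_buffer(buf, size);

client.request(loan_request)
.then([&](web::http::http_response response)
{
    //read the body one line at a time
    cout << "Response code: " << response.status_code() << endl;

    return pplx::details::do_while([=]()
    {
        return response.body().read_line(line_buffer).then([=](const size_t bytesRead)
        {
            // 0 bytes plus EOF means the body is exhausted; 0 bytes alone is an empty line
            if(bytesRead == 0 && response.body().is_eof())
            {
                return false;
            }

            process(); //process CSV line
            return true;
        });
    });
})
.wait();

delete[] buf;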
Timing both alternatives (with an empty process()), I found that the second approach takes longer to complete.
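A minimal sketch of how such a comparison can be timed with std::chrono::steady_clock, so that the parse step and the whole request can be measured separately. The helper name time_us is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed time in
// microseconds. steady_clock is monotonic, so the result is not affected
// by wall-clock adjustments (NTP, manual changes) during the measurement.
template <typename F>
long long time_us(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(work)();
    const auto stop = std::chrono::steady_clock::now();
    assert(stop >= start); // a monotonic clock never runs backwards
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

For example, time_us([&] { process(); }) times only the parse step. Putting the whole client.request(...) chain, including .wait(), inside the lambda times download plus parse.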

Can someone point me in the right direction on how to spawn a new task for each line of http_response::body(), so that the main thread can keep fetching the next line?
Alternatively, is there a better way to receive and handle a large response from a REST web service?
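One way to let the fetching thread keep reading while other threads handle completed lines is a producer/consumer queue. The sketch below uses only the standard library (std::thread, std::mutex, std::condition_variable) instead of PPL tasks. The names LineQueue and process_concurrently are hypothetical, and a vector of strings stands in for the lines that read_line() would deliver. Each hand-off still costs a lock and possibly a thread wake-up, which for short CSV lines may outweigh the parsing work itself. Passing batches of lines instead of single lines is worth measuring.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical unbounded queue of lines: one producer, several consumers.
class LineQueue
{
public:
    void push(std::string line)
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(line));
        }
        cv_.notify_one();
    }

    // Called by the producer once no more lines will arrive.
    void close()
    {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a line is available; returns nullopt once closed and drained.
    std::optional<std::string> pop()
    {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty())
            return std::nullopt;
        std::string line = std::move(q_.front());
        q_.pop();
        return line;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool closed_ = false;
};

// The calling thread acts as the producer (the "fetching" loop) while
// `workers` threads call `handle` on lines concurrently. `handle` must be
// safe to call from several threads at once.
template <typename Handle>
void process_concurrently(const std::vector<std::string>& lines,
                          std::size_t workers, Handle handle)
{
    assert(workers > 0);
    LineQueue queue;
    std::vector<std::thread> pool;
    for (std::size_t i = 0; i < workers; ++i)
        pool.emplace_back([&] {
            while (auto line = queue.pop())
                handle(*line);
        });
    for (const auto& line : lines)
        queue.push(line);
    queue.close();
    for (auto& t : pool)
        t.join();
}
```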
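Another option: read_to_end() already delivers the body into one contiguous buffer, so that buffer could be parsed in parallel after the download finishes. Split it into roughly equal chunks, move each boundary forward to just past the next '\n' so no line is cut in half, and hand each chunk to std::async. This sketch swaps PPL for std::async and assumes C++17. The names split_at_lines and parallel_sum are hypothetical. Thread start-up has its own cost, so whether this beats a single-threaded pass over ~1 MB has to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `text` into at most `parts` pieces, each ending
// just after a '\n' (except possibly the last), so no line spans two pieces.
std::vector<std::string_view> split_at_lines(std::string_view text, std::size_t parts)
{
    assert(parts > 0);
    std::vector<std::string_view> chunks;
    const std::size_t target = text.size() / parts + 1; // minimum chunk length
    std::size_t begin = 0;
    while (begin < text.size())
    {
        std::size_t end = begin + target;
        if (end >= text.size())
        {
            end = text.size();
        }
        else
        {
            // extend the chunk to the end of the line containing position end - 1
            const std::size_t nl = text.find('\n', end - 1);
            end = (nl == std::string_view::npos) ? text.size() : nl + 1;
        }
        chunks.push_back(text.substr(begin, end - begin));
        begin = end;
    }
    return chunks;
}

// Runs `work` (a thread-safe callable taking a chunk and returning size_t)
// on every chunk on its own thread and sums the results.
template <typename F>
std::size_t parallel_sum(std::string_view text, std::size_t parts, F work)
{
    std::vector<std::future<std::size_t>> futures;
    for (std::string_view chunk : split_at_lines(text, parts))
        futures.push_back(std::async(std::launch::async, work, chunk));
    std::size_t total = 0;
    for (auto& f : futures)
        total += f.get();
    return total;
}
```

With the first snippet, the input would be std::string_view(buf, data_size). Each chunk is a view into buf, so buf must outlive all the futures.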
Any help would be greatly appreciated.