http_client access to data

Apr 3, 2014 at 4:12 PM
Hi,
  1. When doing a request, once the code in .then runs, does that mean that internally you have received not only the header info but all the data? For example, if I get a 50 MB file, is all of the data cached inside this class by the time the code in .then runs?
  2. If #1 is true, is there a direct way to access this internal storage?
Thanks,
G.
Apr 3, 2014 at 5:22 PM
Hi,
  1. Yes, after the client receives the response header, it starts to read all the body data and stores it in memory.
  2. You can use "concurrency::streams::container_buffer" as the internal storage; to avoid any extra copy, you can set this buffer stream when you create the HTTP request.
    http_client client(U("http://yourserver"));
    http_request msg(methods::GET);

    concurrency::streams::container_buffer<std::vector<uint8_t>> buf;  //use vector as internal storage.
    msg.set_response_stream(buf.create_ostream());

    client.request(msg).then([](http_response response)
    {
        return response.content_ready(); // read all the body data.
    }).wait();

    auto result = std::move(buf.collection());
Apr 3, 2014 at 5:44 PM
Hi,

Ok that looks good. So I set the buffer to be used to get the data. In this case the class itself does NOT set any internal buffer so there is no copying around.

Thanks,

G.
Apr 3, 2014 at 5:44 PM
Hi G,

The task returned from http_client::request(...) completes once the HTTP headers have arrived; it doesn't wait for the body of the response to arrive as well. This makes it possible to handle streaming scenarios or large amounts of data without waiting for everything first. The tasks returned from http_response::content_ready() and the http_response::extract_*() methods wait until the entire response body has arrived before completing. For example, let's look at a small code snippet:
http_client client(...);
client.request(methods::GET).then([](http_response response)
{
    // This task continuation will execute once the headers have arrived, but response body
    // might NOT have completely arrived yet.
    return response.extract_string();
}).then([](utility::string_t responseBody)
{
    // This task continuation will execute once the response body has entirely arrived and been
    // processed into a string.
});
Internally as the response body is coming across the network we store it into a stream. One way you can start processing the data from the internal stream we are saving to is with the method http_response::body(). This returns an input stream that the response body data is written to. In Casablanca we also allow another option where before you send the request you can specify the underlying stream to use to store the response body. This is done with the http_request::set_response_stream(...) method. For example if you knew you wanted the response body as a string you could directly have it written to a stream backed by a string. This would give the best performance because it would avoid any unnecessary copies, writing the response body directly to your string as it arrives across the network.
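To make the copy accounting concrete, here is a small library-free model in plain standard C++. It is not Casablanca code: `FakeBody`, `receive_then_extract`, and `receive_into` are hypothetical names invented for this sketch. They only mimic a body arriving in chunks and count how many bytes are copied when the data is buffered internally and then extracted, versus written straight into a destination the caller supplied up front.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a response body arriving over the network in
// pieces. This is NOT a Casablanca type; it exists only for this sketch.
struct FakeBody {
    std::vector<std::string> chunks;
};

// Models the default path: chunks land in an internal buffer first, and the
// caller later extracts a string, which costs one more full copy.
std::string receive_then_extract(const FakeBody& body, std::size_t& bytes_copied) {
    std::string internal;  // plays the role of the internal stream
    bytes_copied = 0;
    for (const auto& chunk : body.chunks) {
        internal += chunk;              // network -> internal buffer
        bytes_copied += chunk.size();
    }
    std::string result = internal;      // internal buffer -> caller's string
    bytes_copied += result.size();
    return result;
}

// Models the set_response_stream idea: the caller hands over the destination
// before the transfer starts, so each chunk is copied exactly once.
void receive_into(const FakeBody& body, std::string& destination,
                  std::size_t& bytes_copied) {
    bytes_copied = 0;
    for (const auto& chunk : body.chunks) {
        destination += chunk;           // network -> caller's buffer
        bytes_copied += chunk.size();
    }
}
```

In this model both functions produce the same contents, but the first one touches every byte twice and the second one touches it once. The difference grows linearly with the size of the body.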

Steve
Apr 3, 2014 at 6:25 PM
Hi Steve,

Ok that really works the way I was thinking it does!

You have this code:
.then([](utility::string_t responseBody)
Here is a question: string_t on Windows is a wchar_t string (16-bit chars). However, the data coming from the web is most likely UTF-8 encoded or plain binary (in our case). So you get the binary buffer from the web, then you convert it to UTF-16? Is there a way to avoid this (using this method, of course)? Most of the downloads from a web server in my case are binary buffers, so there is no reason to go to UTF-16 only to move back to a buffer of bytes.

The second option (http_request::set_response_stream) of course does not have this issue.

Thanks,

G.
Apr 3, 2014 at 6:34 PM
Hi G,

Yes, in most cases the data is probably coming across the network as UTF-8. On Windows our string_t is always UTF-16, since that integrates well with Windows. If you want to avoid this conversion then you can't use the extract_string() method. You would have to access the underlying stream directly with http_response::body(), or use http_request::set_response_stream as you mentioned.
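To see why a binary payload should not go through a string conversion, here is a minimal, library-free sketch of a UTF-8 to UTF-16 conversion in standard C++. It is not the code Casablanca uses; `utf8_to_utf16_sketch` is a hypothetical helper written only for this illustration, and it skips some validation (overlong encodings and encoded surrogates are not rejected).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Illustrative UTF-8 -> UTF-16 conversion. Every input byte is inspected and
// a second buffer is built, which is wasted work for binary data.
std::u16string utf8_to_utf16_sketch(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::uint8_t lead = static_cast<std::uint8_t>(in[i]);
        std::uint32_t cp = 0;    // decoded code point
        std::size_t extra = 0;   // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (in.size() - i - 1 < extra)
            throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k <= extra; ++k) {
            const std::uint8_t b = static_cast<std::uint8_t>(in[i + k]);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        i += extra + 1;

        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        if (cp >= 0x10000) {
            // Outside the Basic Multilingual Plane: emit a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}
```

Two things stand out in this sketch. ASCII text doubles in size, because each byte becomes one 16-bit unit. Arbitrary binary data is usually not valid UTF-8 at all (a JPEG, for instance, starts with the bytes FF D8), so the conversion either fails or would have to mangle the payload. Reading the raw bytes avoids both problems.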

The set_response_stream API is always going to be faster than extract_string, but it requires you to know what you are doing. This is because, even in cases where no conversion is necessary, extract_string still has to copy from the stream we store the response body data in into a string variable. There is no way for us to predict how the user is going to want to handle the response body, so by default it is always just stored in a byte stream. To get an idea of what extract_string() does, take a look at the source code in the method http_msg_base::extract_string().

Steve
Apr 3, 2014 at 6:57 PM
Hi Steve,

Thanks for your explanation. I think we are all settled here; I now fully understand how to use this class (yes, the code in http_msg_base::extract_string() is clear!).

Here is a suggestion: maybe the class should have been split in two: a raw class focused on getting the payload back as a byte stream in the fastest way possible, and another one doing all the conversions you do now in the unit.

This is really great work!!

Thanks,

G.