streambuf and stringstreambuf

Apr 6, 2013 at 9:17 AM
Edited Apr 8, 2013 at 1:17 AM
I am having an issue with streambuf and stringstreambuf because they provide no way to clear the buffer.

I am using stringstreambuf with http_response::get_line. I could use streambuf, but I didn't find a way to convert it to a string; with stringstreambuf, collection() lets me do that.

So the only way I found to get exactly what I need is to trim the string with string::erase(), using this pattern:
auto read = response.get_line(data).get();
std::string str = data.collection();
str.erase(str.begin() + read, str.end());
This code looks odd to me and is probably slow. If I could simply clear "data" (but how? ideally without reallocating each time unless the buffer is too small), the code would be cleaner.
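To see why the erase is needed, here is a minimal sketch in plain standard C++ (no Casablanca types). It assumes the buffer's storage behaves like a std::string whose write position is rewound to 0 but which is never shrunk, so a short line leaves the tail of a longer earlier line behind. The helpers `overwrite_from_start` and `current_line` are hypothetical and only simulate that behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a buffer whose write pointer is rewound to 0:
// bytes are overwritten from the start, but the string is never shrunk.
std::size_t overwrite_from_start(std::string& storage, const std::string& line)
{
    if (storage.size() < line.size())
        storage.resize(line.size());              // grow only when needed
    std::copy(line.begin(), line.end(), storage.begin());
    return line.size();                           // like the count returned by get_line
}

// Keeps only the bytes written by the latest read, dropping stale leftovers.
std::string current_line(const std::string& storage, std::size_t read)
{
    std::string str = storage;                    // a copy, as taken from collection()
    str.erase(str.begin() + read, str.end());     // same trimming as in the post
    return str;
}
```

Writing "data1234" and then "OK" leaves the storage as "OKta1234"; trimming with the returned count recovers "OK". `str.resize(read)` would do the same trimming in a single call.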
Apr 8, 2013 at 4:59 PM
c2c,

In order to answer your question in more depth, it would help to have a slightly bigger view of what you are trying to do. It's hard to say how to best deal with this from those three lines of code.

Niklas
Apr 8, 2013 at 5:15 PM
It's related to my other post. I am handling a connection that sends me data over time (possibly for an hour) until it reaches Content-Length; the response doesn't arrive all at once. So I wrap the three lines above in a while(!done) loop. Blocking is fine here because this is a listening connection running in a background thread launched with std::async().

Obviously, for this loop to be efficient I want to reuse the same stringstreambuf (or streambuf) for reading the body with get_line(). Ideally I would clear this buffer (reset its contents to zero); otherwise new bytes merge with old bytes, which is what happens right now, since I simply move the write pointer back to the beginning of the buffer. That way, when I convert the stream to a std::string (or std::wstring), it can be processed cleanly. Reading each line directly into a std::string would also work for me.

The server sends data like this (as an example; each line ends with a newline character):

OK
data1
data2
data2
data4
...
END
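Reading each line directly into a std::string is what the standard library's std::getline does; it erases and refills the same string on every call, and common implementations keep that string's capacity between calls. A minimal sketch, assuming the body can be exposed as a std::istream (not shown in this thread); `read_data_lines` is a hypothetical name and the OK/END framing is taken from the sample above.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical parser for the sample protocol: a leading "OK" line,
// then data lines, terminated by "END". Returns the data lines.
std::vector<std::string> read_data_lines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;                        // one string, reused for every line
    if (!std::getline(in, line) || line != "OK")
        return lines;                        // no valid header: nothing to read
    while (std::getline(in, line) && line != "END")
        lines.push_back(line);               // copy out; `line` keeps its storage
    return lines;
}
```

Fed "OK\ndata1\ndata2\ndata2\ndata4\nEND\n", it returns the four data lines and stops at END.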

Thanks,
Apr 8, 2013 at 5:32 PM
c2c,

It seems to me, not knowing enough about your application, that if you are not hyper-sensitive about scalability (and your first paragraph above suggests you are not, or you would avoid blocking APIs like .get()), the cost of creating a new stream buffer each time through the loop shouldn't be prohibitively high.

If you are hyper-sensitive about scalability, then storing data in STL strings or vectors is out of the question, period, since they progressively reallocate the underlying storage and copy data. In that case, a raw-pointer buffer may be the way to go: there should be no allocations that you don't see, but you have to size it large enough to hold the longest line. With that solution, create the underlying storage once, then create the stream buffer from that storage each time through the loop, which is a very cheap operation. The downside is that you have to manage the lifetime of the underlying storage yourself.
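The "allocate once, reuse every iteration" idea can be sketched with standard streams. This uses std::istream::getline into a fixed char array rather than the Casablanca stream buffers, and `kMaxLine` is an assumed upper bound on line length; a line that does not fit sets failbit and ends the loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

constexpr std::size_t kMaxLine = 64;   // assumed longest line, plus the terminator

// Reads every line into one preallocated char array. The array is never
// reallocated; copying each line into `out` is only there so the result
// can be inspected.
std::vector<std::string> read_with_fixed_buffer(std::istream& in)
{
    std::array<char, kMaxLine> buf{};          // storage created once
    std::vector<std::string> out;
    while (in.getline(buf.data(), buf.size())) // stores a '\0'-terminated line
        out.emplace_back(buf.data());
    return out;                                // stops at EOF or on an over-long line
}
```

The trade-off Niklas mentions shows up directly: the array must be sized for the longest line, and a longer line stops the read instead of growing the buffer.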

Niklas
Apr 8, 2013 at 10:01 PM
Can you give me an example of how to use a raw pointer (I assume you mean char* or wchar_t*) with get_line()? In the standard library, std::stringbuf has a str() method that can reset the buffer to a new string, but it reallocates memory. Also, how is a streambuf converted into a std::string? I am really struggling with streambuf/stringstreambuf; honestly, I am spending more time figuring out stream types than writing real code.

I think get_line() and similar functions should make it easier for the client to use a different string type (e.g. wchar_t*), so that the efficient path is the default. Right now everything seems designed for one-time use. If a class is reused many times but each call to these functions reallocates the buffer, the design is inefficient by choice. It would be better to make the efficient option the default, because nobody calls these methods only once in the lifespan of an app. If allocation is already efficient (i.e. it only reallocates when the buffer is too small), then there should also be an efficient way to clear the buffer for reuse. (First and foremost, though, I need an example for my question above.)
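For comparison, the usual reset idiom for the standard library's string streams is str("") followed by clear(). A minimal sketch with a hypothetical `format_lines` function; whether str("") keeps the old allocation is left to the implementation, so this shows correct reuse rather than a guaranteed allocation saving.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Reuses one std::ostringstream for several lines by resetting its contents.
std::vector<std::string> format_lines(const std::vector<int>& values)
{
    std::ostringstream line;                 // created once, outside the loop
    std::vector<std::string> out;
    for (int v : values)
    {
        line.str(std::string());             // drop previous contents, rewind the put position
        line.clear();                        // reset any error flags
        line << "data" << v;
        out.push_back(line.str());           // str() returns a copy of the buffer
    }
    return out;
}
```

Because str("") also rewinds the write position, writing "data3" after "data22" yields "data3", not "data32".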

I hope this didn't sound like a rant;)
Apr 8, 2013 at 10:58 PM
Edited Apr 8, 2013 at 11:04 PM
c2c,

The use of raw pointers with asynchronous interfaces is generally very dangerous, because it is too easy to pass a pointer to something whose lifetime is not long enough.

That's why we have limited the use of raw pointers in the stream interfaces to two areas:
  1. The low-level streambuf APIs.
  2. Indirection on streams via the 'rawptr_buffer<T>' buffer, which is found in rawptrstream.h. It takes a pointer and a size in its constructor, and you can pass any pointer into it, as long as its lifetime exceeds that of all operations on the stream buffer. An example would be:
char buf[512];
memset(buf, 0, sizeof(buf));
streams::rawptr_buffer<char> block(buf, sizeof(buf)-1);
response.read_line(block).get();
Of course, with this strategy, you wind up with a C string (null-terminated), not a C++ string. If performance trumps all other concerns, and you know the size of the longest line, that's the way to go.
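If you want a std::string out of the raw buffer, building it from the count returned by the read is safer than relying on the terminator. The memset in the example above guarantees a trailing '\0' only until the array is reused: if a shorter line is later written over a longer one without clearing, the old tail remains. A sketch with a hypothetical helper `to_string_from_block` and a simulation of two reads into the same array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: build a std::string from the first `read` bytes of a
// raw buffer, without depending on a '\0' terminator being present.
std::string to_string_from_block(const char* buf, std::size_t read)
{
    return std::string(buf, read);
}

struct TwoViews { std::string by_terminator; std::string by_count; };

// Simulates reusing one char array for two lines without clearing it in between.
TwoViews simulate_reuse()
{
    char buf[16];
    std::memset(buf, 0, sizeof(buf));          // cleared once, as in the example above
    std::memcpy(buf, "data1234", 8);           // first (longer) line
    std::memcpy(buf, "OK", 2);                 // second line, read == 2
    return { std::string(buf), to_string_from_block(buf, 2) };
}
```

The terminator-based view yields "OKta1234", while the count-based view yields "OK".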

If, as you originally said, code cleanliness is your main concern, I would suggest:
while (condition)
{
    stringstreambuf data;
    auto read = response.read_line(data).get();
    process(data.collection());
}
This re-allocates space for the string each time through the loop, but looks clean.

Alternatively,
stringstreambuf data;
while (condition)
{
    auto read = response.read_line(data).get();
    process(data.collection());
    data.resize(0);
}
This should, at least if I correctly understand our implementation of std::basic_string<T>, not deallocate the space used by the string from iteration to iteration. It will, however, hold on to more memory than needed if a line early in the sequence is substantially longer than the rest. That isn't a true leak: the extra memory is released when the loop terminates.
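The keep-the-capacity idea behind resize(0) can be illustrated on a plain std::string (the later replies explain that it is not supported on the stream buffer's own container). This sketch assumes, as mainstream standard libraries do, that resize(0) keeps the capacity; shrink_to_fit() is a non-binding request that can hand back the memory held after an unusually long line. `total_length` and `long_line_threshold` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Processes lines through one reused std::string. `long_line_threshold` is a
// hypothetical knob: after a line longer than it, we ask to release memory.
std::size_t total_length(const std::vector<std::string>& lines,
                         std::size_t long_line_threshold)
{
    std::string scratch;                 // reused across iterations
    std::size_t total = 0;
    for (const std::string& l : lines)
    {
        scratch.assign(l);               // reuses capacity when it is large enough
        total += scratch.size();         // stand-in for process(scratch)
        bool was_long = scratch.size() > long_line_threshold;
        scratch.resize(0);               // size 0; capacity kept in practice
        if (was_long)
            scratch.shrink_to_fit();     // non-binding request to drop the big allocation
    }
    return total;
}
```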

Niklas
Apr 9, 2013 at 4:08 PM
Great insight!

I like the idea of resize(0), but it doesn't exist on streams::stringstreambuf (or streambuf). Is set_buffer_size() equivalent? I tried set_buffer_size(0, std::ios_base::out), but it doesn't clear the extra data (going from a 10-character line to a 5-character line leaves the last 5 old characters in place). So we are back to the original solution, but at least we have exhausted the alternatives.
stringstreambuf data;
string ss;
size_t read;

while(!done)
{
    read = response.read_line(data).get();
    ss = data.collection();                  // is this a copy, a reallocation, or a move?
    ss.erase(ss.begin()+read, ss.end());     // drop stale bytes, similar to what resize(0) would do
    data.seekpos(0, std::ios_base::out);     // make sure new data is written from the beginning again
}
It would be nice to have more helper methods (clear() or resize()) to make manipulating the response body easier.

Thanks,
Apr 9, 2013 at 4:35 PM
Sorry, that should have been
data.collection().resize(0);
The resize is on the underlying string, not the stream buffer.

Niklas
Apr 9, 2013 at 7:31 PM
The issue now is that after resize(0) the buffer's position is not correct (it should be reset to the beginning of the stream). I am also getting an assertion failure in debug builds, m_size <= m_data.size() in resize_for_write(), when get_line() is called after resize(0).

Example:

get_line(data) // m_data = "AB"
resize(0); // m_data = "\0\0"
get_line(data) // m_data = "\0\0AB" <== it's appending to the last write position

One might think the fix is simply seekpos(0, std::ios_base::out) (or do I need std::ios_base::in?), but that doesn't work: eventually I get an out-of-bounds error ("string iterator + offset out of range"). You can ignore the warnings, but eventually the program exits because of the debug checks. I assume the debug build simply has more checks, but is this an indication that something is wrong?
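The "\0\0AB" result is what you would expect if the buffer tracks its write position separately from the string it exposes. The class below is a hypothetical, much-simplified model (`ToyContainerBuffer`), not Casablanca's container_buffer; it only shows how shrinking the underlying string behind the buffer's back produces exactly this symptom.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical toy model of a container-backed stream buffer.
class ToyContainerBuffer
{
public:
    // Writes at the current position, growing the string if needed.
    void write(const std::string& bytes)
    {
        if (m_pos + bytes.size() > m_data.size())
            m_data.resize(m_pos + bytes.size());   // new bytes are zero-filled
        std::copy(bytes.begin(), bytes.end(), m_data.begin() + m_pos);
        m_pos += bytes.size();
    }

    // Exposes the container; modifying it does NOT update m_pos.
    std::string& collection() { return m_data; }

private:
    std::string m_data;
    std::size_t m_pos = 0;   // write position, kept separately from m_data
};
```

After write("AB"), collection().resize(0) empties the string but the position stays at 2, so the next write("AB") zero-fills two bytes and appends: the result is "\0\0AB".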
Coordinator
Apr 9, 2013 at 8:19 PM
Hi c2c,

We were actually wrong to suggest data.collection().resize(0). You can't modify the underlying container of a container_buffer. The collection() function is intended only to give you access to the underlying container so that you can copy or move it out of the container_buffer for use elsewhere. If you modify it (for example with clear or resize), the container_buffer can NOT be reused in another get_line operation. This is not a supported scenario right now.

For now you will have to either re-create the container_buffer (stringstreambuf) each time through the loop or use rawptr_buffer. Based on the code you posted, it would look something like the following. I hope this helps.
string ss;
size_t read;

while(!done)
{
    stringstreambuf data;
    read = response.read_line(data).get();
    ss = std::move(data.collection()); // Note the move here.
    
    // Perform your operations on the string ss for each line here...
}
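The same "fresh buffer per line, move the result out" shape with standard types only: a new local std::string each iteration plays the role of the per-iteration stringstreambuf, and std::getline stands in for response.read_line(data).get(). `collect_lines` is a hypothetical name. After the move, the source string is valid but unspecified, which is harmless here because it goes out of scope immediately.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Mirrors the loop above with standard types: a new local string per
// iteration (like a new stringstreambuf), moved out once it is complete.
std::vector<std::string> collect_lines(std::istream& in)
{
    std::vector<std::string> lines;
    bool done = false;
    while (!done)
    {
        std::string data;                       // fresh storage each time through
        if (!std::getline(in, data) || data == "END")
        {
            done = true;                        // end of body or terminator line
            continue;
        }
        lines.push_back(std::move(data));       // hand the storage over, no copy
        // `data` is now valid but unspecified; it is destroyed at end of scope.
    }
    return lines;
}
```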
We will discuss this issue with the team and see what other options we could do in the future.
Thanks,
Steve
Apr 9, 2013 at 9:07 PM
I strongly recommend adding a clear() method (something like ZeroMemory). It may sound trivial, but it is useful for reusing the stream.

Thanks again for great support.
Apr 10, 2013 at 5:52 AM
Yes, making that second loop alternative that I proposed actually work seems worthwhile.

If we consider something like this:
stringstreambuf data;
while (condition)
{
    auto read = response.read_line(data).get();
    process(std::move(data.collection()));
    data.clear();
}
A consequence of moving the string when picking it up for processing is that new memory would have to be allocated each time through the loop, just as if the buffer had been declared inside the loop. This is because the target string takes ownership of the internal storage. To realize any allocation benefit, you would have to avoid moving the string out for processing and instead always share it by reference, which is fine when processing synchronously but becomes more precarious in asynchronous code.
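A minimal sketch of the proposed clear() on a hypothetical buffer class (`ReusableLineBuffer` is illustrative, not a Casablanca type): clear() resets the contents and the write position together, so the next line starts at the beginning, and processing takes a const reference so the storage stays with the buffer. Moving the string out instead would transfer the storage and force a fresh allocation for the next line, as described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reusable buffer with the clear() operation proposed above.
class ReusableLineBuffer
{
public:
    void write(const std::string& bytes)
    {
        if (m_pos + bytes.size() > m_data.size())
            m_data.resize(m_pos + bytes.size());
        std::copy(bytes.begin(), bytes.end(), m_data.begin() + m_pos);
        m_pos += bytes.size();
    }
    // Resets contents AND position together; capacity is kept in practice.
    void clear() { m_data.clear(); m_pos = 0; }
    const std::string& collection() const { return m_data; }

private:
    std::string m_data;
    std::size_t m_pos = 0;
};

// Shares the line by const reference, so the buffer keeps its storage.
std::size_t process(const std::string& line) { return line.size(); }

std::size_t run(const std::vector<std::string>& incoming)
{
    ReusableLineBuffer data;
    std::size_t total = 0;
    for (const std::string& line : incoming)
    {
        data.write(line);                  // stand-in for read_line(data).get()
        total += process(data.collection());
        data.clear();
    }
    return total;
}
```

The synchronous call to process() is what makes sharing by reference safe here; an asynchronous consumer would still be reading while clear() runs.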

Niklas