convert_bytes_to_wstring function issue

Oct 29, 2013 at 6:10 PM
Hi,

I used this function in some tests, and it looks to me that if the input buffer has an odd number of bytes, the last one is lost. Strictly speaking the code should crash, but it does not, because the string's resize method will reserve at least one extra byte. Here is the function for reference:
    utf16string convert_bytes_to_wstring(const unsigned char *src, size_t src_size)
    {
        // If there was some sort of way we could do a move or create a string
        // from a sequence of bytes we would save a lot of copies, but there isn't.
        utf16string result;
#ifdef _MS_WINDOWS
        msl::utilities::SafeInt<size_t> sz(src_size);
#else
        size_t sz(src_size);
#endif
        if(sz != 0)
        {
            result.resize(sz / 2);
            memcpy(&result[0], src, sz);
        }
        return result;
    }
Suppose that 'sz' = 13. In this case the dest wide string is resized to 13 / 2 = 6 wide chars, which is 12 bytes. The memcpy copies 13 bytes, but that still works because of the extra space in the wstring buffer. However, the size of the dest string remains 6 wide chars (12 bytes), so the 13th byte is not part of the result.
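To make the arithmetic concrete, here is a minimal sketch, not the library's code. It assumes `utf16string` behaves like `std::u16string` with 2-byte code units, and the helper name `bytes_to_u16_truncating` is made up for illustration. Unlike the original, it copies only whole 2-byte units, so it never writes past the resized length.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not part of the library): uses the same
// resize(sz / 2) logic as the function above, but copies only the
// bytes that fit into whole 2-byte units.
std::u16string bytes_to_u16_truncating(const unsigned char* src, std::size_t src_size)
{
    std::u16string result;
    const std::size_t units = src_size / 2;  // integer division drops a trailing odd byte
    if (units != 0)
    {
        result.resize(units);
        // Assumption: sizeof(char16_t) == 2 on the target platform.
        std::memcpy(&result[0], src, units * sizeof(char16_t));
    }
    return result;
}
```

With a 13-byte input, `units` is 6, so the result holds 6 code units (12 bytes) and the 13th input byte has nowhere to go.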

From the name of this function, I understand that it takes a number of bytes and converts them to a string. The memcpy just dumps the bytes into a new object of type wstring, but that is NOT a proper string?! Maybe the name should be something like convert_bytes_to_wbuffer? One thing is clear: what comes out of that function is not a wstring, it is a buffer of wchar that is 'seen' as an array of bytes.

GT.
Coordinator
Oct 31, 2013 at 1:48 AM
Hi GT,

Yes, you are absolutely right, this is a bug. I'm fixing it right now, and it will be addressed in our next release. Thank you for taking the time to report the issue to us.

Just as an FYI, any API like this under a 'details' namespace is something we could freely change between releases. It might not be wise to take a dependency on it unless you are prepared to change your code when that happens.

Thanks,
Steve
Coordinator
Oct 31, 2013 at 2:08 AM
Hi GT,

Upon taking a further look at this code and the purpose of this function, I actually believe everything here is correct. The point of this function is to take raw bytes containing a string of 2-byte characters and create a std::wstring out of them. The src_size parameter is the number of bytes; perhaps the parameter should be renamed to make that clearer. The input size will always be even. We could put in a check to make sure and throw an exception, but if this function is being called with an odd src_size value, then it is a program error.
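For illustration only, the check mentioned above could look roughly like the sketch below. This is not the library's implementation: it assumes `utf16string` is equivalent to `std::u16string`, and both the name `convert_bytes_to_u16string_checked` and the choice of `std::invalid_argument` are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical variant (not the library's code): same copy as the
// original function, but an odd byte count is rejected up front.
std::u16string convert_bytes_to_u16string_checked(const unsigned char* src, std::size_t src_size)
{
    if (src_size % 2 != 0)
    {
        // Assumption: treating an odd size as a caller error worth an exception.
        throw std::invalid_argument("src_size must be an even number of bytes");
    }

    std::u16string result;
    if (src_size != 0)
    {
        result.resize(src_size / 2);
        // Safe: src_size is even, so exactly src_size bytes fit in the buffer.
        std::memcpy(&result[0], src, src_size);
    }
    return result;
}
```

An even-sized call behaves like the original function, while an odd size such as 13 fails loudly instead of silently dropping the last byte.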

Steve
Nov 1, 2013 at 4:16 PM
Hi Steve,

Yes, you are correct. I was taking the function out of context, but from just reading the code it is difficult to see that the function will always be called with an even number of bytes.

Thanks

GT.