UTF-16 string is missing low surrogate

Sep 22, 2015 at 8:30 AM
Edited Sep 22, 2015 at 8:33 AM

We ran into an issue where extract_json throws an exception with the message "UTF-16 string is missing low surrogate".

Any idea what this is?

PS: We are parsing a big blob of JSON coming from a REST service. I assume something is wrong with the payload, but is there any way Casablanca can help identify the issue and parse out whatever it can, instead of dropping the entire JSON payload?

    auto extractJsonTask = response.extract_json();
    extractJsonTask.then([func](pplx::task<json::value> jsonTask) {
        try {
            auto jsonResponse = jsonTask.get(); // THIS THROWS
        } catch (const std::exception& e) {
            LOG_ERROR("Exception when parsing json: " << e.what());
        }
    });
Sep 24, 2015 at 11:38 PM
hey glukacsy

Did you get any line number or column number information along with the exception message?
Whenever possible, we already include the line (row) number and column number in the web::json::json_exception message.

Your other request, parsing out whatever it can, would require architectural changes, and we will not be able to do that right now.
One workaround is to extract the data as a string and implement/use some other JSON parser.

Sep 25, 2015 at 10:59 AM
Hey Kavya,

Thanks for the answer. I forgot to mention that we got the above behaviour on OS X - is there any reason why Casablanca would do UTF-16 handling internally on OS X?
Or is it just that the incoming content is potentially mistreated as UTF-16 (when in fact it's UTF-8)?

Oct 2, 2015 at 11:51 PM
Hi, we have managed to identify the offending input, and it turns out to be a specific emoji character. We tried building Casablanca both with its own UTF-16 converter implementation and with the one based on the C++11 wstring_convert: in both cases parsing the incoming JSON fails, and with it the whole extract_json task.
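
For anyone trying to narrow down similar input, here is a rough sketch. It assumes the emoji arrives as JSON \uXXXX escapes: a code point outside the Basic Multilingual Plane needs two of them, a high surrogate (D800-DBFF) followed by a low surrogate (DC00-DFFF). The function scans the raw response text and reports the offset of the first surrogate escape that is not part of a valid pair. find_unpaired_surrogate and its helpers are hypothetical names, not part of Casablanca, and the scan does not validate the JSON in any other way.

```cpp
#include <cstddef>
#include <string>

// Parses four hex digits starting at `pos`; returns -1 if they are not all hex.
static long parse_hex4(const std::string& s, std::size_t pos)
{
    if (pos + 4 > s.size()) return -1;
    long value = 0;
    for (std::size_t i = 0; i < 4; ++i)
    {
        const char c = s[pos + i];
        int digit;
        if (c >= '0' && c <= '9') digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return -1;
        value = value * 16 + digit;
    }
    return value;
}

// Reads a \uXXXX escape at `pos`; returns the UTF-16 code unit, or -1 if
// there is no well-formed \uXXXX escape at that position.
static long read_escape(const std::string& s, std::size_t pos)
{
    if (pos + 6 > s.size() || s[pos] != '\\' || s[pos + 1] != 'u') return -1;
    return parse_hex4(s, pos + 2);
}

// Hypothetical helper (not a Casablanca API): returns the byte offset of the
// first \uXXXX escape that is an unpaired surrogate, or std::string::npos if
// every surrogate escape in `json` is correctly paired.
std::size_t find_unpaired_surrogate(const std::string& json)
{
    std::size_t i = 0;
    while (i < json.size())
    {
        if (json[i] != '\\') { ++i; continue; }

        const long unit = read_escape(json, i);
        if (unit < 0) { i += 2; continue; } // some other escape, e.g. \\ or \n

        if (unit >= 0xD800 && unit <= 0xDBFF)
        {
            // High surrogate: the next escape must be a low surrogate.
            const long next = read_escape(json, i + 6);
            if (next < 0xDC00 || next > 0xDFFF) return i;
            i += 12; // skip the whole pair
        }
        else if (unit >= 0xDC00 && unit <= 0xDFFF)
        {
            return i; // low surrogate without a preceding high surrogate
        }
        else
        {
            i += 6; // ordinary BMP escape
        }
    }
    return std::string::npos;
}
```

The text to scan could come from response.extract_string() instead of extract_json() (on OS X the resulting utility::string_t should be a std::string). A result other than std::string::npos points at the escape worth inspecting. If the scan finds nothing, the emoji is probably present as raw UTF-8 bytes and the problem lies elsewhere.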