Timeout error when downloading large files on Unix

Jun 27, 2014 at 12:04 AM
Edited Jun 27, 2014 at 12:06 AM
We have a very large file (~600 MB) to download. The download works reliably on Windows, but on iOS/OSX it has a nasty habit of timing out with the error "Failed to read response body" (raised in the Casablanca file http_linux.cpp), or at least of behaving as if it were timing out. Several aspects of its behavior are rather maddening:
  • The input file stream visibly receives data during the entire 30-second window leading up to the timeout. In fact, if we suppress the timeout error so that it does not kill the calling task, the stream keeps receiving data and stops correctly once the full file has been downloaded.
  • The problem is reproducible, but not every time. Roughly one time in ten to twenty, no timeout error is thrown at all and the download succeeds in full (which takes about 2 minutes on a MacBook Pro, and longer on an iPad).
We have attempted to fix this by:
  • Increasing the timeout significantly (this works, at least at our LAN speed, but is obviously not an acceptable solution)
  • Decreasing the chunk size within the client_config object (no effect)
  • Switching between HTTPS and HTTP endpoints (no effect)
  • Replacing the entirety of the affected calling code in our project with the code in the official Casablanca http client tutorial (https://casablanca.codeplex.com/wikipage?title=Http%20Client%20Tutorial) (no effect)
The inconsistent nature of the bug leads us to believe that the problem lies somewhere in the interaction between the timer thread and the request thread in http_linux.cpp. However, after a few hours of combing through this code manually, we have no real idea where that interaction is even supposed to take place (admittedly, we are not very familiar with Boost). As far as we can tell, the timeout operation should only trigger if the linux_request_context shared pointer goes out of scope, and we can find no place between request creation and the end of response streaming where that would happen.

We are using Boost 1.55.0 and Casablanca 2.1.

Any thoughts?
Jun 27, 2014 at 7:34 PM
FYI, I submitted a pull request with a change to http_linux.cpp that resets the timeout timer whenever data is received. With this change, the issue is fixed.
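To illustrate why resetting the timer on received data matters, here is a minimal, self-contained sketch using simulated time. It is not Casablanca code: FixedDeadline, InactivityDeadline and simulate_download are hypothetical names, and the assumption that the original timer behaved like a fixed deadline measured from the start of the request is inferred from the description of the fix, not taken from http_linux.cpp.

```cpp
#include <cassert>
#include <chrono>

// Illustration only, not Casablanca code. All names here are hypothetical.
using Clock = std::chrono::steady_clock;

// Assumed pre-fix behavior: the deadline is fixed when the timer starts,
// so it fires after `timeout` no matter how much data has arrived.
class FixedDeadline {
public:
    FixedDeadline(Clock::time_point start, Clock::duration timeout)
        : deadline_(start + timeout) {
        assert(timeout > Clock::duration::zero());
    }
    void on_data(Clock::time_point) {}  // progress is ignored
    bool expired(Clock::time_point now) const { return now >= deadline_; }

private:
    Clock::time_point deadline_;
};

// Behavior described by the fix: every received chunk pushes the deadline
// forward, so the timer only fires after `timeout` with no data at all.
class InactivityDeadline {
public:
    InactivityDeadline(Clock::time_point start, Clock::duration timeout)
        : timeout_(timeout), deadline_(start + timeout) {
        assert(timeout > Clock::duration::zero());
    }
    void on_data(Clock::time_point now) { deadline_ = now + timeout_; }
    bool expired(Clock::time_point now) const { return now >= deadline_; }

private:
    Clock::duration timeout_;
    Clock::time_point deadline_;
};

// Simulates a transfer that delivers one chunk every `chunk_interval` until
// `duration` has elapsed. Returns true if every chunk arrived before the
// timer expired (a chunk arriving exactly at the deadline counts as too late).
template <typename Timer>
bool simulate_download(std::chrono::seconds duration,
                       std::chrono::seconds chunk_interval,
                       std::chrono::seconds timeout) {
    assert(chunk_interval > std::chrono::seconds::zero());
    const Clock::time_point start{};  // simulated time origin
    Timer timer(start, timeout);
    for (auto elapsed = chunk_interval; elapsed <= duration; elapsed += chunk_interval) {
        const Clock::time_point now = start + elapsed;
        if (timer.expired(now)) return false;  // timer fired before this chunk
        timer.on_data(now);
    }
    return true;
}
```

For a 120-second transfer delivering one chunk per second with a 30-second timeout, simulate_download<FixedDeadline> returns false while simulate_download<InactivityDeadline> returns true. This mirrors the symptom in the original post: data keeps flowing, yet the request fails at the 30-second mark. An inactivity timeout still catches a genuinely stalled connection, for example when chunks arrive 45 seconds apart.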
Jun 27, 2014 at 11:39 PM
Great work, thanks. We'll look at including this pull request in the next release.

Thanks again for finding this issue and fixing it!