Does Casablanca have an internal bandwidth limit?

Oct 26, 2015 at 12:24 AM
I've been writing a Windows console application using Casablanca, but I've been having performance issues with the transfer rate. Metrics show that the console application is transferring a byte stream at roughly 6-8 Mbytes/sec.

To diagnose this, we looked at the application's file read speeds. It was getting anywhere from 20 Mbytes/sec to 100 Mbytes/sec. The variation depends on the size of the request. Read lots of small chunks, and throughput goes down. On average, file reads were at 80 Mbytes/sec.

We then tested Casablanca for local host to local host transfer with large file reads (to eliminate the variation). We got transfer rates of about 60 Mbytes/sec. But when we tested the application between a virtual machine on the network and a physical machine, the transfer rate was at 6-8 Mbytes/sec.

Since the drop in performance occurred going from the local loop to the real network, we then used curl to test a 1 Gig download on the actual network. We got transfer rates of around 60 Mbytes/sec.

Here's what we think is happening:
  • Disk read isn't the bottleneck. Speed here is around 80 Mbytes/sec or higher.
  • Local loop isn't the bottleneck. Speed here is around 60 Mbytes/sec or higher.
  • Network isn't the bottleneck. Speed between IIS and curl is around 60 Mbytes/sec or higher.
  • Casablanca seems to be the bottleneck. Speed here drops from 80 Mbytes/sec to 8 Mbytes/sec.
We looked at the network load monitor: between IIS and curl, the transfer had a ramp-up, a flat plateau, and then a ramp-down, as expected. Between Casablanca and curl, it was jagged spikes all the way, and it never really reached the higher transfer rates.

At first I thought maybe I was using the streaming incorrectly. The console application uses a producer-consumer buffer and ties an istream and an ostream to it. We write to it after each disk read. Debug log output shows that the application quickly fills the REST buffer and is done with the disk read. The rest of the time is spent by Casablanca pushing the buffer out to the network.
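For illustration, here is a minimal, standard-library-only sketch of the producer-consumer pattern described above. (The actual application ties a Casablanca istream/ostream pair to the SDK's producer-consumer stream buffer; the ChunkQueue class and its method names below are purely illustrative, not cpprestsdk API.)

```cpp
// Illustrative producer-consumer byte queue: a disk-read thread writes
// chunks in, a network-send thread reads chunks out.
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <vector>

class ChunkQueue {
public:
    // Producer side: push one chunk after each disk read.
    void write(std::vector<uint8_t> chunk) {
        std::lock_guard<std::mutex> lock(m_);
        chunks_.push_back(std::move(chunk));
        cv_.notify_one();
    }

    // Producer side: signal that no more chunks will arrive.
    void close() {
        std::lock_guard<std::mutex> lock(m_);
        closed_ = true;
        cv_.notify_all();
    }

    // Consumer side: pop the next chunk; returns an empty vector
    // once the queue is closed and drained.
    std::vector<uint8_t> read() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !chunks_.empty() || closed_; });
        if (chunks_.empty()) return {};
        std::vector<uint8_t> chunk = std::move(chunks_.front());
        chunks_.pop_front();
        return chunk;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::vector<uint8_t>> chunks_;
    bool closed_ = false;
};
```

In this arrangement the producer (disk read) can run far ahead of the consumer (network send), which matches the debug logs: the buffer fills quickly and the remaining time is all on the sending side.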

But when we tested a code change (reading the entire request into memory and then sending it all at once in the reply body, instead of streaming), the difference was negligible. Network monitoring showed Casablanca still dumped the 1 Gig file in little bursts, constant spikes.

I'm wondering what we're doing wrong here. Can anyone offer any insight?
Oct 26, 2015 at 7:32 AM
A colleague of mine did some digging around in the C++ REST code, and he discovered that there is a hard-coded chunk size value.

In http_server_httpsys.cpp
#define CHUNK_SIZE 64 * 1024

When we increased the chunk size from 64 Kbytes to 5 Mbytes, performance improved dramatically: Casablanca's transfer speed went from 8 Mbytes/sec to 42 Mbytes/sec. Of course, this now means we have to maintain a separate version of Casablanca for our purposes.
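A back-of-envelope model is consistent with this. Assuming (our assumption, not something from the Casablanca sources) that each chunk pays a roughly fixed per-chunk cost (completion handling, re-arming the next send) on top of its wire time, throughput is capped by how well the chunk size amortizes that cost:

```cpp
// Effective throughput (bytes/sec) when each chunk of chunk_bytes pays a
// fixed overhead_s on top of its wire time at wire_bps.
double effective_rate(double chunk_bytes, double wire_bps, double overhead_s) {
    return chunk_bytes / (chunk_bytes / wire_bps + overhead_s);
}

// With wire_bps = 60e6 (roughly curl's measured rate) and a purely
// hypothetical 5 ms per-chunk cost:
//   effective_rate(64 * 1024.0, 60e6, 5e-3)       ≈ 10.8e6  (≈ 10.8 Mbytes/sec)
//   effective_rate(5 * 1024 * 1024.0, 60e6, 5e-3) ≈ 56.8e6  (≈ 56.8 Mbytes/sec)
```

The 5 ms figure is illustrative, not measured, but the shape matches what we saw: small chunks pin throughput far below the wire rate, and large chunks recover most of it.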

In light of this, I have some further questions:
  1. Are there any hidden consequences for messing around with this CHUNK_SIZE value?
  2. Are there any plans to expose CHUNK_SIZE to developers so we can tune Casablanca's performance without having to modify the API source?
By the way, Casablanca is an amazing project. Aside from this performance hiccup, it has been a joy to code with. Thumbs up to the development team.
Oct 29, 2015 at 1:01 AM
Hey Wongers,

Glad you were able to figure out the performance bottleneck.

To answer your questions:
  1. Are there any hidden consequences for messing around with this CHUNK_SIZE value?
     This really depends on your scenario. You can go ahead with the change if it works best for you. Some things I can think of:
     • The same CHUNK_SIZE is used both while reading a request body and while sending a response. If you assign a large chunk size, you may see inefficiencies in scenarios that read large data but send small amounts, or vice versa.
     • We allocate the buffer for reading or sending data based on CHUNK_SIZE, so a larger chunk size means a larger buffer is allocated in both cases. See http_server_httpsys.cpp: when sending a response, windows_request_context::transmit_body() resizes body_data based on chunk_size; when receiving a request, windows_request_context::read_request_body_chunk() resizes request_body_buf based on chunk_size.
     • The chunk_size concept is not cross-platform; it is specific to Windows, so the chunk size is currently only configurable on Windows.
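As a standalone illustration of the send-side pattern described above (the real logic lives in windows_request_context::transmit_body() in http_server_httpsys.cpp; the function and variable names here only loosely mirror it):

```cpp
// Simplified mock of a chunked transmit loop: the sender resizes a
// reusable buffer to at most chunk_size and pushes the body out one
// chunk at a time, so the resident buffer footprint tracks chunk_size.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the number of send calls needed to push `body` out in
// chunk_size-sized pieces (the last chunk may be smaller).
std::size_t transmit_body_mock(const std::vector<uint8_t>& body,
                               std::size_t chunk_size,
                               std::vector<uint8_t>& body_data) {
    std::size_t sends = 0;
    for (std::size_t off = 0; off < body.size(); off += chunk_size) {
        std::size_t n = std::min(chunk_size, body.size() - off);
        body_data.resize(n);  // buffer allocation is driven by chunk_size
        std::copy(body.begin() + off, body.begin() + off + n,
                  body_data.begin());
        ++sends;  // one OS-level send per chunk in the real server
    }
    return sends;
}
```

This makes both trade-offs visible: a larger chunk_size means fewer send calls (less per-chunk overhead) but a proportionally larger buffer held per request.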
  2. Are there any plans to expose CHUNK_SIZE to developers so we can tune Casablanca's performance without having to modify the API source?
     Currently we are not focusing on the listener-side functionality. If you would like to invest some time around this, we are accepting contributions :)
Thanks
Kavya
Oct 29, 2015 at 11:37 PM
Hey Kavya,

Thanks for the response. I've replied in the GitHub issue. We'll continue any further correspondence in GitHub as you're moving Casablanca there.

Cheers,
Wongers