Friday, August 07, 2009

A Service Bottleneck


When I was looking at the time series data from my previous post, I found an interesting issue. First, the graph above represents "connected time," i.e., time people are actually using the resource; otherwise, the value at zero would dwarf the entire graph. The X-axis is the number of concurrent connections, and the Y-axis is the number of seconds served at that level of concurrency. This is actual data, though the description is generic.
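For anyone who wants to build the same kind of histogram, here is a minimal sketch of one way to do it, assuming each connection is logged as a (start, end) pair in seconds. The function name and the sample intervals are made up for illustration; they are not the data behind the graph.

```python
from collections import defaultdict

def seconds_by_concurrency(intervals):
    """Map each concurrency level to the total seconds spent at that level.

    `intervals` is a list of (start, end) pairs in seconds (an assumed log format).
    Time with zero connections is skipped, so only "connected time" is counted.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a connection opens
        events.append((end, -1))    # a connection closes
    # Sort by time; at equal times, closes (-1) sort before opens (+1).
    events.sort(key=lambda e: (e[0], e[1]))

    seconds = defaultdict(float)
    active = 0
    prev_time = None
    for time, delta in events:
        # Credit the elapsed span to the concurrency level that was active during it.
        if prev_time is not None and active > 0 and time > prev_time:
            seconds[active] += time - prev_time
        active += delta
        prev_time = time
    return dict(seconds)

# Hypothetical sample: three overlapping connections.
sample = [(0, 10), (5, 15), (8, 12)]
histogram = seconds_by_concurrency(sample)
print(histogram)
```

Plotting the keys on the X-axis and the values on the Y-axis gives a graph of the kind described here.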

The coolest part is: IT'S A BELL CURVE. I've read about these in statistics, but this is the first time I've encountered one in the wild. I knew I had a problem when the bell curve did not flatten out on the right tail. The physical limit of the resource is 23 units. As concurrent connections approached 23 units, new connections were denied and subsequently retried. See the "hump" beginning at 18 units? That's the graphical representation of a bottleneck.

Now, how to fix this... It's not practical to fix this problem with a technological solution. 95% of connected time is served prior to the bottleneck at 18 units. Over 90% is served with the first 13 units, and each subsequent unit yields a smaller marginal return.
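The percentages above are cumulative shares of connected time. A rough sketch of that arithmetic follows, assuming the histogram is a dict of concurrency level to seconds served. The function names and the numbers are hypothetical and chosen only to keep the math easy to follow.

```python
def cumulative_share(histogram):
    """Return {level: fraction of connected time served at or below that level}."""
    total = sum(histogram.values())
    running = 0.0
    shares = {}
    for level in sorted(histogram):
        running += histogram[level]
        shares[level] = running / total
    return shares

def units_needed(histogram, target):
    """Smallest concurrency level whose cumulative share reaches `target` (0 to 1)."""
    for level, share in cumulative_share(histogram).items():
        if share >= target:
            return level
    return max(histogram)

# Hypothetical histogram: seconds served at each concurrency level.
example = {1: 50, 2: 30, 3: 15, 4: 5}
shares = cumulative_share(example)
print(shares)
print(units_needed(example, 0.9))
```

Running the same calculation on the real histogram is how you would find the unit count that covers a given share of connected time, and see how little each extra unit adds after that point.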

The fix is a business control. First, we determined that the buildup occurred because of multiple long-running connections, which did not behave like the other connections. Many of them began at the same time and finished at the same time, rather than disconnecting randomly. Therefore, we modified the behavior of the people using the long-running connections.

