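I suspect most folks are not very familiar with the overhead of TCP, and would probably be astonished at how much even apparently minor differences can limit the bandwidth that is actually available and used. A simple difference such as 10 ms vs 50 ms in latency can make a huge difference, because a sender can only have one window of unacknowledged data in flight per round trip: with the same window size, five times the latency means roughly one fifth of the achievable throughput on a single connection.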
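To put rough numbers on the 10 ms vs 50 ms comparison, here is a minimal sketch of the window-limited ceiling for a single TCP connection (window size divided by round-trip time). The function name is made up for illustration, and the 65,535-byte window is an assumption: it is the largest window that can be advertised when window scaling is not in use. Packet loss, slow start and header overhead are ignored, so real throughput will be lower.

```python
def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound in bits/s for one connection: one window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


# Assumption: no window scaling, so the window is capped at 65,535 bytes.
WINDOW_BYTES = 65535

for rtt_ms in (10, 50):
    mbps = max_tcp_throughput_bps(WINDOW_BYTES, rtt_ms / 1000) / 1e6
    print(f"RTT {rtt_ms} ms -> at most {mbps:.1f} Mbit/s")
```

Under these assumptions the ceiling works out to about 52.4 Mbit/s at 10 ms and about 10.5 Mbit/s at 50 ms, regardless of how fast the underlying link is.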
Some day you may be in a situation where the monitoring metrics across your application tiers (such as CPU and memory utilization) are within well accepted ranges, but response times still seem excessive in a number of cases. One possible area to explore is your theoretical network bandwidth vs the bandwidth you can actually achieve, and what percentage of that achievable bandwidth is being used. Here are a few possible areas to examine:
- Is there an inefficient application in your data center that is behaving in an excessively chatty way (e.g. sending thousands of requests to complete the display of a single page)?
- How is the TCP window scaling option configured? (Without it, the advertised receive window is capped at 65,535 bytes.)
- What is the profile of transactions across your network? Is something using an excessive share of your available network bandwidth that might be a candidate for refactoring, or for isolating from the web application tier?
- What is the latency between your application tiers / servers / external third party services?
- Do all of the devices in your network have a Network Interface Controller (NIC) that supports the maximum theoretical speed of your fastest NIC?
- Are your customer facing web applications on the same network segment as your heavy-lifting back-end systems?
- Will your customer facing web applications benefit by partitioning some lower-priority / high data volume / large data packet consuming applications into one or more separate network segments?
- If you upgraded your network to use 10 Gigabit Ethernet, did you also upgrade the cabling from Cat 5 to Cat 6a (or at least Cat 6, which 10GBASE-T supports only on runs up to about 55 m) to mitigate interference, which can cause packet loss? Note that Cat 5e is rated for Gigabit Ethernet, not 10 Gigabit.
- Is your network cabling separated from your power cabling by enough distance to mitigate interference, which can cause packet loss?
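For the window scaling and latency questions above, a rough sketch like the following can show how large a TCP window a path needs before a single connection can fill it. It computes the bandwidth-delay product (link speed times round-trip time) and the smallest window scale shift that covers it. The function names and the 1 Gbit/s link speed are assumptions chosen for illustration; the 65,535-byte base window and the maximum shift of 14 come from the TCP window scale option (RFC 7323).

```python
MAX_UNSCALED_WINDOW = 65535  # 16-bit window field in the TCP header
MAX_SHIFT = 14               # largest shift allowed by RFC 7323


def bandwidth_delay_product_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the link full for one round trip."""
    return link_bps * rtt_seconds / 8


def window_scale_shift(needed_window_bytes: float) -> int:
    """Smallest shift whose scaled window covers the need (capped at 14)."""
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < needed_window_bytes and shift < MAX_SHIFT:
        shift += 1
    return shift


# Assumption: a 1 Gbit/s path, at the two latencies discussed earlier.
LINK_BPS = 1e9
for rtt_ms in (10, 50):
    bdp = bandwidth_delay_product_bytes(LINK_BPS, rtt_ms / 1000)
    print(f"RTT {rtt_ms} ms: BDP {bdp / 1e6:.2f} MB, "
          f"window scale shift >= {window_scale_shift(bdp)}")
```

If window scaling is disabled on either end, the window stays at 65,535 bytes at most, no matter how large the bandwidth-delay product is. On Linux the option is controlled by the `net.ipv4.tcp_window_scaling` sysctl; other platforms expose it differently.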
This posting is a placeholder for some of my background reading and research...
Some basic background material...
An excellent list curated by staff at Stanford University...
Examples of some tools to provide basic types of operational monitoring capabilities you should have in place...
Some help with basic calculations...
Some links to possible tools and vendor specific resources...
- MRTG (Multi Router Traffic Grapher): graphs to display bandwidth usage.