Let’s take a look at the various networking-specific
factors which affect application performance on the WAN. That is, we're not attempting to cover things such as
server capacity or computational complexity, which are independent of “the
network”. Said another way, what are the issues which cause application performance
on a WAN to be so much worse than performance of the same application run
entirely on a LAN?
In future posts, we’ll look more at which networking and
computing techniques – Adaptive Private Networking, WAN Optimization, as well
as other techniques – address each of these factors. Before doing any kind of technology
deployment to “make the network work better”, it’s useful to understand what
the factors impacting network application performance and predictability are.
For those of you familiar with what WAN Optimization
products like those from Riverbed Technology, Blue Coat Systems and Silver Peak Systems do, a lot of the
below will be old hat; hopefully, you’ll still find some of the detail
beneficial.
I contend that WAN-specific application performance is driven entirely by 3 factors: latency, packet loss and bandwidth. While networking geeks might say “duh!”, some people familiar with WAN issues might say “huh? That’s
not right”. These people will argue that there are other factors as
well. I will show below that these other factors – which in fact are very
important – matter precisely because of the impact of loss, high latency and/or
limited bandwidth.
Definitions (from Wikipedia): “Latency is a
measure of time delay experienced in a system, the precise definition of which
depends on the system and the time being measured. Latency in a packet-switched network
is measured either one-way (the time
from the source sending a packet to the destination receiving it), or round-trip (the
one-way latency from source to destination plus the one-way latency from the
destination back to the source). Round-trip latency [a.k.a. RTT
- Round Trip Time] is more often quoted, because it can be measured
from a single point.”
“Packet loss occurs
when one or more packets of data
traveling across a computer network fail to
reach their destination.”
“Bandwidth is a
measure of available or consumed data communication resources expressed in
bit/s or multiples of it (Kbps, Mbps etc).”
Bandwidth is a pretty obvious limiting factor on transfers
of large amounts of data, of course.
While it is most definitely not the
only reason for poor application performance on the WAN, and in some cases has little
or nothing to do with it, more bandwidth will, just as on the LAN, make many applications run better and more predictably,
and in particular will make the network manager’s life easier.
We’ll get into the reasons for latency and packet loss
problems in WANs in just a second, but before going on, it is worth noting the three
other major factors affecting WAN application performance, which themselves are
hugely impacted by latency and packet loss.
The “bandwidth-delay product refers to the product of a data
link's capacity (in bits per
second) and its end-to-end-delay (in seconds). The result, an amount of data measured in bits (or bytes),
is equivalent to the maximum amount of data on the network circuit at any given
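time, i.e. data that has been transmitted but not yet received.” The bandwidth-delay
product, which essentially has no effect on LAN performance, is a well-known
limit on how fast data transfers can occur over high-delay Wide Area Networks:
a sender can only have one window of unacknowledged data in flight per round trip,
so the longer the round trip, the lower the ceiling on throughput.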
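To put rough numbers on the bandwidth-delay product, here is a minimal Python sketch. The function names are my own, the delay is taken to be the round-trip time (the usual convention when talking about TCP windows), and the 65,535-byte window is the classic TCP maximum without window scaling; the link speed and RTTs are illustrative assumptions, not measurements.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes, using the RTT as the delay."""
    return bandwidth_bps * rtt_s / 8


def window_limited_throughput_bps(window_bytes: float, rtt_s: float) -> float:
    """Throughput ceiling when at most one window is in flight per RTT."""
    return window_bytes * 8 / rtt_s


LINK_BPS = 100e6          # assumed 100 Mbps link
WINDOW_BYTES = 65535      # max TCP window without window scaling

for label, rtt_s in [("LAN, 1 ms RTT", 0.001), ("WAN, 80 ms RTT", 0.080)]:
    ceiling = window_limited_throughput_bps(WINDOW_BYTES, rtt_s)
    achievable = min(LINK_BPS, ceiling)  # the link itself is also a limit
    print(f"{label}: BDP = {bdp_bytes(LINK_BPS, rtt_s):,.0f} bytes, "
          f"window-limited throughput = {achievable / 1e6:.2f} Mbps")
```

On the LAN the window is larger than the bandwidth-delay product, so the link is the limit; on the WAN the window is far smaller than the bandwidth-delay product, so the sender spends most of each round trip waiting for acknowledgements.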
A closely related issue is the manner by which TCP does congestion
control: “TCP uses a network congestion avoidance algorithm
that includes various aspects of an additive-increase-multiplicative-decrease
(AIMD) scheme, with other schemes such as slow-start in order
to achieve congestion avoidance.” TCP’s congestion control algorithm, and AIMD
in particular, is the primary reason why the Internet has not “collapsed” from
the weight of everyone using it, and is the amazingly elegant way TCP’s designers
came up with to efficiently use available bandwidth, and provide fairness “on
average”. For an individual application,
however, this means that throughput drops notably whenever packets are
lost, and interactive or real-time applications in particular frequently
perform badly when packet loss rates exceed ~1%.
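The sawtooth that AIMD produces is easy to see in a toy model. The sketch below is a deliberate simplification under my own assumptions: it ignores slow-start, timeouts and fast recovery, counts the window in whole segments per round trip, and injects a loss at a fixed interval rather than randomly. The function name and parameters are hypothetical.

```python
def aimd_trace(rounds: int, loss_every: int,
               increase: float = 1.0, decrease: float = 0.5) -> list:
    """Congestion window (in segments) at the start of each round trip.

    A loss is assumed to occur on every `loss_every`-th round trip.
    """
    cwnd = 1.0
    trace = []
    for r in range(1, rounds + 1):
        trace.append(cwnd)
        if r % loss_every == 0:
            cwnd = max(1.0, cwnd * decrease)   # multiplicative decrease
        else:
            cwnd += increase                   # additive increase
    return trace


for loss_every in (100, 10):
    trace = aimd_trace(rounds=1000, loss_every=loss_every)
    average = sum(trace) / len(trace)
    print(f"loss every {loss_every} round trips: "
          f"average window = {average:.1f} segments")
```

The more frequent the losses, the less time the window has to grow back before it is halved again, which is why a flow's average throughput falls off so quickly as loss increases.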
Finally, there is the issue of the “chattiness” of certain
applications or protocols. Essentially,
chattiness refers to how many round trips between client and server,
largely serialized, are required to perform a given application
function. A fantastic explanation of
this issue can be found here in this NetForecast paper explaining Web performance
over the Internet.
Two common protocols which are very chatty are Microsoft’s CIFS
protocol, and HTTP, the dominant protocol used for web applications. Much like the bandwidth-delay product issue, “chattiness”, which doesn’t hurt performance much on a LAN, can have major consequences on a WAN
facing packet loss and/or high latency. For public Internet applications, including but not limited to web apps,
large numbers of DNS (Domain Name System) requests are a form of application
chattiness.
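A back-of-the-envelope model shows why chattiness matters so much more on a WAN. This sketch assumes the round trips are fully serialized and ignores loss, server time and TCP window effects; the function name and all of the numbers (200 round trips, 1 MB of payload, the LAN and WAN figures) are illustrative assumptions.

```python
def transaction_time_s(round_trips: int, rtt_s: float,
                       payload_bytes: float, bandwidth_bps: float) -> float:
    """Time for a chatty exchange: serialized round trips plus transfer time."""
    return round_trips * rtt_s + payload_bytes * 8 / bandwidth_bps


ROUND_TRIPS = 200           # assumed number of serialized request/response pairs
PAYLOAD_BYTES = 1_000_000   # assumed total data moved

lan = transaction_time_s(ROUND_TRIPS, 0.001, PAYLOAD_BYTES, 1e9)
wan = transaction_time_s(ROUND_TRIPS, 0.080, PAYLOAD_BYTES, 10e6)
print(f"LAN: {lan:.3f} s")
print(f"WAN: {wan:.3f} s, of which {ROUND_TRIPS * 0.080:.1f} s is round trips")
```

In this example, nearly all of the WAN time is spent waiting on round trips, so adding bandwidth would barely help.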
While there are other, innumerable application-specific factors,
it’s not much of an oversimplification to suggest that in the end, their impact
is quite similar to the “chattiness” issue noted above.
Ok, so if all network-specific performance issues can be
traced back to bandwidth, latency, and packet loss, let’s look a bit deeper
into the causes of latency and packet loss in IP WANs.
[Some sharp-eyed folks might be wondering about now “why
hasn’t this guy mentioned jitter yet??
We know jitter is a huge issue in real-time application performance”. In fact, “jitter is a measure
of the variability over time of the packet latency across a network” –
i.e. a component of latency!]
If we break down WAN latency into its constituent parts, we
see both “fixed” components and variable ones.
The fixed components of WAN latency relate to the number of route miles
a packet must travel between source and destination, and are thus limited by the
speed of light, with a smaller component based on the number of routers the
packet must go through, each of which takes a small, fixed amount of time to transit
even when the links are lightly loaded. The typical one-way latency across the
continental U.S. is ~40 ms, meaning a typical RTT across the country and back
is ~80 ms.
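As a rough illustration of where a figure like ~40 ms can come from, the sketch below adds propagation delay to a per-router transit time. Light in optical fiber travels at roughly two-thirds of its speed in a vacuum, or about 200 km per millisecond; the route length, hop count and per-hop delay in the example are hypothetical values chosen for illustration, not measurements of any real path.

```python
FIBER_KM_PER_MS = 200.0   # approx. speed of light in fiber (~2/3 of c)


def one_way_latency_ms(route_km: float, hops: int = 0,
                       per_hop_ms: float = 0.0) -> float:
    """Fixed one-way latency: propagation delay plus router transit time."""
    propagation_ms = route_km / FIBER_KM_PER_MS
    transit_ms = hops * per_hop_ms
    return propagation_ms + transit_ms


# Hypothetical cross-country path: 6,500 route km through 15 routers,
# each adding 0.5 ms when lightly loaded.
one_way = one_way_latency_ms(route_km=6500, hops=15, per_hop_ms=0.5)
print(f"one-way: {one_way:.1f} ms, round trip: {2 * one_way:.1f} ms")
```

Note that route kilometers are typically well above the straight-line distance, because fiber follows rights-of-way and traffic passes through carrier hubs.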
The variable component of WAN latency – the jitter, in other
words – is caused by queuing congestion at the routers (or other IP forwarding
devices) anywhere along the way. Queuing
congestion occurs when more data arrives at a device (router) bound for
a given link than that link has the bandwidth to carry. In typical IP WAN routers, queuing congestion
at any given router can add as much as 100 to 200 ms of latency. Beyond that amount of delay, packets will
typically be dropped, causing packet loss.
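The relationship between queue depth, added latency and loss can be sketched with a simple drop-tail queue. This is a toy model under my own assumptions: time advances in fixed ticks, arrivals and the link's drain rate are given in bytes per tick, and anything that does not fit in the queue is dropped. The names and numbers are hypothetical.

```python
def queuing_delay_ms(queued_bytes: float, link_bps: float) -> float:
    """Time for a newly arrived packet to wait behind `queued_bytes`."""
    return queued_bytes * 8 / link_bps * 1000


def simulate_drop_tail(arrivals, drain_per_tick, queue_limit):
    """Return (backlog after each tick, total bytes dropped)."""
    backlog = 0
    dropped = 0
    history = []
    for arriving in arrivals:
        accepted = min(arriving, queue_limit - backlog)  # tail drop the rest
        dropped += arriving - accepted
        backlog += accepted
        backlog = max(0, backlog - drain_per_tick)       # link drains queue
        history.append(backlog)
    return history, dropped


# 250,000 bytes queued ahead of a packet on a 10 Mbps link:
print(f"{queuing_delay_ms(250_000, 10e6):.0f} ms of queuing delay")

# Arrivals exceed the drain rate for five ticks, then stop.
history, dropped = simulate_drop_tail(
    arrivals=[150, 150, 150, 150, 150, 0], drain_per_tick=100, queue_limit=300)
print("backlog per tick:", history, "dropped:", dropped)
```

While arrivals outpace the link, the backlog (and so the delay) climbs steadily; once the queue is full, latency stops growing and loss begins.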
Overflowing queues in forwarding devices, as just noted, are
the primary reason for packet loss. Some
routers use a technique called WRED (Weighted Random Early Detection) to drop a
small percentage of packets when their queues are beginning to fill up, rather than
waiting for them to overflow, to avoid excessive jitter and better
promote “fairness” across flows.
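The idea behind early dropping can be sketched with the classic RED drop curve: no drops below a minimum threshold, a linearly rising drop probability between the minimum and maximum thresholds, and every packet dropped above the maximum. WRED applies a different curve per traffic class. The code below is a simplified illustration, not any vendor's implementation; the class names and threshold values are hypothetical.

```python
def red_drop_probability(avg_queue: float, min_th: float,
                         max_th: float, max_p: float) -> float:
    """Drop probability for a given average queue depth (classic RED curve)."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical per-class profiles: (min threshold, max threshold, max_p),
# with thresholds in packets. Lower-priority traffic is dropped earlier.
PROFILES = {
    "best-effort": (20, 40, 0.10),
    "priority": (30, 40, 0.05),
}

for avg_queue in (10, 25, 35, 45):
    probs = {name: red_drop_probability(avg_queue, *profile)
             for name, profile in PROFILES.items()}
    print(avg_queue, {name: round(p, 3) for name, p in probs.items()})
```

Dropping a few packets early signals TCP senders to back off before the queue fills, which keeps queuing delay down and avoids many flows losing packets at the same moment.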
Finally, while less common than in the past, bit errors can also be a
cause of packet loss. Fairly rare on
wired networks these days (beyond the occasional flaky DSL connection), bit errors
are sometimes responsible for a moderate amount of packet loss on wireless
networks.
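To see how a bit error rate turns into packet loss, here is a small sketch. It assumes bit errors are independent, that any single bit error causes the whole packet to be discarded, and that there is no forward error correction; real links, especially wireless ones, often violate all three assumptions. The bit error rates shown are illustrative.

```python
def packet_loss_from_ber(ber: float, packet_bytes: int) -> float:
    """Probability that at least one bit in the packet is corrupted."""
    bits = packet_bytes * 8
    return 1.0 - (1.0 - ber) ** bits


for ber in (1e-9, 1e-7, 1e-6):
    loss = packet_loss_from_ber(ber, packet_bytes=1500)
    print(f"BER {ber:.0e}: {loss * 100:.4f}% loss for 1500-byte packets")
```

Under these assumptions, a bit error rate of one in a million is already enough to lose roughly 1% of full-size packets.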
High latency causes obvious problems in application
performance. Packet loss usually has an
even bigger negative impact on WAN app performance. Why is this?
Given the way TCP’s congestion control interacts with the bandwidth-delay product, loss rates above ~1% mean that the
application can only use a very small amount of bandwidth on a WAN, no matter
how much is available. Over longer WAN
distances, throughput is lower still.
Because TCP is a windowed protocol, forwarding of additional packets from
the source will quickly come to a halt until the lost packet is retransmitted
and acknowledged. Even for applications where
the bandwidth-delay product per se is not an issue, the “chattiness” problem
has much the same – and often worse – effect:
in the face of packet loss, the serialized exchange halts until the packet
is retransmitted and acknowledged.
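One commonly cited way to quantify this is the approximation from Mathis et al. for steady-state TCP throughput under random loss: throughput is roughly (MSS / RTT) × C / √p, where p is the loss rate and C is a constant of about 1.22. This formula is not from this post; I am bringing it in as an illustration, and it is only a rough model. The 1460-byte MSS and the RTTs below are assumptions.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float,
                          loss_rate: float) -> float:
    """Approximate steady-state TCP throughput (Mathis et al. model)."""
    c = math.sqrt(3 / 2)   # ~1.22, the constant in the simplest model
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


MSS_BYTES = 1460   # assumed typical Ethernet-sized TCP segment

for rtt_ms in (20, 80, 160):
    for loss in (0.0001, 0.01):
        bps = mathis_throughput_bps(MSS_BYTES, rtt_ms / 1000, loss)
        print(f"RTT {rtt_ms:>3} ms, loss {loss * 100:.2f}%: "
              f"~{bps / 1e6:.2f} Mbps")
```

Note that the link's bandwidth does not appear in the formula at all: at 1% loss and an 80 ms RTT the model gives under 2 Mbps per flow regardless of how fast the link is, and doubling the RTT halves it again.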
Whew! A lot to digest
here, and we’re really only scratching the surface. We know that we want to avoid packet loss and high latency as much as possible. In my next post, we’ll look at which
techniques address which of these factors, combating the negative effects of loss and latency directly and indirectly.