AppNeta | The Path


In part one of this blog I went over some of the configuration considerations for 10 GigE server NICs.  “Performance” in that case was defined in terms of raw throughput – specifically TCP throughput – which might be important for a backup-and-recovery system or a data center server.  It is pleasantly surprising to discover that the right choice of settings can reliably get close to 99% utilization of the link.

Your mileage may vary (YMMV), of course.  And some NICs, servers and OSs (I’m not naming any… right now) are not quite up to the job…

However, for many of you raw throughput fails to impress.  That is, your links may be far from saturated; instead, you are most intent on gaining the latency advantages that 10G offers.

Why is 10G better for latency?  In a word, serialization – the time it takes to process a packet and send it out on the wire (or read it back in) is reduced by roughly a factor of 10 compared with 1G.  Propagation time (the speed at which the signal travels, whether through optical fibre or copper) stays pretty much the same, so you don’t strictly gain anything over a given distance.  But the time for a packet of a given size to be written to the wire or read back at each interface is drastically reduced.

Theoretically, the serialization time for a 1500-byte packet is around 12 microseconds at 1G and 1.2 microseconds at 10G.  This is roughly how long it takes a given interface to read the packet from, or write it to, the wire.  However, there are plenty of other sources of latency in the end-to-end path (particularly at the end-hosts, which can and should be the focus of tuning).
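The arithmetic behind those numbers is just bits divided by line rate – a quick sketch:

```python
def serialization_delay_us(packet_bytes, link_bps):
    """Time to clock a packet's bits onto (or off) the wire, in microseconds."""
    return packet_bytes * 8 / link_bps * 1e6

print(serialization_delay_us(1500, 1e9))   # ≈12 µs at 1G
print(serialization_delay_us(1500, 10e9))  # ≈1.2 µs at 10G
```

(This ignores Ethernet preamble and framing overhead, which add a handful of bytes per packet but don’t change the factor-of-10 picture.)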

To give you a feel for the realities – in my experience, a 1500-byte packet that is “ping-ponged” between two decently tuned servers on a 1G LAN will make its round trip in around 120-180 microseconds, measured from application layer to application layer.  When there is a switch involved, roughly one-third of that time is due to serialization: four different interfaces perform reads/writes 8 times in a ping-pong, with each read/write pair happening near simultaneously.  Moving to 10G will free up most of that, and the round-trip time will typically shorten to around 70-80 microseconds.
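A rough model of that serialization share – not a measurement, just the ping-pong scenario above reduced to arithmetic, assuming each read at one end overlaps the write at the other, which leaves about four non-overlapping serialization delays per round trip:

```python
def serialization_share_us(packet_bytes, link_bps, delays=4):
    # 4 interfaces do 8 reads/writes per round trip, but each read/write
    # pair overlaps, leaving ~4 non-overlapping serialization delays.
    return delays * packet_bytes * 8 / link_bps * 1e6

print(serialization_share_us(1500, 1e9))   # ≈48 µs - about a third of a ~150 µs RTT
print(serialization_share_us(1500, 10e9))  # ≈4.8 µs once the same path runs at 10G
```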

Again, that is measured at the application layer – RTTs further down in a well-tuned stack can be 30-40 microseconds.  Once again YMMV – and I’d be interested to hear what the latency-obsessed consider to be an acceptable round-trip time.

And who are these latency-obsessed types?  In the fast and furious world of financial transactions, for example, latency is king.  Well, rather, latency is a princess – and all work to serve her delicate sensibilities.  Enormous amounts of money are spent shaving a few milliseconds off response times – because even more enormous amounts of money are at stake.

In that world, some of the configurations that help optimize for raw throughput are poison for fast packets.  In particular, interrupt coalescence – typical 10G NICs default to 75-100 microseconds of wait time between receiving a packet and triggering an interrupt.  When many packets arrive at once, this limits the number of interrupts sent and reduces the load on the CPU.  But waiting that long is anathema to low latency.  Turning off coalescence, or at least reducing the wait times to their minimums, is a must.
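To see why that default is so costly for latency-sensitive traffic, compare it with the serialization delay it sits on top of (75 µs is the typical default cited above; on Linux this sort of timer is usually adjusted with ethtool’s coalescing options, e.g. `ethtool -C eth0 rx-usecs 0` – the interface name here is illustrative):

```python
coalesce_wait_us = 75                       # a typical 10G NIC default wait
serialization_us = 1500 * 8 / 10e9 * 1e6    # ≈1.2 µs for a 1500-byte packet at 10G
print(coalesce_wait_us / serialization_us)  # the wait dwarfs serialization ~60x
```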

Other, more refined tuning may include binding interrupts and network-bound applications to the same CPU to make packet processing even more efficient.  And very careful choices of buffer sizes and allocation strategies can help as well.  This gets well into the black arts of low-latency tuning.  Beyond that, you may start to ask yourself whether Ethernet is really your best choice for low-latency interconnects.

The so-called “wizard gap,” between what can be attained out-of-the-box and what a network wizard can achieve, keeps getting bigger.  Expect to tune your 10G-based systems if you really want the gains 10 Gigabit Ethernet can provide.  But that is why they pay you the big bucks, right?

I am sure that question keeps you awake at night.  For some of us, getting the most from 10G-connected hosts is a hot item on the agenda.  And, it is not entirely clear yet what we can reasonably expect.

Back when 1G was just becoming available for end-hosts, two things seemed certain: the NICs were expensive and filling the pipe was highly unlikely.  Certainly 1G was faster than Fast Ethernet – you could easily get 300-400 Mbps – which was a definite improvement.  However, in general the end-hosts were not able to put enough packets on the wire to use the full capacity.  Often it was the CPU or the size of the bus that was the capacity limiter.  Sometimes it was simply that the drivers were not sufficiently mature.  

Now that 1 Gigabit is mainstream, everybody and their laptop has a 1G NIC – probably a Broadcom.  They generally work well and, with a bit of tweaking, they can typically fill an end-to-end 1G path.

With 10G though, we are back where we were with 1G some years ago.  Many things are the same – but many things are different too…

For starters, a number of performance optimization mechanisms have become quite common-place in the typical 10G NIC.  Interrupt coalescence is a good example.  When a packet arrives, it is held briefly in case others follow right after.  Once enough packets have arrived, or enough time has passed, a single interrupt is generated – instead of one for each packet.  This reduces the load on the CPU and bus in cases where very large flows might otherwise be generating storms of interrupts, one for each packet.
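The “enough packets or enough time” rule above can be sketched as a toy model – the parameter names (`max_frames`, `timeout_us`) and thresholds are illustrative, not any particular NIC’s:

```python
def interrupts_for_burst(n_packets, inter_arrival_us, max_frames, timeout_us):
    """Count interrupts fired for a burst, coalescing until either
    max_frames packets are pending or timeout_us has elapsed."""
    interrupts, pending, elapsed = 0, 0, 0.0
    for _ in range(n_packets):
        pending += 1
        elapsed += inter_arrival_us
        if pending >= max_frames or elapsed >= timeout_us:
            interrupts += 1
            pending, elapsed = 0, 0.0
    if pending:                 # flush whatever is left at the end of the burst
        interrupts += 1
    return interrupts

# 1000 back-to-back 1500-byte packets, ~1.2 µs apart at 10G line rate:
print(interrupts_for_burst(1000, 1.2, max_frames=32, timeout_us=100))  # 32 interrupts
print(interrupts_for_burst(1000, 1.2, max_frames=1, timeout_us=0))     # 1000 without coalescing
```

The same burst generates roughly 30x fewer interrupts – that is the CPU load the mechanism is buying back.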

Another example is segmentation offload.  It also helps decrease the load on the CPU, by transferring large amounts of outbound data straight to the NIC, where the data is subsequently broken down into chunks of the appropriate size to be sent as packets.  Traditionally the CPU has been responsible for this segmentation – so offloading it to the NIC makes transmission more efficient overall.  This mechanism is sometimes referred to as “large send offload” (LSO), or when applied specifically to the TCP stack as “TCP segmentation offload” (TSO), or “generic segmentation offload” (GSO) when used for all IP packets.
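In miniature, this is the work that moves off the CPU – splitting one large send into wire-sized segments.  (Illustrative only: real TSO happens in NIC hardware, below the socket API; the 1448-byte segment size is one common choice, not a universal constant.)

```python
def segment(payload: bytes, mss: int = 1448) -> list[bytes]:
    # 1448 = 1500-byte MTU minus 20-byte IP header, 20-byte TCP header,
    # and a 12-byte TCP timestamp option.
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]

chunks = segment(b"x" * 64_000)   # hand over one 64 KB send, as with TSO
print(len(chunks))                # 45 wire-sized segments
```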

While LSO and TSO are quite common in 10G NICs, “large receive offload” or LRO is less typical, although it is starting to be offered as well.

Jumbo packets (or rather, frames) are quite often part of the 10G picture too.  “Jumbo” refers to a maximum transmission unit (MTU) larger than the Ethernet and Fast Ethernet standard maximum of 1500 bytes.  There is no standard jumbo size for 1G and 10G – different NICs support different maxima, ranging from roughly 8,000 to 16,000 bytes.  By convention, though, 9000 bytes is often used when a network is designed for jumbo packets.  The benefit of jumbo is that there are fewer IP packets to handle, because each payload is so much larger.  This reduces the stress on the NIC and on mid-path devices that inspect IP headers.  The requirement for jumbo use is that the MTU be at least that large along the entire end-to-end path.
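The “fewer packets to handle” benefit is easy to quantify – here, packets needed to move 1 GB of payload at the standard versus the conventional jumbo MTU (header overhead ignored for simplicity):

```python
import math

def packets_needed(payload_bytes, mtu):
    return math.ceil(payload_bytes / mtu)

gigabyte = 10**9
print(packets_needed(gigabyte, 1500))  # 666667 packets at the standard MTU
print(packets_needed(gigabyte, 9000))  # 111112 - a 6x reduction with 9000-byte jumbos
```

Every one of those avoided packets is one fewer header for the NIC and every mid-path device to inspect.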

Jumbo packets have been around since 1G – but they haven’t been as broadly used, simply because they require that all the mid-path networks support them as well.   In addition, most 1G NICs today work at full capacity using just the smaller 1500-byte packets.  So 1G never really took advantage of jumbo’s extra benefit.

10G on the other hand needs all the help it can get.  And since the prospect of jumbo for 1G has been around for a while, network engineers are ready to work with the larger packets.  It is doubtful that there exists a 10G interface anywhere in the typical 10G mid-path that cannot support jumbo.  So jumbo will very likely be a consideration in a 10G end-to-end path.

Finally, with all of these mechanisms and features in place, the NIC and driver need to be implemented with extra-large buffers to hold all of the data that is being operated on.   It is not unusual to find that default settings are much too low for efficient 10G operation.  So those have to be built out as well.  Not complicated but not something to overlook.

Properly implemented, it is quite reasonable to see 98% of capacity in a one-way TCP flow between two 10G end-hosts (assuming LAN and zero loss).  Duplex operation (flows going in both directions simultaneously) may not see full capacity in each direction simply due to the limitations of NIC design – this is one of the most severe tests of a NIC’s performance – however 70-75% of two-way capacity is relatively easy to achieve.
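Translating those percentages into absolute numbers for a 10G link:

```python
link_bps = 10e9
one_way_gbps = 0.98 * link_bps / 1e9      # ≈9.8 Gbps for a single well-tuned TCP flow
duplex_each_gbps = 0.70 * link_bps / 1e9  # ≈7 Gbps per direction under duplex load
duplex_total_gbps = 2 * duplex_each_gbps  # ≈14 Gbps aggregate across both directions
print(one_way_gbps, duplex_total_gbps)
```

So even the “easy” duplex case moves more aggregate data than the saturated one-way flow – the NIC just cannot sustain its one-way best in both directions at once.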

But all of this assumes that what you care about most is bulk data throughput.  For some of us, network performance is measured in fractional reductions in overall packet latencies.  

More on the low latency implementations of 10G to come…



The marriage of 10 Gigabit Ethernet and virtualization seems a matter of amazingly good timing. 10G did not seem to be offering much new to the networking world apart from being the same as 1G only faster. The urgency of increasing capacity for the most part just wasn’t there – at least not in the LAN or the server.

Oh sure, there will always be applications or parts of the network screaming out for more bandwidth, such as backup systems, database servers, and aggregation points in core networks. But most LANs, services and end hosts seem to have been sated by 1G and aggregated 1G connections – at least for now and for the foreseeable future. It seemed like 10G would remain a niche capability for a while longer.

This assessment is supported by the fact that 10G has been well ahead of the server technologies; very few machines have been capable of pushing out enough bytes per second to use up even a fraction of that capacity. There are always internal bottlenecks that limit throughput.  This is exactly what happened in the early years of 1G: few machines could push out more than 300-400 Mbps, mostly due to driver inefficiencies, limited CPU and slow buses. Nowadays, most solid workstations can get pretty close to 950 Mbps with the right applications pushing out the bytes.

So there wouldn’t be much point in arming most machines with 10G, at least for several years to come. And the number of really high-end systems using them would be few and far between. Or so it seemed.

With virtualization taking off like Mentos in a glass of Coke, it suddenly became apparent that the broad-based need for 10G would become a reality much sooner than anticipated. It will still take the hardier servers to fill the pipe – but more and more services are being pulled off smaller individual servers and pushed into virtual machines.  This rush to virtualized consolidation means that there are many more really big machines acting as virtual hosts that can ably fill 10G, and more services running out through 10G network interfaces. More switches with 10G ports are following suit. And suddenly the pull for 10G is much higher than it would have been otherwise.

Why does all that matter so much? Well, think of virtualization as a kind of accelerant for networking. Consider, for example, the recent proclamation by folks at Network World that 10G has caused a significant shift in network design from three-tier to two-tier. They explicitly reference the influence of virtualization on the impact of 10G networks.

All this indicates that, instead of a long, drawn-out transition to 10G over the next decade, we can expect to see prices come down, performance increase, and capacities all through the network shoot up over a relatively short period of time. Well, except maybe at my house – but that’s another story. 40G and 100G may not find as much traction when they arrive – but it can be assured that Ethernet will continue to dominate networking, thanks to this happy convergence of supply and demand.
