Wednesday, December 11

Geek

The 50/80 Rule

When a modern server shows that it is running at 50% capacity, it is actually running at 80% or more, and is effectively if not absolutely saturated.

The reason for this is recent* changes in server CPU architecture from both Intel and AMD.  Current Intel CPUs are almost all hyper-threaded (each core provides two hardware threads); current AMD CPUs are mostly based on a module architecture, where each module contains two cores with some shared resources.**  While to software, a chip will appear to have 2N available cores, the available performance is only about 1.2N.
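The 2N-threads-but-1.2N-performance arithmetic can be sketched as a toy model (Python; the 0.2 gain for a sibling thread is the post's ballpark figure, not a measured constant, and the function name is mine):

```python
# Toy model: a chip exposes 2N logical threads, but delivers only ~1.2N
# physical cores' worth of throughput once everything is busy.
def effective_throughput(busy_threads, physical_cores, smt_gain=0.2):
    """Throughput in units of one physical core's worth of work.

    Assumes the scheduler fills physical cores first; each extra
    sibling thread on an already-busy core adds only `smt_gain` more.
    """
    primary = min(busy_threads, physical_cores)
    siblings = max(0, busy_threads - physical_cores)
    return primary + siblings * smt_gain

# A 16-core / 32-thread chip: all 32 threads busy yields roughly 19.2
# cores' worth of work -- i.e. 1.2N, not 2N.
print(effective_throughput(32, 16))
```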

Chips from both companies also feature turbo modes: if only one or two cores on a multi-core chip are busy, those cores' clocks will accelerate, often by 25% or more.  As more cores become busy and the power and thermal load rises, clock speeds scale back automatically toward a baseline.

Unfortunately, operating system schedulers are currently smarter than our monitoring tools.  Both Linux and Windows know that programs will perform better if you assign each new task to its own hardware core/module rather than just to a logical thread, and the operating system does its best to do so.  But monitoring tools are still at the level of logical threads.  At 50%, all your independent cores/modules have been allocated by the scheduler, and all you have is the extra 20% or so you can squeeze from the chip by allocating the additional shared-resource threads.
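That gap between what the scheduler does and what the monitoring tool reports can be written as a small conversion (Python; again, the 0.2 sibling-thread gain is the post's rough figure, not a measured constant):

```python
def effective_saturation(reported_util, smt_gain=0.2):
    """Map reported logical-thread utilisation (0..1) to the fraction
    of real capacity in use, assuming the scheduler fills physical
    cores first and each sibling thread adds only `smt_gain` more.
    """
    if reported_util <= 0.5:
        cores_busy = reported_util * 2     # every busy thread has its own core
        return cores_busy / (1 + smt_gain)
    siblings = (reported_util - 0.5) * 2   # fraction of sibling threads in use
    return (1 + siblings * smt_gain) / (1 + smt_gain)

# 50% reported is ~83% of what the box can actually deliver.
print(round(effective_saturation(0.5), 2))   # 0.83
```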

The result is that if you look at a server and it is running at around 10% load, the maximum that server can handle is less than 5x that (even before we take queueing theory and response times into account; sustainable average load is probably only 2x).  And a server running at 50% is flat out and the engines cannae take any more.
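As a back-of-the-envelope check on the "less than 5x" figure (Python; the 1.25x turbo clock and 0.2 sibling-thread gain are the post's numbers, and the helper name is mine):

```python
def max_multiple(reported_util, smt_gain=0.2, turbo=1.25):
    """Rough ceiling on workload for a lightly loaded server, as a
    multiple of its current load.

    Assumes reported utilisation counts logical threads, the scheduler
    fills physical cores first, lightly loaded cores run at `turbo`
    times the all-cores-busy clock, and sibling threads add `smt_gain`.
    (Turbo scales back as load approaches 50%, so the estimate gets
    pessimistic toward that end of the range.)
    """
    assert 0 < reported_util <= 0.5
    current = reported_util * 2 * turbo   # work done now, in baseline-clock cores
    capacity = 1 + smt_gain               # everything busy at the baseline clock
    return capacity / current

# At 10% reported load the ceiling is only ~4.8x the current workload,
# not the 10x the monitoring graph implies.
print(round(max_multiple(0.10), 2))   # 4.8
```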

Cf. the 80/20 Rule (a.k.a. the Pareto principle): 20% of X accounts for 80% of Y.  For example, 20% of your customers will account for 80% of your revenue, and 20% of your customers will account for 80% of your work - and it's not the same 20%.

Also cf. the 80/80 Rule: The first 80% of a project will take the first 80% of the time, and the last 20% will take the other 80% of the time.

* Recent meaning the last few years.

** While this doesn't inherently contain the same limitations as hyper-threading, in the Bulldozer and Piledriver implementations, it effectively does; the bottleneck seems to be the shared instruction decoder.  The performance boost going from one busy core in each module to two is of the same order as the boost from Intel's hyper-threading.

Posted by: Pixy Misa at 09:37 AM | Comments (3)

1 If I read you right, this only applies to HT, right?  So this doesn't apply to my home desktop Core i5 (for those who don't know, mobile i5s are mostly dual-core HT, but desktop i5s are quad-core non-HT), or to our older servers at work, which are non-HT Xeons?

Posted by: RickC at Friday, December 13 2013 03:41 AM (swpgw)

2 Yes, the problem is mostly due to HT (and AMD's Bulldozer bottleneck, which works out the same).  A Core i5 will have a smaller effect from turbo mode.

Older non-HT Xeons, and pre-Bulldozer Opterons (the 6100 series, and the older 2000 and 8000 series chips) don't have turbo mode, so they don't have this problem at all.

Posted by: Pixy Misa at Friday, December 13 2013 04:54 AM (PiXy!)

3 "Turbo Mode?"  That's for people who didn't buy a K-model and overclock, right? smile

Posted by: RickC at Monday, December 16 2013 04:00 AM (swpgw)
