Ambient Irony

Sunday, June 13

The Decline and Fall of the Silicon Empire

[I wrote most of this last weekend, but didn't post it then because it clearly needs an edit. I don't know when that's going to happen, though, so I decided that I'd post it anyway. This is a blog, after all, not Communications of the ACM — Pixy]

I've written recently on the untimely death of Moore's Law and on one of the first side-effects of the faltering and failure of that law. But, being somewhat dead myself, I didn't have the time or energy to go into any detail, and probably left my less-geeky readers saying something along the lines of Huh?

But this is important, so I'm going to give it another try.

Way back in 1965, just four years after the first integrated circuit was built, Gordon Moore, then working at Fairchild, made an observation and a prediction.

His observation was that the number of components in an integrated circuit was increasing, while the cost of each component was decreasing; his prediction was that this trend would continue. Intel has made his original paper available for you to read. It's a little bit complicated; Moore is talking about trends in the number of elements in an integrated circuit required to achieve the minimum cost per component - efficiencies of scale, in other words.

Reduced cost is one of the big attractions of integrated
electronics, and the cost advantage continues to increase as
the technology evolves toward the production of larger and
larger circuit functions on a single semiconductor substrate.
For simple circuits, the cost per component is nearly inversely
proportional to the number of components, the result of the
equivalent piece of semiconductor in the equivalent package
containing more components. But as components are added,
decreased yields more than compensate for the increased
complexity, tending to raise the cost per component. Thus
there is a minimum cost at any given time in the evolution of
the technology. At present, it is reached when 50 components
are used per circuit. But the minimum is rising rapidly
while the entire cost curve is falling (see graph below). If we
look ahead five years, a plot of costs suggests that the minimum
cost per component might be expected in circuits with
about 1,000 components per circuit (providing such circuit
functions can be produced in moderate quantities.) In 1970,
the manufacturing cost per component can be expected to be
only a tenth of the present cost.
The complexity for minimum component costs has increased
at a rate of roughly a factor of two per year (see
graph on next page). Certainly over the short term this rate
can be expected to continue, if not to increase. Over the
longer term, the rate of increase is a bit more uncertain, although
there is no reason to believe it will not remain nearly
constant for at least 10 years. That means by 1975, the number
of components per integrated circuit for minimum cost
will be 65,000.
I believe that such a large circuit can be built on a single
wafer.

What he's saying is that by 1975, it would be cheaper to build a single integrated circuit with 65,000 components than to build two 32,500-component circuits - and, by comparison, a 130,000-component circuit (if such a thing could be built) would cost more than twice as much.
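The arithmetic behind that 65,000 figure is just repeated doubling. Here's a minimal sketch in Python, assuming a clean doubling every year. The function name is made up for illustration, and the 1965 starting value of 64 components is my assumption, picked because ten doublings from 64 land on 65,536; Moore's text says about 50.

```python
def components_at_min_cost(year, base_year=1965, base_components=64,
                           doubling_period_years=1.0):
    """Project the component count at minimum cost per component.

    base_components=64 is an assumed starting value, not a figure
    taken from Moore's paper.
    """
    doublings = (year - base_year) / doubling_period_years
    return base_components * 2 ** doublings


# Compare the assumed starting point with the "50 components" in the quote.
for base in (64, 50):
    projected = components_at_min_cost(1975, base_components=base)
    print(f"start at {base} in 1965 -> {projected:,.0f} in 1975")
```

Starting from 64 gives 65,536 and starting from 50 gives 51,200, so either way the projection lands in the neighbourhood of Moore's figure.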

Events since then have proved him right (and happily he is still around to enjoy it). And more right than he imagined because not only have the components been getting smaller and cheaper, but at the same time they have been getting faster and using less power. And this has been going on, following a curve where (to take the most widely noted example) processing power has been doubling every 18 months. For my entire life processing power has been doubling roughly every 18 months.

My first computer, which I bought as a teenager, saving pocket money every week until the day of the Big! Christmas! Sale! was a Tandy (Radio Shack to many) Colour Computer. It had 16k of ROM (which contained the BASIC interpreter; there was no operating system as such) and 16k of RAM. It was powered by a Motorola 6809 processor and a 6847 video chip. It had a maximum resolution of 256 by 192 - in black and white - or 16 lines of 32 columns in text mode.

It ran at 895kHz.

Yes, boys and girls, kilohertz. It was an 8 bit chip (with a few 16-bit tricks up its sleeve, admittedly); it could execute, at most, one instruction each cycle, and it ran at less than a megahertz. (Also, it had no disk drives at all; everything was stored on cassette tape, which fact is directly responsible for the irretrievable loss of my version of Star Trek and the completely original game Cheese Mites.)

Not quite twenty years on, I'm typing this on a system with a 2.6 gigahertz 32-bit processor that can execute as many as three instructions per cycle, some of which can perform multiple operations like doing 4 16-bit multiply-accumulates all at once. It has more level-one cache than my Colour Computer had total memory. Its front-side bus is eight times as wide and nearly a thousand times as fast. My display is running at 1792 by 1344 in glorious 24-bit colour. And it has six hundred and fifty gigabytes of disk.*

It cost a bit more, it's true. My 1984 Colour Computer cost me $199.95, and Kei, my 2003 Windows XP box, cost me around $2000. The best I can do today for $199.95 (ignoring for the moment two decades of inflation and the fact that this now represents a morning's earnings rather than a year's) is a Nintendo Gamecube. The Gamecube only runs at 485MHz (achieving a measly 1125 MIPS); it only has 40MB of memory; it only has 1.5GB of storage. Its peak floating-point performance is a mere 10.5 GFLOPS, compared to the Colour Computer's... I don't know, exactly, since the CoCo had no floating-point hardware at all, and I doubt that the software emulation achieved so much as 10.5 kiloFLOPS.

So, depending on exactly what you wish to measure, 20 years of innovation has given us somewhere between a thousand and a million times better value for money.
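To put rough numbers on that range, here is a small Python sketch using only the figures quoted above for the two $199.95 machines. The dictionary layout and function name are mine, and the CoCo's floating-point figure is the upper bound guessed at in the text, so that ratio is really "at least" rather than "exactly".

```python
# Figures quoted in the post, normalised to Hz, bytes and FLOPS.
coco = {"clock_hz": 895e3, "memory_bytes": 16 * 1024, "flops": 10.5e3}
gamecube = {"clock_hz": 485e6, "memory_bytes": 40 * 1024 ** 2, "flops": 10.5e9}


def improvement(metric):
    """Ratio of the Gamecube figure to the Colour Computer figure."""
    return gamecube[metric] / coco[metric]


for metric in coco:
    print(f"{metric}: {improvement(metric):,.0f}x")

# What doubling every 18 months would predict over 20 years.
predicted = 2 ** (20 / 1.5)
print(f"doubling every 18 months for 20 years: {predicted:,.0f}x")
```

Clock speed comes out at a few hundred times, memory at a couple of thousand, floating point at a million or more, and the 18-month doubling curve predicts roughly ten thousand, which sits comfortably inside that spread.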

And here it is again: This has been going on for my entire life. Every year, tick tick tick, new and better and faster and cheaper. You buy the latest and greatest and it's obsolete before you get home from the mall. It's so much a part of our lives that it's a joke, a cliche.

And it just died. [That last link goes to an IBM presentation, the first 13 pages of which are just general marketing material, but pages 14 to 24 go right to the heart of the problem.]

The death of Moore's Law has been predicted many times, not least by Moore himself, but when you get IBM's Chief Technology Officer saying

Scaling is already dead but nobody noticed it had stopped breathing and its lips had turned blue.

you know something's up. Particularly when he's not making a prediction, but talking about what's happening right now.

And everything was planned so neatly too. 90 nanometres was to come on line late '03, ramping up this year; 65 nanometres was to be the big thing of '05, followed by 45 nanometres in '07. Now, beyond that, at 30 nanometres and 20 nanometres, things were less clear, and beyond 20 nanometres not clear at all, but at least the path was marked out from the old 130 nanometre stuff down to 45, giving us 9 times the transistors and 3 times the speed. Only someone forgot to check with the laws of physics.
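Those "9 times" and "3 times" figures fall out of simple geometry. A quick sketch, assuming idealised scaling where transistor count per unit area goes up with the square of the linear shrink and speed goes up in direct proportion to it (that model and the function name are my assumptions, not something from the roadmap itself):

```python
def scaling_gain(old_nm, new_nm):
    """Return (density_gain, speed_gain) under idealised scaling.

    Assumes density grows with the square of the linear shrink and
    switching speed grows linearly with it.
    """
    shrink = old_nm / new_nm
    return shrink ** 2, shrink


nodes = [130, 90, 65, 45]
for old, new in zip(nodes, nodes[1:]):
    density, speed = scaling_gain(old, new)
    print(f"{old}nm -> {new}nm: {density:.2f}x transistors, {speed:.2f}x speed")

density, speed = scaling_gain(130, 45)
print(f"130nm -> 45nm overall: {density:.1f}x transistors, {speed:.1f}x speed")
```

Each step roughly doubles the transistor count, and the three steps together give a bit over 8 times the transistors and a bit under 3 times the speed, which is where the round numbers above come from.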

Wired: How long will Moore's Law hold?
Moore:
It'll go for at least a few more generations of technology. Then, in about a decade, we're going to see a distinct slowing in the rate at which the doubling occurs. I haven't tried to estimate what the rate will be, but it might be half as fast - three years instead of eighteen months.
What will cause the slowdown?
We're running into a barrier that we've run up against several times before: the limits of optical lithography. We use light to print the patterns of circuits, and we're reaching a point where the wavelengths are getting into a range where you can't build lenses anymore. You have to switch to something like X rays.

So, what exactly is the problem? It's not, as Moore and others predicted, a question of actually building the circuits - that's still working fine. IBM, Intel, AMD and others have all produced working chips at 90 nanometres. The problem is leakage. Each of the millions of transistors in a chip is a tiny switch, turning on and off at incredible speeds. Each time you turn the transistor on, or off, you need to use a little bit of electricity to do so. That's okay, and it's expected, because you don't get anything for free. The problem is that the transistors are now so small, and the layers of insulation - the dielectric - so thin, that they leak. There's a partial short-circuit, and so instead of only using power when the switch switches, it's using power all the time.
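The difference between the two kinds of power use can be sketched with the usual textbook approximations: switching power scales with capacitance, voltage squared and clock frequency, while leakage power is just voltage times leakage current. Every number below is hypothetical, invented purely to show the shape of the problem; none of it describes a real chip.

```python
def switching_power(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """Dynamic power: energy spent charging and discharging on each switch."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def leakage_power(voltage_v, leakage_current_a):
    """Static power: current that flows whether or not anything switches."""
    return voltage_v * leakage_current_a


# Hypothetical chip-wide values, for illustration only.
VOLTAGE = 1.2          # volts
CAPACITANCE = 20e-9    # farads switched per cycle
LEAKAGE = 25.0         # amps of leakage current

for label, freq in (("clock stopped", 0.0), ("running at 3GHz", 3e9)):
    dyn = switching_power(CAPACITANCE, VOLTAGE, freq)
    leak = leakage_power(VOLTAGE, LEAKAGE)
    print(f"{label}: switching {dyn:.0f} W, leakage {leak:.0f} W")
```

The point of the sketch is the first line of output: stop the clock and the switching term drops to zero, but the leakage term doesn't move.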

So what? Electricity is cheap. Well, the so what is heat. Modern microprocessors use as much electricity as a light bulb, and that means they produce just as much heat. If they didn't have huge heat sinks and fans bolted onto them, they'd very quickly overheat and fail - a fact that some people have inadvertently discovered.

Until now, each new generation of scaling, each new node, has brought smaller, faster, cheaper and cooler transistors. At 90 nanometres, transistors are smaller, cheaper, probably faster again - but they run hotter. And the competition in the processor market has already driven power consumption (and heat generation) about as high as it can go. So when the new generation was discovered to increase the heat rather than decrease it, the whole forty-year process of accelerating change ran head-first into a wall.

Back at the end of 2002, I made the following set of predictions for the coming year. I felt pretty comfortable in all of them, the first no less than any of the others:

My predictions for 2003:
1. Microprocessors will hit 4GHz by the end of the year. Marketers will try and largely fail to convince the public to buy them.
2. A major scientific breakthrough will lead to a new and deeper understanding of something.
3. A major political scandal will result in a huge media kerfuffle and only die down when someone resigns.
4. There will be a war.
5. Bad weather will affect the lives of millions of people.
6. There will not be any major, civilisation-destroying meteor impacts.
7. Astronomers will find new and interesting things in the sky.
8. Spam, pop-ups and viruses will continue to plague us. The Internet will fail to collapse under the strain. Pundits will predict that this will now happen in 2004.
9. A rocket will explode either on the launch pad or early in its flight, destroying its expensive payload - which will turn out to be uninsured.
10. Cod populations in European waters will continue to fall, and the European parliament will fail to act to prevent this.
11. A new species of mammal will be discovered.
12. A species of reptile or amphibian will be reported as extinct.

But not only did we not see 4GHz processors in 2003, it's doubtful that we'll see them in 2004 either. (I was wrong about number 3, too. No-one resigned, and the media moved onto the next scandal. Rinse, repeat.)

Now, assuming you're not a hard-core computer gamer, hanging out for the release of Doom 3 and Half-Life 2, why should you care?

Well if you have broadband internet, or a mobile phone, or a DVD player, or a PDA, or a notebook computer, or a digital camera (or a digital video camera), or you use GPS on your camping trips, or you enjoy the low cost of long-distance phone calls these days, if you download anime or the latest episode of Angel off the net, if you take your iPod with you everywhere you go, if your job or your hobby involves using e-mail or looking things up on the Web, you can thank Moore's Law for it.

Modern communications depend critically on advanced signal processing techniques, performed by specialised chips called Digital Signal Processors, or DSPs. These things are everywhere - every modem, every mobile or cordless phone, every digital camera, every TV or VCR or DVD player, every stereo, every disk drive. It's the relentless advance of Moore's Law that has made DSPs fast enough and cheap enough to do all this, and made them efficient enough to run on batteries so well that your mobile phone might last a week between charging. (My first mobile was lucky to make it through the day.) Disk drives demand high-speed DSPs to sort out the signals coming from the magnetic patterns on the disk and turn them back into the original data. DVD players need them to turn the tiny pits pressed into the aluminium surface into a picture. The entire global telephone network, mobile and fixed, depends on DSPs. And any advances in any of these areas will require more and faster and cheaper DSPs and - uh-oh.

And there's more: The advances in computers and communications over the past four decades have been the primary driver of the global economy. The economy has been growing all that time, even though we have made no fundamental breakthroughs in finding new resources or new materials. If you're better off than your parents, you can thank Moore's Law for a big chunk of that - if not the effort you put in, then the new opportunities it opened up.

And it just died.

I don't think the financial markets have a clue yet what's going on, but in any case it's going to be a soft landing. All of the processor manufacturers have been in a mad rush over the last decade to produce faster chips at the expense of pretty much anything else. The funny thing is that they've been pushing so hard, they've left a lot of things behind. Take a look at this chart:

 int    fp
base  base
1076   763  Pentium M 1.6GHz
 805   635  Pentium M 1.1GHz
 237   148  C3 1.0GHz (C5XL)
 398   239  Celeron 1.2GHz (FSB100)
 543   481  Athlon XP Barton 1.1GHz (FSB100 DDR)
 581   513  Athlon XP Thoroughbred-B 1.35GHz (FSB100 DDR)
1040   909  Athlon XP 3200+ (Barton 2.2GHz, FSB200 DDR)
1276  1382  Pentium 4 3.0E GHz Prescott (FSB800), numbers from spec.org
1329  1349  Pentium 4 3.2E GHz Prescott (FSB800)
 560   585  Athlon 64 3200+ 0.8GHz 1MB L2
1257  1146  Athlon 64 3200+ 2GHz 1MB L2

You don't have to understand exactly what this means, but the first number relates to "integer" performance, which is important for things like word processing and web browsing and databases, and the second number relates to "floating-point" performance, which is important for games. (Well, and other things too.)

The Pentium M is a modified version of the Pentium III, customised for notebook computers. Since notebook computers run off batteries, and batteries don't hold much power at all, the Pentium M has been tweaked to provide as much speed as possible while using as little power as possible. The Pentium 4, on the other hand, is designed for speed at the expense of everything else. And what we find is that the 3.2GHz Pentium 4, despite having twice the clock speed of the 1.6GHz Pentium M, is just 25% faster on integer (useful work) and 75% faster on floating point (games).

And - here's the tricky bit, and the cause of Intel's recent and dramatic change in direction - the Pentium 4 uses four times as much power as the Pentium M. So if, instead of putting one Pentium 4 onto a chip, you put four Pentium Ms, it would use the same amount of power and produce the same amount of heat, but it would run up to three times as fast... Overall.
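For anyone who wants to check the sums, here's a short Python sketch using the two relevant rows from the chart. The four-core figure assumes perfect scaling across cores, which is the best case and an assumption of mine; the variable and function names are just for illustration.

```python
# Base scores from the chart: (integer, floating point).
pentium_m_1_6 = (1076, 763)
pentium_4_3_2 = (1329, 1349)


def advantage(fast, slow):
    """Ratio of `fast` to `slow` for each benchmark."""
    return tuple(f / s for f, s in zip(fast, slow))


int_ratio, fp_ratio = advantage(pentium_4_3_2, pentium_m_1_6)
print(f"Pentium 4 vs Pentium M: {int_ratio:.2f}x integer, {fp_ratio:.2f}x floating point")

# Four Pentium M cores in the same power budget, assuming perfect scaling.
CORES = 4
ideal_quad = tuple(CORES * score for score in pentium_m_1_6)
quad_int, quad_fp = advantage(ideal_quad, pentium_4_3_2)
print(f"4 x Pentium M vs Pentium 4: {quad_int:.2f}x integer, {quad_fp:.2f}x floating point")
```

That works out to about 3.2 times the integer throughput and about 2.3 times the floating-point throughput, hence "up to three times as fast".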

Which is great and wonderful if you can use four processors at once. I can, quite happily, and more than that. A word processor can't, not easily, but then word processors already run pretty well. Games, and other graphics-intensive stuff like Photoshop or 3D animation software certainly can, though most games haven't been written to do so. Not yet.

But they will. That's the next paradigm shift for programming, by necessity: Everything will be multithreaded. And it won't stop at two threads, or four. AMD has just announced the new Geode NX. It's a 1GHz processor that runs on just 6 watts of power, around a tenth as much as the power-hungry monsters inside today's high-end desktops... Which run at around 3GHz, and would be stomped into the dirt, aggregate-performance-wise, by a chip with ten Geode NX cores on it.**
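The easy case for that shift is work that splits into independent pieces. Here is a minimal sketch of the pattern using Python's standard `concurrent.futures` module; `convert_job` is a hypothetical stand-in for one unit of work, and the worker count of four is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_job(job_id):
    """Hypothetical stand-in for one independent unit of work."""
    return sum(i * i for i in range(job_id * 1000))


def run_serial(job_ids):
    """Do the jobs one after another on a single thread."""
    return [convert_job(j) for j in job_ids]


def run_parallel(job_ids, workers=4):
    """Hand the jobs to a pool of workers; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert_job, job_ids))


jobs = list(range(1, 21))
print(run_parallel(jobs) == run_serial(jobs))
```

One caveat: in standard CPython, threads take turns on a single interpreter lock, so CPU-heavy work like this won't actually get faster with threads. Swapping in `ProcessPoolExecutor` from the same module is the usual way to spread it across cores; threads are used here only because they show the structure with the least ceremony.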

Apart from more cores, we can also expect cores that do more in one cycle. We've already started to see this with Intel's MMX and SSE, Motorola's Altivec and AMD's 3DNow, all of which are designed to take a 64-bit or 128-bit register and use it to perform multiple 8-bit, 16-bit, or 32-bit operations in one go.

The advantage of these instructions is that many DSP algorithms for video and audio applications - like MP3 files, or DVD video - only require 8 or 16 bit values, but modern processors are designed with 64-bit registers for doing floating-point arithmetic. If you can subdivide that register and do eight 8-bit calculations at once, you can get through the work eight times as fast - or you can run at one eighth the clock speed, and use a fraction of the power.
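The register-subdividing trick can be imitated in plain software, which makes it easier to see what the hardware is doing. The sketch below packs eight 8-bit values into one 64-bit integer and adds two such words lane by lane, with each lane wrapping around at 256. It's an illustration of the idea only, not how MMX or SSE are implemented, and all the names are mine.

```python
LANES = 8
LANE_BITS = 8
HIGH_BITS = 0x8080808080808080  # top bit of each 8-bit lane
LOW_BITS = 0x7F7F7F7F7F7F7F7F   # lower seven bits of each lane


def pack(values):
    """Pack eight integers (0..255) into one 64-bit word, lane 0 lowest."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its eight 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & 0xFF for i in range(LANES)]


def packed_add(a, b):
    """Add eight 8-bit lanes at once, each wrapping modulo 256."""
    # Add the low seven bits of every lane; a carry can reach bit 7 of its
    # own lane but can never spill into the next lane.
    low = (a & LOW_BITS) + (b & LOW_BITS)
    # Fold in the top bit of each lane with XOR, discarding its carry-out.
    return low ^ ((a ^ b) & HIGH_BITS)


a = pack([1, 2, 3, 250, 100, 200, 255, 0])
b = pack([1, 1, 1, 10, 100, 100, 1, 0])
print(unpack(packed_add(a, b)))
```

One call to `packed_add` does the work of eight separate additions, which is the whole appeal: same register, same cycle, eight results.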

The Intrinsity Fastmath LP, like the Geode NX, runs at 1GHz and draws 6 watts of power. Unlike the Geode, it is a single-issue in-order core, which makes it smaller and simpler, but also slower.

On the other hand, it has a 4x4 matrix of 32-bit arithmetic units, each of which can hold two 16-bit elements. It can perform 16 billion multiply-and-add operations (the core of many DSP algorithms) per second - which puts it equal to a dual 2GHz G5 Macintosh. (And Intrinsity have a 2.5GHz version of the Fastmath too, only it uses more than 6 watts.)

I have a Sony Vaio mini-notebook; it has a 733MHz Transmeta Crusoe processor. It's kind of slow, and when I try to play the opening sequence of Jungle wa Itsumo Hale nochi Guu on it, it pretty much freezes up. That could be fixed by using a big, power-hungry Pentium 4 processor (like my desktop), but then I'd have a battery life of about five minutes. If instead Transmeta included a matrix processing unit like Intrinsity's - and someone wrote a video codec that used it - I could watch the whole video without dropping frames, and without being tethered to my desk by a power cord.

The chip for Sony's upcoming Playstation 3, known as the Cell, takes this even further - judging from the patent applications, anyway; little technical information has been released. It has four cores on the chip; four effectively independent processors. Each of these cores has eight vector units attached to it, and each of those vector units is capable of processing 128 bits at a time - four 32-bit calculations, or 8 16-bit ones, or 16 8-bit ones. And you can be pretty sure that it can multiply-and-add in one go. So in a single cycle, it can perform as many as 16 times 8 times 4 times 2 = 1024 operations.
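The same multiplication can be written out so the element width is a parameter. This is only a sketch of the peak figure from the numbers above; the function name and defaults are mine, and counting a multiply-and-add as two operations is the same assumption the paragraph makes.

```python
def peak_ops_per_cycle(cores=4, vector_units_per_core=8,
                       register_bits=128, element_bits=8, ops_per_element=2):
    """Peak operations per clock cycle for the configuration described above.

    ops_per_element=2 counts a fused multiply-and-add as two operations.
    """
    lanes = register_bits // element_bits
    return cores * vector_units_per_core * lanes * ops_per_element


for bits in (32, 16, 8):
    ops = peak_ops_per_cycle(element_bits=bits)
    print(f"{bits}-bit elements: {ops} operations per cycle")
```

The 1024 figure is the 8-bit case; with 16-bit or 32-bit elements the peak halves each time, to 512 and 256.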

Which is rather a lot.

What's more, it's called the Cell because it's designed to be hooked up to other Cells in large networks, all working together. Which should make Dead or Alive 5 visually impressive, to say the least.

There's an article on lithography in April's Scientific American, and it plots the trend for CPU speeds forwards as far as 2020... Assuming that the trend continues as it has. Unfortunately, that doesn't seem like it will happen any more, and we won't be seeing 50GHz processors after all, at least not from conventional silicon chips.

Which is bad news for the people making those conventional silicon chips. But it's good news for designers of unusual devices like the Fastmath. And it's good news for programmers, because all those single-threaded applications are going to have to be re-written.

One of the regulars on the newsgroup comp.arch noted some time ago that even if Moore's Law failed tomorrow, we'd still have a factor of ten in performance improvements up our sleeve, because today's processors are designed to make it easy for programs to run fairly quickly, rather than to simply deliver the maximum theoretical performance. It's a trade-off, and it's been the right choice until now.

And now it's time to roll up our sleeves.

* That's dedicated disk; we'll set aside the terabyte or so living in the file server.
** And the Geode NX is a full Athlon core too, so you're not losing anything: It's still packed with 3-issue out-of-order-execution goodness.

Posted by: Pixy Misa at 08:12 AM | Comments (8)

1 So the practical effects of Moore's Law are still in effect. We're still going to be getting faster exponentially. The 18 months timetable might be broken but the rest still holds true, right?

Posted by: Jim at Sunday, June 13 2004 09:46 AM (saeHM)

2 Yes and no. Transistors are still getting smaller, but if the leakage continues to get worse, we won't be able to use them. No point building a billion-transistor CPU if it chews up a kilowatt of power even when it's idle. This is less of a problem for memory - so far - but is a critical issue right now for CPUs. Hence all the scrambling about. Any exponential growth curve has to fail sooner or later, and I expect the new curve to fail a lot quicker than the previous one. However this post doesn't address advances such as electronics based on carbon nanotubes - I don't know enough about that to comment. Or at least, not enough to comment usefully. :)

Posted by: Pixy Misa at Sunday, June 13 2004 09:53 AM (+S1Ft)

3 But with multi-processing the effective realized speed for the user will still be jumping. The speed increase will be coming from a different source but it'll still be there. Until that curve taps out anyway.

Posted by: Jim at Sunday, June 13 2004 10:01 AM (saeHM)

4 Right. But there are two problems with this:

First up, there's only so much you can do with multiple processors. I have an application at the office that produces PDFs of our customers' bills so that the staff in the call center can view them on their screens. It has to take the Postscript files from the billing system (which are updated daily) and turn them into PDFs using Ghostscript. It takes about a second to convert the average bill, and there are many thousands of bills that need to be processed every day. If I didn't have a multi-processor machine to run it on, it would take all day. But, since every bill can be converted independently, and since the conversion process uses little disk or memory bandwidth, I could happily use a hundred processors all at once to get the job done a hundred times faster.

That sort of thing is known as an embarrassingly parallelisable problem, but unfortunately, EPPs are the exception rather than the rule. Most tasks are much harder to speed up this way. It's not at all clear how to use multiple processors to make Microsoft Word run faster, for example.

Which brings us to our second point. Until now, every couple of years the Lithography Fairies have handed the Chip Designers a free upgrade. Improvements in semiconductor fabrication automatically made everything faster, better, cheaper. And now instead of faster and better and cheaper, you get to pick at most one of those. The rest of the slack has to be taken up by the chip designers, and the compiler writers, and the application programmers.

What this means is that we no longer have a rising tide lifting all boats. Every chip has to be redesigned, and compilers written to take advantage of the new features, and applications rewritten to be multi-threaded and to use the new vector and matrix units and other specialised hardware. So no more coasting along enjoying the ride; it's now a long hard slog towards the future. (See, I said it needed an edit.)

Posted by: Pixy Misa at Sunday, June 13 2004 10:17 AM (+S1Ft)

5 Ah, got it now. :)

Posted by: Jim at Sunday, June 13 2004 11:02 PM (saeHM)

6 Oh and you were right about point 3 (http://news.bbc.co.uk/1/hi/uk_politics/3441945.stm). Now granted clock speed acceleration may be dropping off but I would suggest that for the time being we can afford to take a break. There is very little that my 2.4GHz Pentium 4 won't do instantly within my requirements as a power user. I'd rather hope that we concentrate the same level of research and drive on improving Comms speeds. I'd quite like to be able to watch video on demand thanks ... and a streaming music system would be dinky too!

Posted by: Rob at Tuesday, June 15 2004 05:33 AM (kXZI6)

7 "Oh and you were right about point 3" Heh. Kind of backwards, but it still works :) My P4 2.6 (nyah, 200 more megahertz!) is pretty good, at least until I try to do video editing. The matrix math unit from the Intrinsity chip is the ideal solution for that. As for comms - ADSL2+ supports speeds of 20+Mbps for downloads, and the chips are already in production. (Now we just have to wait for someone to roll it out.) That and a 108Mbps wireless network should have most home users covered. (Personally I want VDSL, but I'm funny that way.)

Posted by: Pixy Misa at Tuesday, June 15 2004 05:54 AM (+S1Ft)

8 Right, so I guess we aren't at the Star Trek style computer point yet, but I am prepared to bet that someone, somewhere, is already on the fringe of a breakthrough which will take us through this barrier and on to the next stage of development. It's also a certainty that someone is already at work on a parallel development which could just possibly be the breakthrough after that. I have always liked the maxim "Don't tell me it's impossible - prove it!" It is in proving someone else's theory/postulation is impossible that many of the greatest advances have been made.

Posted by: The Gray Monk at Wednesday, June 16 2004 07:58 AM (U5kQV)
