Sunday, July 31
A database is a record storage system that provides a generalised mechanism for locating any given item among N items in less than O(N) time.
First Corollary: Any record storage system that does not provide such a mechanism, regardless of what other capabilities it might exhibit, is not a database.
Second Corollary: Any record storage system that does not provide such a mechanism will eventually fail due to unforeseen queries that require O(N) time.
* From Hal Draper's seminal but little-known 1961 paper on information science.
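To make the definition concrete, here is a minimal Python sketch of the difference between scanning N records and using an index. The record layout and the function names (`linear_find`, `build_sorted_index`, `indexed_find`) are invented for illustration; a real database would use a B-tree or hash index on disk rather than a sorted list in memory.

```python
import bisect

def linear_find(records, key):
    """O(N): look at every record until the key turns up."""
    for position, (record_key, _value) in enumerate(records):
        if record_key == key:
            return position
    return None

def build_sorted_index(records):
    """Build a sorted list of (key, position) pairs once, up front."""
    return sorted((record_key, position)
                  for position, (record_key, _value) in enumerate(records))

def indexed_find(index, key):
    """O(log N): binary search over the sorted index."""
    # (key,) sorts just before (key, position), so bisect_left lands on the match.
    i = bisect.bisect_left(index, (key,))
    if i < len(index) and index[i][0] == key:
        return index[i][1]
    return None

# Hypothetical data: ten thousand records keyed by a made-up user name.
records = [(f"user{n:05d}", {"id": n}) for n in range(10_000)]
index = build_sorted_index(records)

found_by_scan = linear_find(records, "user04242")
found_by_index = indexed_find(index, "user04242")
```

Both calls find the same record; the scan touches thousands of entries to get there, while the binary search touches about fourteen.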
Tuesday, July 26
"She really is a crazy busy bee."
Monday, July 25
Maya the Bee was anime?
I always thought it was a European series. But then, I was watching towards the end of my don't-care-so-long-as-it's-animated period, so I wouldn't have noticed either way.
The American dub had a different - and lame - theme song. The version we got in Australia had the original tune* with translated lyrics, which is apparently the case for most of the rest of the world:
Czechoslovakia, as was
German remix, apparently broadcast in Spain (the Spanish theme is less lame than the American version, but still not the real thing)
Polish polka version, sung by an unreasonably talented 11-year-old
A live Czech version (Karel Gott sang the original German, Czech, and Slovak versions)
A German / Czech co-production
Wait, Spain had the real thing too, as well as their German remix and their Latin alternate-universe version
Another German version on Spanish TV or... something - this one actually changes languages midstream
Honestly, you could hum the tune of this show to anyone in the world between the ages of about thirty and fifty and form an instant friendship... Except for America. Oh, and Italy. They messed it up too.
The odd thing, though, is that English version is not the song from my childhood. That version went:
There is a land that you can't see,
Although it sometimes isn't there;
And that is where you'll find a bee,
With so much happiness to share.
And if you ask her for her name she'll say it's Maya,
The one and only little bee called Maya,
Maya has so many friends you see;
She really is a crazy busy bee.
She's always going to exciting places, Maya,
Meeting friends with different faces, Maya,
Maya, everyone loves Maya.
Maya (Maya), Maya (Maya),
Maya tell us about your day.
I got most of that from memory, and then found this page which had all but the opening verse, and in the right order. (As soon as I saw it I realised I had some lines the wrong way round.) I found the opening verse on some random page on Facebook.
Now, let's see if I can find that one as well.
Oh well, here's another version in... Don't know any more. I think that's Croatian. (Looks up.) Yes, Croatian, which I speak fluently... To the extent that I can recognise Pcelica Maja when I see it twice just a few minutes apart.
* And by "original", I mean the version composed by Czech songwriter Karel Svoboda. The Japanese theme song is nearly as bad as the American version.
(Cute Overload < Zooborns < Koala Land)
Thursday, July 21
Also, the new Mac Mini Lion Server. A teeny-tiny quad-core i7 running MacOS X Server? Sold! It has everything except for tons of storage, and Synology will deal with that. I've been looking at the Mac Mini for years, and this is easily the best config they've ever offered.
Wednesday, July 20
So I had a free half hour when things weren't actually on fire, and I watched the first episode of this. I still have the latter half of last season's shows to watch, but I wanted to pick up one episode of something, and Usagi Drop was something that sounded like it might be good.
And it is. It is very good indeed.
Oh. What's it about? It's about a thirty-year-old guy who moves in with his aunt, who scolds him all the time. Yes. Yes, that's what it's about. Heh.
Four plummeting bunnies out of four.*
* Just as a reminder, the scale is out of four, and the scores range from -1 to 5.
Tuesday, July 19
The other end of the spectrum from clueless junior programmers who tell you that your database can't scale to the level you already have it at, is highly paid and experienced consultants who tell you the exact same thing.
Sunday, July 17
It's been a while since I bought myself a new toy.0 My current pair of laptops from last year (Mio and Sae) are doing fine and don't need replacing. My current Windows desktop (which dates back to 2008) needs to be rebuilt, but I already have the parts for that and just haven't found the time to do the necessary work. My Linux box is pretty much okay, though it's full of backup files because I have nowhere else to put them. More memory would be nice when I'm playing around with large databases or working with lots of virtual machines at once, but to get there I'd have to replace the motherboard as well, because while the current board supports 16GB of DDR2, up from the 8GB actually installed, it would be cheaper to swap the board for a DDR3 model than to track down the rare and expensive unbuffered 4GB DDR2 DIMMs I need.
Which ain't no fun.
A new video card is always nice, but my 4850 is doing pretty well, and I've finished Mass Effect 2 and Dragon Age: Origins, so there's nothing taxing I really need to play until Mass Effect 3 comes out next year. And with AMD's 7000 series based on a 28nm process expected around the end of the year, now is not the time to buy.
I even have an SSD - an 80GB Intel X25-M - sitting here that I haven't had time to install.
What I really need is products that are reasonably priced and simply do what it says on the box. Buy them, plug them in, leave them to work.
I've had my eye on some cheap Buffalo Linkstation Pro Quads - a cute little 4-bay NAS, about 6x6x9 inches, which goes for less than $250 locally without disks. I can pick up 2TB Seagate or Western Digital drives1 for less than $90 each, so it's about $600 for a 6TB RAID-5 NAS. Pretty good; my LaCie 8TB RAID-5 unit ran something like $2000 back when. But I'd need more than one of the Buffaloes, and that's kind of a bore, even if they're small and cute and I can name them after the fairies from Sugar.2
I regretted not getting more of the old Acer Easystores when those were going cheap (the discontinued Linux model, not the later Windows Home Server model), so I've been thinking of maybe getting three or even four of the Buffaloes.
Then I saw this:
And I thought to myself, wait a minute. Wait a minute. 12 bays? Nearly 200MB per second?4 InfiniBand5 expansion for a second cabinet and another 12 bays? That's got to cost a fortune.
Well, in fact it's not cheap, but it's not nearly as expensive as I'd expected. $1550 for a 12-bay high-performance (for the market segment) SMB6 NAS is a steal. It's not rack mount, but I don't have a rack. The expansion cabinet isn't much cheaper than the main unit itself, making that a dubious proposition, but it's there if you need it.
As well as the basic SMB,7 NFS, and FTP, it's an iSCSI target (that is, it can serve as raw disk as well as shared filesystems), a web server with MySQL and PHP, a recording station for TCP/IP-based video cameras, a streaming audio server, an automated BitTorrent/eMule/Usenet downloader, an iTunes server, a print server (it has four USB ports for attaching widgets), a mail server, a firewall, and a VPN server.
Or to put it another way, it runs Linux. But it has a pretty front-end.
It does all the usual things you'd expect storage-wise: RAID-0, 1, 5, 6, and 10, online capacity expansion and RAID level migration, and it has a hybrid RAID mode (like the Drobo) that lets you mix and match drive sizes and automatically balances them with single or double failure protection and gives you the optimum disk space.
It's a fair bit bigger than the Buffaloes - about a twelve-inch cube, so about four times the size and six times the price for the empty box. But I can pop a dozen cheap 2TB drives in there, RAID-6 them, and then forget about it. That's precisely what I need.
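The capacity and cost sums above are easy to check. This is a rough Python sketch that assumes equal-sized drives, no hot spares, and no filesystem overhead, and uses the price ceilings quoted in this post ($250 and $1550 for the empty boxes, $90 per 2TB drive). The `usable_tb` helper is my own invention, not anything from the vendors.

```python
def usable_tb(drives, drive_tb, level):
    """Usable capacity of a simple array of equal-sized drives."""
    parity_drives = {"raid0": 0, "raid5": 1, "raid6": 2}
    if level == "raid10":
        return drives * drive_tb / 2  # assumes two-way mirrors
    return (drives - parity_drives[level]) * drive_tb

# Four-bay Buffalo with 2TB drives in RAID-5.
buffalo_tb = usable_tb(4, 2, "raid5")
buffalo_cost = 250 + 4 * 90

# Twelve-bay unit with 2TB drives in RAID-6.
twelve_bay_tb = usable_tb(12, 2, "raid6")
twelve_bay_cost = 1550 + 12 * 90

print(f"Buffalo:    {buffalo_tb}TB for ${buffalo_cost}")
print(f"Twelve-bay: {twelve_bay_tb}TB for ${twelve_bay_cost}")
```

That gives 6TB for about $610 against 20TB for about $2630, so the big box costs somewhat more per terabyte but survives two drive failures and is one thing to manage rather than three or four.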
There's an even more powerful model, the DS3611xs, with four network ports, two InfiniBand ports for expansion, and two expansion slots besides, but that costs more than twice as much and still only has 12 bays, so it's not what I need. Might be just the ticket for the office, though. And this is where the expansion units make sense - with a much more capable but concomitantly exorbitant main unit, you'd want the ability to add as much storage as possible, and the DX1211 expansion runs a little over a third the price of the DS3611xs.
Update: With the latest version of the management software (currently in beta) it's also an ISO server (serving CD/DVD/Blu-Ray images as network drives), a syslog server (for collecting logging data from multiple Linux or Unix systems), an Apple TimeMachine backup server, a scan and fax server, an LDAP server, and a Youtube video snaffler, among other new features. It's really quite shiny.
0 The Steam and GOG sales don't count.
1 I already have a small herd of external 2TB Western Digital drives here that I've been using for backups for my dying Windows box. Wonder if they come out of their casings easily?
2 The LaCie is in fact named Sugar. It's white3, and it's a cube, so that was obvious.
3 Well, it looked white in the photos. It's not white. It's an unpainted cast aluminium block. I named it Sugar anyway.
4 Link aggregation.
6 Small-Medium Business.
7 Server Message Block, a.k.a. CIFS.
Saturday, July 16
A guest post, by, well, me, from seven years ago, with added commentary by me from today.
I've written recently on the untimely death of Moore's Law and on one of the first side-effects of the faltering and failure of that law. But, being somewhat dead myself, I didn't have the time or energy to go into any detail, and probably left my less-geeky readers saying something along the lines of Huh?
But this is important, so I'm going to give it another try.
Way back in 1965, just four years after the first integrated circuit was built, Gordon Moore, then working at Fairchild, made an observation and a prediction.
His observation was that the number of components in an integrated circuit was increasing, while the cost of each component was decreasing; his prediction was that this trend would continue. Intel has made his original paper available for you to read. It's a little bit complicated; Moore is talking about trends in the number of elements in an integrated circuit required to achieve the minimum cost per component - efficiencies of scale, in other words.
Reduced cost is one of the big attractions of integrated electronics, and the cost advantage continues to increase as the technology evolves toward the production of larger and larger circuit functions on a single semiconductor substrate.
For simple circuits, the cost per component is nearly inversely proportional to the number of components, the result of the equivalent piece of semiconductor in the equivalent package containing more components. But as components are added, decreased yields more than compensate for the increased complexity, tending to raise the cost per component.
Thus there is a minimum cost at any given time in the evolution of the technology. At present, it is reached when 50 components are used per circuit. But the minimum is rising rapidly while the entire cost curve is falling (see graph below). If we look ahead five years, a plot of costs suggests that the minimum cost per component might be expected in circuits with about 1,000 components per circuit (providing such circuit functions can be produced in moderate quantities.) In 1970, the manufacturing cost per component can be expected to be only a tenth of the present cost.
The complexity for minimum component costs has increased at a rate of roughly a factor of two per year (see graph on next page). Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000.
I believe that such a large circuit can be built on a single wafer.
What he's saying is that by 1975, it would be cheaper to build a single integrated circuit with 65,000 components than to build two 32,500-component circuits - and, by comparison, a 130,000-component circuit (if such a thing could be built) would cost more than twice as much.
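The 65,000 figure is just ten doublings applied to the 1965 starting point. Here is a small Python sketch of that extrapolation; the function name is mine, and the default base of 64 components is an assumption (the nearest power of two to the 50 mentioned in the quote), chosen because it lands on the round number Moore gives.

```python
def components_at_minimum_cost(year, base_year=1965, base_components=64):
    """Extrapolate a doubling every year from an assumed base-year value."""
    return base_components * 2 ** (year - base_year)

# Assumed base of 64 components in 1965: ten doublings later.
from_64 = components_at_minimum_cost(1975)

# Using the 50 components the paper actually quotes for 1965.
from_50 = components_at_minimum_cost(1975, base_components=50)

print(from_64, from_50)
```

Starting from 64 gives 65,536 and starting from 50 gives 51,200, so either way the prediction is for tens of thousands of components on a chip within a decade.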
Events since then have proved him right (and happily he is still around to enjoy it). [Me'11: And still is today.] And more right than he imagined because not only have the components been getting smaller and cheaper, but at the same time they have been getting faster and using less power. And this has been going on, following a curve where (to take the most famous example) processing power has been doubling every 18 months. For my entire life processing power has been doubling roughly every 18 months.
My first computer, which I bought as a teenager, saving pocket money every week until the day of the Big! Christmas! Sale! was a Tandy (Radio Shack to many) Colour Computer. It had 16k of ROM (which contained the BASIC interpreter; there was no operating system as such) and 16k of RAM. It was powered by a Motorola 6809 processor and a 6847 video chip. It had a maximum resolution of 256 by 192 - in black and white - or 16 lines of 32 columns in text mode.
It ran at 895kHz.
Yes, boys and girls, kilohertz. It was an 8-bit chip (with a few 16-bit tricks up its sleeve, admittedly); it could execute, at most, one instruction each cycle, and it ran at less than a megahertz. (Also, it had no disk drives at all; everything was stored on cassette tape, which fact is directly responsible for the irretrievable loss of my version of Star Trek and the completely original game Cheese Mites.)
Not quite twenty years on, I'm typing this on a system with a 2.6 gigahertz 32-bit processor that can execute as many as three instructions per cycle, some of which can perform multiple operations like doing 4 16-bit multiply-accumulates all at once. It has more level-one cache than my Colour Computer had total memory. Its front-side bus is eight times as wide and nearly a thousand times as fast. My display is running at 1792 by 1344 in glorious 24-bit colour. [That display sadly died not long after; it just got brighter and brighter until it burned out. Yes, a CRT.] And it has six hundred and fifty gigabytes of disk.* [Today you could buy that for less than $30.]
It cost a bit more, it's true. My 1984 Colour Computer cost me $149.95, and Kei, my 2003 Windows XP box, cost me around $2000. The best I can do today for $149.95 (ignoring for the moment two decades of inflation and the fact that this now represents a morning's earnings rather than a year's) is a Nintendo Gamecube. [Remember those?] The Gamecube only runs at 485MHz (achieving a measly 1125 MIPS); it only has 40MB of memory; it only has 1.5GB of storage. Its peak floating-point performance is a mere 10.5 GFLOPS, compared to the Colour Computer's... I don't know, exactly, since the CoCo had no floating-point hardware at all, and I doubt that the software emulation achieved so much as 10.5 kiloFLOPS.
So, depending on exactly what you wish to measure, 20 years of innovation has given us somewhere between a thousand and a million times better value for money.
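For anyone who wants to see where "a thousand to a million" comes from, here is a quick Python check using only the figures quoted above for the two $149.95 machines. The 10.5 kiloFLOPS figure for the CoCo is the generous upper bound from the previous paragraph, not a measurement, so the floating-point ratio is a lower bound.

```python
# Memory: 40MB in the Gamecube against 16k of RAM in the Colour Computer.
memory_ratio = (40 * 1024) // 16

# Floating point: 10.5 GFLOPS against (at the very best) 10.5 kiloFLOPS.
flops_ratio = 10.5e9 / 10.5e3

# What a doubling every 18 months predicts over 20 years.
doubling_estimate = 2 ** (20 / 1.5)

print(f"memory:   {memory_ratio:,}x")
print(f"flops:    {flops_ratio:,.0f}x")
print(f"doubling: {doubling_estimate:,.0f}x")
```

Memory comes out at 2,560 times, floating point at a million times or more, and the 18-month doubling rule predicts roughly ten thousand times, which sits comfortably between the two.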
And here it is again: This has been going on for my entire life. Every year, tick tick tick, new and better and faster and cheaper. You buy the latest and greatest and it's obsolete before you get home from the mall. It's so much a part of our lives that it's a joke, a cliche.
[Classical MOSFET scaling ENDED at the 130nm node (and nobody noticed) - almost the exact same sentiment expressed by IBM.]
The death of Moore's Law has been predicted many times, not least by Moore himself, but when you get IBM's Chief Technology Officer saying
Scaling is already dead but nobody noticed it had stopped breathing and its lips had turned blue.
you know something's up. Particularly when he's not making a prediction, but talking about what's happening right now. [Scaling may be dead, but Wired have kept that link alive for 7 years.]
And everything was planned so neatly too. 90 nanometres was to come on line late '03, ramping up this year; 65 nanometres was to be the big thing of '05, followed by 45 nanometres in '07. Now, beyond that, at 30 nanometres and 20 nanometres, things were less clear, and beyond 20 nanometres not clear at all, but at least the path was marked out from the old 130 nanometre stuff down to 45, giving us 9 times the transistors and 3 times the speed. Only someone forgot to check with the laws of physics.
Wired: How long will Moore's Law hold?
It'll go for at least a few more generations of technology. Then, in about a decade, we're going to see a distinct slowing in the rate at which the doubling occurs. I haven't tried to estimate what the rate will be, but it might be half as fast - three years instead of eighteen months.
What will cause the slowdown?
We're running into a barrier that we've run up against several times before: the limits of optical lithography. We use light to print the patterns of circuits, and we're reaching a point where the wavelengths are getting into a range where you can't build lenses anymore. You have to switch to something like X rays.
So, what exactly is the problem? It's not, as Moore and others predicted, a question of actually building the circuits - that's still working fine. IBM, Intel, AMD and others have all produced working chips at 90 nanometres. The problem is leakage. Each of the millions of transistors in a chip is a tiny switch, turning on and off at incredible speeds. Each time you turn the transistor on, or off, you need to use a little bit of electricity to do so. That's okay, and it's expected, because you don't get anything for free. The problem is that the transistors are now so small, and the layers of insulation - the dielectric - so thin, that they leak. There's a partial short-circuit, and so instead of only using power when the switch switches, it's using power all the time.
So what? Electricity is cheap. Well, the so what is heat. Modern microprocessors use as much electricity as a light bulb, and that means they produce just as much heat. If they didn't have huge heat sinks and fans bolted onto them, they'd very quickly overheat and fail - a fact that some people have independently discovered.
Until now, each new generation of scaling, each new node, has brought smaller, faster, cheaper and cooler transistors. At 90 nanometres, transistors are smaller, cheaper, probably faster again - but they run hotter. And the competition in the processor market has already driven power consumption (and heat generation) about as high as it can go. So when the new generation was discovered to increase the heat rather than decrease it, the whole forty-year process of accelerating change ran head-first into a wall.
Back at the end of 2002, I made the following set of predictions for the coming year. I felt pretty comfortable in all of them, the first no less than any of the others:
My predictions for 2003:
1. Microprocessors will hit 4GHz by the end of the year. Marketers will try and largely fail to convince the public to buy them.
2. A major scientific breakthrough will lead to a new and deeper understanding of something.
3. A major political scandal will result in a huge media kerfuffle and only die down when someone resigns.
4. There will be a war.
5. Bad weather will affect the lives of millions of people.
6. There will not be any major, civilisation-destroying meteor impacts.
7. Astronomers will find new and interesting things in the sky.
8. Spam, pop-ups and viruses will continue to plague us. The Internet will fail to collapse under the strain. Pundits will predict that this will now happen in 2004.
9. A rocket will explode either on the launch pad or early in its flight, destroying its expensive payload - which will turn out to be uninsured.
10. Cod populations in European waters will continue to fall, and the European parliament will fail to act to prevent this.
11. A new species of mammal will be discovered.
12. A species of reptile or amphibian will be reported as extinct.
But not only did we not see 4GHz processors in 2003, it's doubtful that we'll see them in 2004 either. [Nope.] (I was wrong about number 3, too. No-one resigned, and the media moved onto the next scandal. Rinse, repeat.)
Now, assuming you're not a hard-core computer gamer, hanging out for the release of Doom 3 [Mass Effect 3] and Half-Life 2 [Half-Life 2 Episode 3], why should you care?
Well, if you have broadband internet, or a mobile phone, or a DVD player, or a PDA [A what?], or a notebook computer, or a digital camera (or a digital video camera), or you use GPS on your camping trips, or you enjoy the low cost of long-distance phone calls these days, if you download anime or the latest episode of Angel [Doctor Who] off the net, if you take your iPod [iPad] with you everywhere you go, if your job or your hobby involves using e-mail or looking things up on the Web, you can thank Moore's Law for it.
Modern communications depend critically on advanced signal processing techniques, performed by specialised chips called Digital Signal Processors, or DSPs. These things are everywhere - every modem, every mobile or cordless phone, every digital camera, every TV or VCR or DVD player, every stereo, every disk drive. It's the relentless advance of Moore's Law that has made DSPs fast enough and cheap enough to do all this, and made them efficient enough to run on batteries so well that your mobile phone might last a week between charging. (My first mobile was lucky to make it through the day.) Disk drives demand high-speed DSPs to sort out the signals coming from the magnetic patterns on the disk and turn them back into the original data. DVD players need them to turn the tiny pits pressed into the aluminium surface into a picture. The entire global telephone network, mobile and fixed, depends on DSPs. And any advances in any of these areas will require more and faster and cheaper DSPs and - uh-oh.
And there's more: The advances in computers and communications over the past four decades have been the primary driver of the global economy. The economy has been growing all that time, even though we have made no fundamental breakthroughs in finding new resources or new materials. If you're better off than your parents, you can thank Moore's Law for a big chunk of that - if not the effort you put in, then the new opportunities it opened up.
And it just died.
I don't think the financial markets have a clue yet what's going on, but in any case it's going to be a soft landing. All of the processor manufacturers have been in a mad rush over the last decade to produce faster chips at the expense of pretty much anything else. The funny thing is that they've been pushing so hard, they've left a lot of things behind. Take a look at this chart:
You don't have to understand exactly what this means, but the first number relates to "integer" performance, which is important for things like word processing and web browsing and databases, and the second number relates to "floating-point" performance, which is important for games. (Well, and other things too.)
Integer  Floating-point  Processor
1076     763             Pentium M 1.6GHz
805      635             Pentium M 1.1GHz
237      148             C3 1.0GHz (C5XL)
398      239             Celeron 1.2GHz (FSB100)
543      481             Athlon XP Barton 1.1GHz (FSB100 DDR)
581      513             Athlon XP Thoroughbred-B 1.35GHz (FSB100 DDR)
1040     909             Athlon XP 3200+ (Barton 2.2GHz, FSB200 DDR)
1276     1382            Pentium 4 3.0E GHz Prescott (FSB800), numbers from spec.org
1329     1349            Pentium 4 3.2E GHz Prescott (FSB800)
560      585             Athlon 64 3200+ 0.8GHz 1MB L2
1257     1146            Athlon 64 3200+ 2GHz 1MB L2
The Pentium M is a modified version of the Pentium III, customised for notebook computers. Since notebook computers run off batteries, and batteries don't hold much power at all, the Pentium M has been tweaked to provide as much speed as possible while using as little power as possible. The Pentium 4, on the other hand, is designed for speed at the expense of everything else. And what we find is that the 3.2GHz Pentium 4, despite having twice the clock speed of the 1.6GHz Pentium M, is just 25% faster on integer (useful work) and 75% faster on floating point (games).
And - here's the tricky bit, and the cause of Intel's recent and dramatic change in direction - the Pentium 4 uses four times as much power as the Pentium M. So if, instead of putting one Pentium 4 onto a chip, you put four Pentium Ms, it would use the same amount of power and produce the same amount of heat, but it would run up to three times as fast... Overall.
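Here is the arithmetic behind those percentages, as a short Python sketch using the two rows from the chart. The four-chip figures assume perfect scaling across all four processors, which is why the text says "up to"; real workloads would fall short of that.

```python
# (integer, floating-point) scores from the chart above.
pentium_m = (1076, 763)     # Pentium M 1.6GHz
pentium_4 = (1329, 1349)    # Pentium 4 3.2E GHz Prescott

# How much faster the Pentium 4 is, as a fraction.
int_gain = pentium_4[0] / pentium_m[0] - 1
fp_gain = pentium_4[1] / pentium_m[1] - 1

# Four Pentium Ms in the same power budget, assuming ideal scaling.
four_m_int = 4 * pentium_m[0] / pentium_4[0]
four_m_fp = 4 * pentium_m[1] / pentium_4[1]

print(f"Pentium 4 advantage: {int_gain:.0%} integer, {fp_gain:.0%} floating point")
print(f"Four Pentium Ms:     {four_m_int:.2f}x integer, {four_m_fp:.2f}x floating point")
```

So the Pentium 4 is roughly a quarter faster on integer work and three quarters faster on floating point, while four Pentium Ms would come in a bit over three times as fast on integer and a bit over twice as fast on floating point for the same power.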
Which is great and wonderful if you can use four processors at once. I can, quite happily, and more than that. A word processor can't, not easily, but then word processors already run pretty well. Games, and other graphics-intensive stuff like Photoshop or 3D animation software certainly can, though most games haven't been written to do so. Not yet.
Or so the situation was seven years ago. What's changed? Well, now I can have a game busy-wait on two cores at once.
The situation turned out not to be quite so dire as it appeared at the time, though a huge amount of engineering effort has gone into the advances we've seen in recent years. And yet, the server we recently deployed at my day job, while it has forty processors (yes, four-zero), is still based on the Pentium Pro (through at least six generations of intermediary designs) and only runs at 2GHz.
The bright spots have been not so much in the CPU cores themselves, as in the vector (a.k.a. SIMD) units, which have grown from 64 to 128 to 256 bits, and in video cards, which are just masses of vector processors all working together. Video cards have hit a limit too, though; they're choking on their own heat. A single high-end card can use more than 300W, about as much as a well-configured PC at the time of the original post.
And we still haven't broken 4GHz in a mainstream processor.
There are two particular beacons on the horizon at the moment. One comes from AMD, the Zambezi-Orochi-Bulldozer chip I mentioned in a recent post. If pre-launch data is correct, they expect to provide 8 cores running at 4.2GHz (and up to 4.7GHz when conditions are right) within a 125W power budget. That's a lot of processing power for a fairly low-end chip. It has some limitations; in particular, it only has one full vector unit per pair of cores, so for floating-point heavy applications like games and video editing, it will be no faster than Intel's four-core chips. For the stuff I do, though - web sites and databases - it will (again assuming the details are correct) slam Intel's chips into the ground.
The other ray of light comes from Intel, because, while we might loathe the behaviour of their marketing department, they are no slouches when it comes to engineering. Their new FinFET transistors, debuting on the upcoming 22nm node, allow their chips to cut overall power consumption in half. Which means, since everything computational nowadays is limited either directly by available power or indirectly by heat dissipation, that everything can get twice as fast. Not vertically, but at least horizontally.
So we're talking about mainstream desktop processors with sixteen cores, running at well over 4GHz, coming your way in the next year or two. It's not the 10GHz Pentium 4 that Intel promised us all those years ago, but it will serve. Before much time has passed we'll see games busy-waiting on eight cores, you mark my words.
* That's dedicated disk; we'll set aside the terabyte or so living in the file server. **
** Which died in the great server crash of... Around 2007, I think. Took me ages to recover all that anim... Data.
Friday, July 15
Listens to developer explaining how MySQL is no good for really large databases, like, over 100MB.
Looks at 6TB MySQL production database handling several thousand transactions per second.*
It's pretty obvious he's got broken joins. The advantage we had back in the good old days was that rather than taking several seconds to return your data, it would take several weeks, and you could hear the drive heads thrashing about while it happened, so it was kind of obvious that you'd screwed up.
I've seen far too many programs where someone has failed to realise that the half a second or so that a function takes is because they've got it completely wrong, and it should be taking milliseconds or even microseconds. Send them all back to 1985, I say. Here's your Unix box. We just upgraded it: It now has two megabytes of RAM!
* Mind you, it's not easy to get MySQL to scale that big on a single server. But 100GB is perfectly manageable, and that's a thousand times what this guy was talking about.
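I don't know exactly how this particular developer's joins were broken, but one common way is to leave out the join condition and get every row paired with every other row. Here is a small Python sketch using the standard sqlite3 module (not MySQL) to show the effect; the `customers` and `orders` tables and their columns are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"customer{i}") for i in range(100)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 100, i) for i in range(1000)])

# Broken: no join condition, so every order is paired with every customer.
broken_rows = conn.execute(
    "SELECT COUNT(*) FROM orders, customers").fetchone()[0]

# Correct: each order is matched to its one customer via the primary key.
correct_rows = conn.execute(
    "SELECT COUNT(*) FROM orders JOIN customers "
    "ON customers.id = orders.customer_id").fetchone()[0]

conn.close()
print(broken_rows, correct_rows)
```

With only 1,000 orders and 100 customers the broken version already produces 100,000 rows instead of 1,000. On a modern machine that still comes back in a blink, which is exactly why nobody notices until the tables grow.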
Thursday, July 14
According to this handy chart, AMD's new FX-8170P CPU (Order Orochi, Family Zambezi) will have 8 cores running at 4.2GHz base speed, 4.7GHz in turbo mode.
That looks like a worthwhile upgrade for my current 2.4GHz quad core. Well over three times the compute power. And because AMD has maintained a sensible continuity in their platform, I can build a system now with the latest AM3+ socket, drop my current AM3 CPU into it, swap in the octocore goodness when it lands, and use the spare CPU to upgrade my AM2 Linux box. With Intel you'd be faced with three different pin counts.
I really want to see the server versions of these chips now. We're building a cluster of AMD-based servers at my day job, and we're using the cheapest current CPUs with the plan to swap them out for the newer models when they arrive. I was expecting more cores but a slower clock speed, but based on what they've achieved on the desktop I could get more cores and a higher clock speed. That would be very nice.
Tuesday, July 12
An exotic atom with a nucleus comprising three cutinos and a chaon, orbited by a solitary oneeon.
Monday, July 11
Not official yet, but clearly on its way. Thanks for all your hard work, CentOS peeps.
Bimped: It's here!
That's one of the blockers for the new Minx platform rollout fixed. The others include a stable release of OpenVZ for RedHat/CentOS 6, and Intel's 710 series SSDs. The latter are expected this month.
Oh, and me getting time to do some work on it. That's much more likely to happen now than it was six weeks ago, since we have now filled all our situations vacant at my day job, and I'm hoping to see my hours drop from ~60 to ~35 a week.
A tip for guys: Don't proposition women you don't know in hotel elevators at 4AM if you don't want to come off as kind of creepy.
Which seems like a simple enough rule, and not one it had ever crossed my mind to breach.
Saturday, July 09
A while back, in between houses falling on me, I was working on a database written in Python, which I called Pita. I actually got it working, enough to start doing some performance tests...
At which point I shelved the project, because (a) I was absurdly busy what with the houses and all and (b) even though it had pluggable low-level storage engines, the overhead of the Python layer made it significantly slower than just using MySQL.
What Pita could do, which was nice, was (a) offer a choice of in-memory or on-disk tables using identical syntax and selectable semantics and (b) provide a log-structured database that did sequential writes for random updates. Cassandra also has this trick. The advantage here is that it (a) can cope with a huge volume of incoming data, and (b) doesn't fry consumer-grade SSDs the way MySQL would.
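The log-structured trick is simple enough to sketch in a few lines of Python. This is not Pita's actual code or Cassandra's design, just a minimal illustration of the idea: every update is appended to the end of a file, and an in-memory index remembers where the latest version of each key lives. The class name and the JSON-lines record format are invented for the example, and there is no compaction, locking, or crash handling.

```python
import json
import os
import tempfile

class AppendOnlyStore:
    """Toy log-structured key-value store: sequential writes, indexed reads."""

    def __init__(self, path):
        self.path = path
        self.index = {}               # key -> byte offset of latest record
        open(path, "ab").close()      # create the log if it doesn't exist
        self._rebuild_index()

    def _rebuild_index(self):
        # Replay the log from the start; later records override earlier ones.
        with open(self.path, "rb") as log:
            while True:
                offset = log.tell()
                line = log.readline()
                if not line:
                    break
                self.index[json.loads(line)["k"]] = offset

    def put(self, key, value):
        record = (json.dumps({"k": key, "v": value}) + "\n").encode("utf-8")
        with open(self.path, "ab") as log:
            log.seek(0, os.SEEK_END)  # always write at the end of the log
            offset = log.tell()
            log.write(record)
        self.index[key] = offset

    def get(self, key):
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as log:
            log.seek(offset)
            return json.loads(log.readline())["v"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.log")
    store = AppendOnlyStore(path)
    store.put("a", 1)
    store.put("b", 2)
    store.put("a", 3)                 # a "random update" that is still an append
    latest = store.get("a")
    reopened = AppendOnlyStore(path).get("a")
    log_size = os.path.getsize(path)
```

Updating "a" never touches the old record; the file only grows, which is what keeps the writes sequential. A real implementation would periodically rewrite the log to drop superseded records.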
Unfortunately, Cassandra is a bit of a cow. Undeniably useful, but indubitably bovine.
Redis with AOF can offer similar performance, but only so long as your data fits in memory, because it's simply snapshot+log persistence (like Pita) and single threaded (unlike Pita) so it can't cope with I/O delays. This makes Redis and its support for data structures beyond simple records (hashes, lists, sets, sorted sets) great for your hot data but no use for your long tail - if, say, you've been running a blogging service for 8 years.
What you could do in that situation is use Redis for your hot data (great performance, easy backups, easy replication) and stick your cold data in a key-value store.
Like Keyspace, except that's dead.
Or Cassandra, except that's a cow.
Or MySQL, except that defeats the purpose.
Or MongoDB, except that you'd like to keep your data.
Or Kyoto Tycoon, which has pluggable APIs (don't like REST - use RPC or memcached protocol) and pluggable storage engines... Like Google's LevelDB. Kyoto Tycoon running Kyoto Cabinet uses snapshot+log for backups, but the database itself is a conventional B+ tree, so it needs to do random writes. LevelDB, on the other hand, uses log-structured merge trees - sequential writes, even for the indexes.
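The hot/cold split described above looks roughly like this in Python. It is only a sketch of the arrangement: the `TieredStore` class is hypothetical, an `OrderedDict` stands in for Redis as the hot tier, and a plain dict stands in for whichever key-value store holds the long tail. No real Redis or Kyoto Tycoon calls are involved.

```python
from collections import OrderedDict

class TieredStore:
    """Small hot tier in front of a large cold tier (both are stand-ins)."""

    def __init__(self, cold, hot_capacity=2):
        self.hot = OrderedDict()      # stand-in for the in-memory store
        self.cold = cold              # stand-in for the on-disk store
        self.hot_capacity = hot_capacity

    def _remember(self, key, value):
        # Keep the most recently used keys hot; evict the oldest.
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)

    def put(self, key, value):
        self.cold[key] = value        # the cold tier holds everything
        self._remember(key, value)

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        value = self.cold[key]        # raises KeyError if the key is unknown
        self._remember(key, value)    # promote on read
        return value

store = TieredStore(cold={}, hot_capacity=2)
store.put("post:1", "first")
store.put("post:2", "second")
store.put("post:3", "third")          # pushes post:1 out of the hot tier
old_post = store.get("post:1")        # served from cold, then promoted
hot_keys = list(store.hot)
```

Eight years of blog posts stay in the cold tier, and only whatever people are actually reading right now takes up memory.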
So Redis and Kyoto Tycoon with LevelDB both provide:
- Key-value store
- Range lookups
- Sequential writes (SSD friendly)
- Snapshot+log backups (bulletproof)
- Instant replication (just turn it on, unlike MySQL replication, which is a pain)
- Lua scripting (not yet in mainstream Redis, but coming)
- Key expiry (for caching)
- Data structures
- Lists (which can be used to provide stacks, queues, and deques)
- Sorted sets
- Bytestrings (update-in-place binary data)
- Pub/Sub messaging
- Support for databases larger than memory
- Very fast data loads
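On the point about lists doing duty as stacks, queues, and deques: a list that can push and pop at both ends covers all three. Here is a tiny Python sketch using `collections.deque` as a stand-in; the comments name the Redis list commands (RPUSH, RPOP, LPOP) that each call corresponds to, but nothing here talks to a Redis server.

```python
from collections import deque

items = deque()
items.append("a")               # like RPUSH: add at the right-hand end
items.append("b")
items.append("c")

stack_top = items.pop()         # like RPOP: last in, first out (stack)
queue_head = items.popleft()    # like LPOP: first in, first out (queue)
remaining = list(items)
```

Push and pop at the same end and you have a stack; push at one end and pop at the other and you have a queue; use both ends freely and it's a deque.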