Ambient Irony

I'm in the future. Like hundreds of years in the future. I've been dead for centuries.
Oh, lovely, you're a cheery one aren't you?

Tuesday, August 24

Little-Known Python Data Structures, Part 213, The Sideways Nesting Iridescent Hash Tree

Well, okay, I'm playing with named tuples some more. And records too, which are mutable named tuples. Benchmarks on Tanarotte, a cheap little 3GHz Phenom X4:

Native
tontuple * 1000000: 2.25s 444921.8/s
fromntuple * 1000000: 2.53s 394941.8/s
torecord * 1000000: 2.99s 334648.4/s
fromrecord * 1000000: 2.58s 388138.2/s

Psyco
tontuple * 1000000: 1.78s 561349.0/s
fromntuple * 1000000: 1.25s 803004.6/s
torecord * 1000000: 2.16s 462985.7/s
fromrecord * 1000000: 1.00s 1001112.3/s

And on Aoi, a very expensive 2.93GHz dual Xeon:

Native
tontuple * 1000000: 1.86s 537917.8/s
fromntuple * 1000000: 2.18s 458282.9/s
torecord * 1000000: 2.70s 370085.8/s
fromrecord * 1000000: 2.26s 442421.5/s

Psyco
tontuple * 1000000: 1.70s 588383.4/s
fromntuple * 1000000: 2.24s 445920.3/s
torecord * 1000000: 1.95s 513738.1/s
fromrecord * 1000000: 2.24s 447280.2/s

Which tells me two things.

First: Performance is pretty much a wash, so it's fine to use whichever is best for a given task.

Second: Go for the cheaper AMD server next time. A 2.6GHz quad-core/dual-socket Opteron retails for about $140; a six-core version for $210. The Xeon X5670s in our current server run about $1460 each.
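
As for what the four tests above measure: a minimal sketch, not the actual benchmark code. The `Point` layout, the `Record` class, and the `bench` helper are names invented for illustration, and "record" is assumed here to mean a small `__slots__` class with named, assignable fields.

```python
import time
from collections import namedtuple

# Hypothetical three-field layout; the real benchmark's fields are not shown.
Point = namedtuple("Point", ["x", "y", "z"])


class Record(object):
    """A mutable stand-in for a named tuple: fixed field names, assignable values."""
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z


def to_ntuple(values):
    return Point(*values)


def from_ntuple(nt):
    return (nt.x, nt.y, nt.z)


def to_record(values):
    return Record(*values)


def from_record(rec):
    return (rec.x, rec.y, rec.z)


def bench(label, func, arg, count=100000):
    """Call func(arg) count times and report in the same layout as the results above."""
    start = time.perf_counter()
    for _ in range(count):
        func(arg)
    elapsed = time.perf_counter() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    line = "%s * %d: %.2fs %.1f/s" % (label, count, elapsed, rate)
    print(line)
    return line


if __name__ == "__main__":
    values = (1, 2, 3)
    bench("tontuple", to_ntuple, values)
    bench("fromntuple", from_ntuple, to_ntuple(values))
    bench("torecord", to_record, values)
    bench("fromrecord", from_record, to_record(values))
```

The only practical difference between the two is that a `Record` field can be reassigned in place, while a named tuple has to be rebuilt with `_replace`.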

Posted by: Pixy Misa at 09:39 AM | No Comments | Add Comment | Trackbacks (Suck)
Post contains 185 words, total size 2 kb.

Sunday, August 22

Well, That's Promising

Just having a little data structure bakeoff here at PixyLabs, comparing Treedicts with Structures with named tuples. And the winnerer is...

Native
fromdict * 10000: 0.27s 36861.3/s
fromnest * 10000: 1.94s 5147.4/s
tostruct * 10000: 6.02s 1661.8/s
fromstruct * 10000: 4.44s 2254.6/s
tontuple * 10000: 0.02s 446387.8/s
fromntuple * 10000: 0.03s 385176.6/s

Psyco
fromdict * 10000: 0.26s 38590.5/s
fromnest * 10000: 1.76s 5678.3/s
tostruct * 10000: 4.75s 2106.4/s
fromstruct * 10000: 3.90s 2567.0/s
tontuple * 10000: 0.02s 575279.3/s
fromntuple * 10000: 0.01s 778944.4/s

The named tuple code isn't complete yet, and they don't do type checking (which Structures do), but 250x the performance? I think we have a winner.

Posted by: Pixy Misa at 03:14 PM | No Comments | Add Comment | Trackbacks (Suck)
Post contains 110 words, total size 1 kb.

Friday, August 20

Terafying

My ISP has gone insane.

They've just announced new plans with download caps of up to 1TB per month. The price for this is the same as I'm currently paying for 225GB - and that was a free increase from 200GB which was a free increase from 150GB which was a free increase from 110GB.

The one difference is that the new plans count uploads as well as downloads; however, my upstream speed is around 2Mbps, so I can only upload about 500GB a month anyway.
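
The back-of-envelope arithmetic, as a sketch: it assumes a 30-day month, decimal units, and a link saturated around the clock. That gives the theoretical ceiling; the roughly 500GB figure above presumably allows for protocol overhead and a link that isn't busy 24 hours a day.

```python
def max_transfer_gb(mbps, days=30):
    """Upper bound on data moved at a constant line rate, in decimal GB."""
    bytes_per_second = mbps * 1000000 / 8.0
    seconds = days * 24 * 60 * 60
    return bytes_per_second * seconds / 1e9


print("%.0f GB" % max_transfer_gb(2))  # prints "648 GB"
```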

Handy for doing backups of the servers. In the past I've arranged them carefully to conserve bandwidth; with this I won't have to care.

Update: My backup ISP (since I depend on the internet for work, I have a second, cheaper connection) has just announced a 1TB plan as well. I'm currently on an unlimited plan with them, but this works out quite a bit cheaper.

Posted by: Pixy Misa at 10:56 AM | Comments (1) | Add Comment | Trackbacks (Suck)
Post contains 153 words, total size 1 kb.

Tuesday, August 17

Language

German is without question the best language for describing technical faults:

Das Archiv dieser Gruppe ist momentan nicht verfügbar
Wir entschuldigen uns für eventuell entstandene Unannehmlichkeiten. Bitte versuchen Sie es in Kürze noch einmal

Knowing that the Archiv is momentan nicht verfügbar makes it seem better, somehow.

Posted by: Pixy Misa at 05:07 PM | Comments (2) | Add Comment | Trackbacks (Suck)
Post contains 46 words, total size 1 kb.

Sunday, August 15

Desperately Seeking The Last Six Months

About six months ago, I was looking for a database that didn't suck.

At the time, there weren't any, though some showed promise. Redis, for example, had a great feature set but was limited to in-memory databases.

This has now been fixed.

Fortunately (or... otherwise) my day job ate all my time for six months, and now that I'm finally returning to this project, Redis is ready for me.

While it's not the solution for every problem (the main data engine is single-threaded and blocking, so your overall throughput is limited by CPU speed) it now supports virtual memory and multi-threaded I/O, so it can scale to significantly larger datasets.

The big advantage of Redis is that it provides lightweight implementations of a variety of different data storage methods - not just key/value stores, but lists, hashes, sets, ordered sets, and message queues. You can also set expiry times on entries to use it as a cache. So with a bit of lateral thinking one server can replace a regular database like MySQL, a messaging system like ActiveMQ, a caching system like Memcached, and a document store like CouchDB.

It doesn't support strict transactions, but it has transactional features that are good enough for most tasks. It also has a nice simple replication model.

You're not going to build the next Facebook on Redis - it simply won't scale - but if you're working on something a bit smaller (like a multi-user blogging app) it might be just what you need.

Not that I'm thinking of rewriting Minx - but the new module I'm developing is going to use Redis for most of its data, and if that experiment works well, I'll naturally look at extending it further.

Update: Huh.

So, as is my wont (and as is the wont of any programmer who wishes to retain what's left of his or her sanity) I ran a benchmark on Redis before going any further. My own benchmark. Redis comes with a benchmark that cheerfully reports ~100,000 reads or writes per second. I do not trust it.

So:

[andrew@synclavier ~]$ python redistest.py
String write * 10000: 0.58s 17097.4/s
String read * 10000: 0.54s 18436.4/s
JSON write * 10000: 1.13s 8869.5/s
JSON read * 10000: 0.72s 13799.1/s
Hash write * 10000: 2.24s 4455.1/s
Hash read * 10000: 5.42s 1845.1/s

For the string benchmark, I am creating records each containing ~1kb of static text. For the JSON benchmark, a Python dict is converted to ~1kb of JSON text. For the hash benchmark, the same dict is stored as a Redis hash - basically, an extensible record structure.
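
The three approaches differ only in how the record is mapped onto Redis commands. A minimal sketch, not the contents of redistest.py: the function names are invented, and `client` is assumed to be a redis-py style connection (something like `redis.Redis()`) exposing `set`, `get`, `hset` and `hgetall`.

```python
import json


def write_string(client, key, text):
    """Store a block of text as-is under one key."""
    client.set(key, text)


def read_string(client, key):
    return client.get(key)


def write_json(client, key, record):
    """Serialise the whole dict to JSON and store it as a single string."""
    client.set(key, json.dumps(record))


def read_json(client, key):
    return json.loads(client.get(key))


def write_hash(client, key, record):
    """Store each dict entry as a separate field of a Redis hash."""
    # One HSET per field in this sketch, so one command per entry.
    for field, value in record.items():
        client.hset(key, field, value)


def read_hash(client, key):
    """Fetch every field of the hash; values come back as strings, not ints."""
    return client.hgetall(key)
```

Whether the real test wrote hash fields one at a time or in bulk isn't stated; this sketch takes the simplest route.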

String performance is not too bad for single-threaded client/server code with no optimisations - approaching 20k ops/s for both reads and writes with a meaningful payload. Good.

The JSON benchmark is reading and writing exactly the same amount of data, showing the overhead of the JSON (actually, SimpleJSON) library. We still get on the order of 10k ops/s. With MySQL on the production server, that's what I see on cached queries. So far, so good.

The hash benchmark is not so hot, particularly the reads. An order of magnitude slower than strings; most of an order of magnitude slower than JSON for the same data. Won't be using hashes for the high-volume stuff.

Then, curious, I tried one more thing:

Object write * 10000: 0.84s 11846.0/s
Object read * 10000: 0.54s 18473.1/s

I took that same Python dict and wrote it to Redis. Without converting it to JSON. Without mapping it to a hash. Without doing anything. And it worked.

That's kind of scary. I'm not sure how it's doing that.

If you pass the Python Redis library something that's not a string, it turns it into a string. No magic.
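
What that amounts to, sketched without Redis at all: the value that gets stored is simply `str()` of the dict. The sample record is hypothetical, and the sketch assumes the client stringifies its arguments as described above.

```python
import ast
import json

# Hypothetical record, for illustration only.
record = {"id": 1, "title": "Hello", "tags": ["python", "redis"]}

# What a client that stringifies its arguments would actually send.
stored = str(record)

# The stored text is a Python repr, so a JSON parser rejects it...
try:
    json.loads(stored)
    is_json = True
except ValueError:
    is_json = False

# ...but ast.literal_eval can turn it back into a dict.
restored = ast.literal_eval(stored)
```

So the write works, but what comes back is a string that only Python can conveniently read, which is worth knowing before anything else needs to share the data.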

Update: Tweaked and re-run, with some interesting results:

Running native
String write * 10000: 0.66s 15075.6/s
String read * 10000: 0.52s 19051.1/s
JSON write * 10000: 1.07s 9359.8/s
JSON read * 10000: 0.72s 13905.8/s
JSON multi-read * 10000: 0.50s 19878.3/s
Hash write * 10000: 2.64s 3788.8/s
Hash read * 10000: 6.11s 1636.9/s
Pickle write * 10000: 0.88s 11335.1/s
Pickle read * 10000: 0.68s 14810.8/s

Running with Psyco
String write * 10000: 0.68s 14742.0/s
String read * 10000: 0.41s 24369.1/s
JSON write * 10000: 0.88s 11398.6/s
JSON read * 10000: 0.69s 14493.1/s
JSON multi-read * 10000: 0.39s 25882.6/s
Hash write * 10000: 2.06s 4845.8/s
Hash read * 10000: 2.66s 3752.5/s
Pickle write * 10000: 0.89s 11212.0/s
Pickle read * 10000: 0.63s 15953.6/s

With Psyco, hash write performance is up nearly 30% and hash read performance has more than doubled. That certainly suggests the bottleneck is not Redis itself.

Posted by: Pixy Misa at 06:00 PM | Comments (1) | Add Comment | Trackbacks (Suck)
Post contains 778 words, total size 6 kb.

Friday, August 13

Blurgleblub

Been one of those weeks.

Anyway, on the previous subject, this puppy would likely be a little more budgetarily feasible:

Supermicro SC111LT-330 1U Chassis $215
Supermicro H8SCM-F Motherboard $235
AMD Opteron 4180 CPU $206
4GB DDR3-1333 ECC Registered RAM x2 $276
Seagate Momentus 7200.4 500GB x 3 $261
Intel 80GB X25-M SSD $249
Supermicro Slim DVD-ROM $55

Total: $1497

That's for a 6-core 2.6GHz processor, 8GB RAM, 1TB available disk in RAID-5, and a nice 80GB SSD. Dual GbE and IPMI for remote management.

Alternately, this:

Supermicro SC111LT-330 1U Chassis $215
Supermicro X8SIE-LN4F Motherboard $225
Intel Xeon X3430 CPU $219
4GB DDR3-1333 ECC Registered RAM x2 $276
Seagate Momentus 7200.4 500GB x 3 $261
Intel 80GB X25-M SSD $249
Supermicro Slim DVD-ROM $55

Total: $1500

Very similar overall; this CPU is a quad-core 2.4GHz, but with a somewhat better architecture and hyper-threading, so performance would be close to the AMD chip. This motherboard comes with quad GbE, so I can have lots of fun inventing weird network topologies.

Either one would run Minx with no problems at all. They wouldn't run the CPanel accounts, though: Apache and PHP chew up memory and CPU like nobody's business. There's not much expansion space either, except for RAM - up to 32GB of affordable modules, and on the AMD board, up to 64GB of really expensive ones.

Posted by: Pixy Misa at 10:09 PM | No Comments | Add Comment | Trackbacks (Suck)
Post contains 225 words, total size 2 kb.

Sunday, August 08

The Possibilities...

Supermicro SC217 Chassis $919
Supermicro H8DGT Motherboard x4 $1700
AMD Opteron 6128 CPU x8 $2288
4GB DDR3-1333 ECC Registered RAM x32 $4256
Seagate Momentus XT 500GB x24 $3240

Total: $12,403

Okay, that's a lot of money, yes.

But what you get for that is a 64-processor server with 128GB of RAM and 12TB of disk with a 96GB non-volatile cache.

Which fits in 2U of rack space.

It actually has room for 48 CPUs and 512GB of RAM, but that would blow the price out to $33,555, which is a wee bit expensive.

There's also a version of the motherboard with built-in QDR Infiniband, but it costs twice as much.

Or this, for storage:

Supermicro 36-bay chassis with DP motherboard and RAID controller $2569
AMD Opteron 6128 CPU x2 $572
4GB DDR3-1333 ECC Registered RAM x8 $1104
Seagate Barracuda LP 2TB x36 $4140

Total: $8385

Or finally:

Supermicro 6-bay chassis with QP motherboard: $1993
AMD Opteron 6128 CPU x4 $1144
4GB DDR3-1333 ECC Registered RAM x32 $4256
Seagate Barracuda ES.2 1TB x6 $954

Total: $8347

I don't need 72TB of disk space, but I know someone who does.

Posted by: Pixy Misa at 01:53 PM | Comments (10) | Add Comment | Trackbacks (Suck)
Post contains 190 words, total size 1 kb.

1 Hey, a mid-range MongoDB server!

-j

Posted by: J Greely at Monday, August 09 2010 01:47 AM (2XtN5)

2 In fact, that's exactly what I had in mind. At least, partly.

I was looking at using MongoDB for Minx; unfortunately, it doesn't like OpenVZ at all; if your database is bigger than the memory available to the virtual machine, MongoDB will crash. (It's a known bug and they're working on it, but it would require a whole new storage engine to fix.)

This system is actually four 16-core 32GB servers packed into a 2U chassis. So one for the applications, one for MongoDB, and two for Counterstrike.

Posted by: Pixy Misa at Monday, August 09 2010 11:47 AM (PiXy!)

3 Of all the NoSQL databases I've looked at recently, the three that are most interesting are MongoDB - which crashes under OpenVZ; Redis - which requires all data to be in memory at all times; and Keyspace - which is a simple, reliable, key-value store that actually lets you inspect the keys (which almost nothing else does).

Naturally, the developers of Keyspace are dropping it to move on to a new project.

Posted by: Pixy Misa at Monday, August 09 2010 11:49 AM (PiXy!)

4 Thanks for not looking at Couch at least. Man that thing was ugh. Mongo I actually considered. The problem is, however, it's not a better db4 than db4 (that would be Tokyo and now Kyoto cabinets). If you use Mongo as a drop-in nosql, you win exactly nothing. To get the advantage, you have to steal from J.Greely's playbook (of course I had his blog posts squirreled away for a better time). You have to make your application to use Mongo, and that loses you portability. You are a slave to Mongo from that point on.

Posted by: Pete Zaitcev at Monday, August 09 2010 01:13 PM (/ppBw)

5 Yeah, even after testing at full scale, I had to basically start over from scratch when I ran into some of the more interesting limitations in 1.4.x. Once the 1.6 branch stabilizes a bit, I may be able to get closer to what I originally wanted.

-j

Posted by: J Greely at Monday, August 09 2010 02:10 PM (a8YWB)

6 MongoDB has some interesting tricks that solve some specific denormalisation problems I have with Minx.

I did look at CouchDB, but it doesn't really give me anything I need.

The simplest solution I can see that gives me everything I need would be MongoDB + Redis + Xapian + Memcached. Which is kind of a mess and would require a non-OpenVZ server for MongoDB, but should at least work.

Posted by: Pixy Misa at Monday, August 09 2010 03:19 PM (+J3Nt)

7 It's an indication of just how fast and far computer science has grown that there are now such a wide variety of sub-specialties which are so different from one another that experts in them cannot converse intelligently.

When I was still able to work, my area of expertise was embedded software. I know nothing about databases. It's all greek to me.

Posted by: Steven Den Beste at Monday, August 09 2010 03:41 PM (+rSRq)

8 I should have said "Jeff Darcy considered Mongo seriously" because of course I am not competent to consider it seriously.

Posted by: Pete Zaitcev at Tuesday, August 10 2010 12:52 AM (/ppBw)

9 Maybe I should have said, "It's all geek to me."

Posted by: Steven Den Beste at Tuesday, August 10 2010 01:49 AM (+rSRq)

10 Beware of geeks bearing .gifs?

Posted by: Avatar_exADV at Wednesday, August 11 2010 12:53 PM (pWQz4)

