Friday, January 14

Geek

Ugh, Ugh, Ugh, Flashbacks to 1988

Update: I think I've worked out why my benchmark results were so bad, and I'll be modifying my code and re-running them.  Stay tuned!

Well, possibly 1993.

Writing SQL is one of my least favourite tasks. SQL Alchemy and Elixir promise to magic this away (most of the time) by providing an object-relational mapping and an active record model.

What is now called the active record model is what I used in my day job from 1988 to 2006. It's good. In fact, anything else sucks. What the active record model means is that you interface with your database as though your data were yours, and not some precious resource doled out to you by your electronic betters as they saw fit.

What's not so good is that the performance I'm getting is more 1988 than 2006, let alone 2011.

I've created a database of 100,000 random movies (based on the Elixir tutorial). That took around 30 seconds. Then I went all Hollywood and created remakes of them all - that is, I updated the year field to 2012.

As a single statement in MySQL, that took 0.38 seconds.

As 100,000 individual raw updates fired off through the MySQL-Python interface (which is a horrible way to do things), 9 seconds.

Via Elixir... 230 seconds.

No. Just no. That's crap.

You can show the SQL that Elixir/Alchemy generates, and it's the same as what I'm doing manually. The SQL is fine. It's the rest of the library that's a horrible mess.

It takes 5.8 seconds to read those 100,000 movies in via Elixir.

It takes 0.34 seconds to read them in via MySQLdb and convert them into handy named tuples.* (In essence, inactive records.)

Okay, hang on one second. I'm using the less-well-known Elixir here on top of SQL Alchemy. How much of the damage is Elixir, and how much is SQL Alchemy itself?

Let's see! This is my first benchmark, creating 100,000 random movies:

Method
Elapsed
User
System
Single insert
10.4
4.1
1.8
Bulk insert
2.1
1.6
0.04
SQL Alchemy
33.2
26.4
2.1
SQL Alchemy Bulk
33.4
26.8
1.9
Elixir
35.8
28.7
2.3

(The bulk insert is not directly comparable; it's just the sane way to do things.)

Elixir is described as a thin layer on top of SQL Alchemy, and that appears to be true; the overhead is less than 10%, and it's quite elegant. The overhead of SQL Alchemy, on the other hand, is ugh.

This is why I spend so much time testing performance. I spend a lot more time benchmarking than I do actually writing code - or testing it. Because if your code has a bug, you can fix it. If your performance is broken by design, you have to throw the whole thing out and start again.

I know the SQL Alchemy guys aren't idiots, and I know they've spent considerable time profiling and optimising their code. What I don't know is why it's still so horrible.

The pre-release 0.7 code is supposed to be 60% faster, but I'm looking for 600% just to reach something acceptable. Is that even possible? I don't know, but I'll try to find out.

Update: Retested SQL Alchemy with bulk inserts (using add_all()).  No improvement.

* 300,000 records per second is pretty quick, though keep in mind that these are tiny records - just four fields.

Posted by: Pixy Misa at 04:04 AM | Comments (6) | Add Comment | Trackbacks (Suck)
Post contains 564 words, total size 6 kb.

1 Relational databases are black magic to me. I spent 25 years as a software engineer and never got anywhere near them. (Didn't have any need when doing embedded.)

Posted by: Steven Den Beste at Friday, January 14 2011 04:38 AM (+rSRq)

2 To paraphrase loosely, "when I hear the words 'object-relational mapping', I release the wild dogs". That may just be my bad experience with Hibernate + Tomcat, but best to play it safe.

I love the idea of encapsulating SQL schemas in an object abstraction. It's just that the abstractions always leak, and I end up going through megabytes of log files with a mop and bucket, searching for the errant NULL or giant pile of beans that someone inflicted on us.

The hill-of-beans ones are my "favorites", because I'll find a SQL query that took 8 seconds followed by an object-creation orgy that took forty minutes.

-j

Posted by: J Greely at Friday, January 14 2011 05:27 AM (2XtN5)

3 I have no idea what 'yall are sayin'....so I'll just leave this here. 

Posted by: The Brickmuppet at Friday, January 14 2011 07:34 AM (EJaOX)

4 XKCD is always on topic. smile

I've been working with relational databases since 1988, and only one company that I've seen has got them right, and that's Progress.  Unfortunately, they never understood the value of what they had, or how to market it.  Also, while the language was incredibly productive to work with, it was as slow as a wet week for anything other than the most straightforward data manipulation.

SQL Alchemy with Elixir brings Python's database support to somewhere near what Progress provided back in 1988.  Unfortunately, as I noted, they reduce Python's database performance to somewhere near what Progress provided back in 1988.

Posted by: Pixy Misa at Friday, January 14 2011 09:52 AM (PiXy!)

5 I'd like to think Progress made some performance gains from 1988 in subsequent releases. The scary thing is that based on your performance measures everything else is even slower ? At least when implementing relational mapping and active records. Progress need a sound beating around the head and neck for being so stupid as to not know what they have.

Posted by: Andrew Fong at Tuesday, January 18 2011 01:18 PM (cB03i)

6 Relative to system performance, they've been going backwards since 1988.  In absolute terms, of course, they're hundreds of times faster.

Posted by: Pixy Misa at Tuesday, January 18 2011 07:13 PM (PiXy!)

Hide Comments | Add Comment

Comments are disabled. Post is locked.
53kb generated in CPU 0.0158, elapsed 0.1369 seconds.
56 queries taking 0.1261 seconds, 352 records returned.
Powered by Minx 1.1.6c-pink.