Friday, January 14
Update: I think I've worked out why my benchmark results were so bad, and I'll be modifying my code and re-running them. Stay tuned!
Well, possibly 1993.
Writing SQL is one of my least favourite tasks. SQL Alchemy and Elixir promise to magic this away (most of the time) by providing an object-relational mapping and an active record model.
What is now called the active record model is what I used in my day job from 1988 to 2006. It's good. In fact, anything else sucks. What the active record model means is that you interface with your database as though your data were yours, and not some precious resource doled out to you by your electronic betters as they saw fit.
What's not so good is that the performance I'm getting is more 1988 than 2006, let alone 2011.
I've created a database of 100,000 random movies (based on the Elixir tutorial). That took around 30 seconds. Then I went all Hollywood and created remakes of them all - that is, I updated the year field to 2012.
As a single statement in MySQL, that took 0.38 seconds.
As 100,000 individual raw updates fired off through the MySQL-Python interface (which is a horrible way to do things), 9 seconds.
Via Elixir... 230 seconds.
No. Just no. That's crap.
You can show the SQL that Elixir/Alchemy generates, and it's the same as what I'm doing manually. The SQL is fine. It's the rest of the library that's a horrible mess.
It takes 5.8 seconds to read those 100,000 movies in via Elixir.
It takes 0.34 seconds to read them in via MySQLdb and convert them into handy named tuples.* (In essence, inactive records.)
Okay, hang on one second. I'm using the less-well-known Elixir here on top of SQL Alchemy. How much of the damage is Elixir, and how much is SQL Alchemy itself?
Let's see! This is my first benchmark, creating 100,000 random movies:
|SQL Alchemy Bulk
(The bulk insert is not directly comparable; it's just the sane way to do things.)
Elixir is described as a thin layer on top of SQL Alchemy, and that appears to be true; the overhead is less than 10%, and it's quite elegant. The overhead of SQL Alchemy, on the other hand, is ugh.
This is why I spend so much time testing performance. I spend a lot more time benchmarking than I do actually writing code - or testing it. Because if your code has a bug, you can fix it. If your performance is broken by design, you have to throw the whole thing out and start again.
I know the SQL Alchemy guys aren't idiots, and I know they've spent considerable time profiling and optimising their code. What I don't know is why it's still so horrible.
The pre-release 0.7 code is supposed to be 60% faster, but I'm looking for 600% just to reach something acceptable. Is that even possible? I don't know, but I'll try to find out.
Update: Retested SQL Alchemy with bulk inserts (using add_all()). No improvement.
* 300,000 records per second is pretty quick, though keep in mind that these are tiny records - just four fields.
Posted by: Steven Den Beste at Friday, January 14 2011 04:38 AM (+rSRq)
I love the idea of encapsulating SQL schemas in an object abstraction. It's just that the abstractions always leak, and I end up going through megabytes of log files with a mop and bucket, searching for the errant NULL or giant pile of beans that someone inflicted on us.
The hill-of-beans ones are my "favorites", because I'll find a SQL query that took 8 seconds followed by an object-creation orgy that took forty minutes.
Posted by: J Greely at Friday, January 14 2011 05:27 AM (2XtN5)
Posted by: The Brickmuppet at Friday, January 14 2011 07:34 AM (EJaOX)
I've been working with relational databases since 1988, and only one company that I've seen has got them right, and that's Progress. Unfortunately, they never understood the value of what they had, or how to market it. Also, while the language was incredibly productive to work with, it was as slow as a wet week for anything other than the most straightforward data manipulation.
SQL Alchemy with Elixir brings Python's database support to somewhere near what Progress provided back in 1988. Unfortunately, as I noted, they reduce Python's database performance to somewhere near what Progress provided back in 1988.
Posted by: Pixy Misa at Friday, January 14 2011 09:52 AM (PiXy!)
Posted by: Andrew Fong at Tuesday, January 18 2011 01:18 PM (cB03i)
Posted by: Pixy Misa at Tuesday, January 18 2011 07:13 PM (PiXy!)
52 queries taking 0.1605 seconds, 282 records returned.
Powered by Minx 1.1.6c-pink.