It's a PyPy MySQL DB Client Linux Benchmark Query Battle!
Been working on our new platform this weekend.
There's a query at the core of everything that pulls together the posts and comments from whatever you're looking at (a blog, your timeline, a forum thread) along with your interactions (likes, follows, bookmarks, votes) and attached data (replies, shares, and so on). It has four joins and twelve subqueries plus a variable number of parameters, but at least three.
It was taking 25 milliseconds to run.
Well, actually it was taking 45 seconds to run, but I fixed that pretty quickly.
Sometimes the best way to solve a subtle problem is to turn it into a huge problem with a clear solution. So if your database query is a little slow, you add more data until it's really slow and you can measure differences as you vary the parameters, and if you can't find where that strange smell is coming from, you burn your house down.
So after fixing that big problem, I was left with a 25 millisecond query. That's not terrible, but what I really wanted was a 5 millisecond query. Fiddling with the query on a small dataset didn't tell me anything, so I benchmarked the posting API overnight and added another 20 million records.
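The measure-as-you-vary approach above can be sketched as a simple best-of-N timing loop. The `run_query` function here is a stand-in that just burns CPU, not the actual platform query:

```python
import time

def run_query():
    # Stand-in for the real database query; just burns a little CPU.
    return sum(i * i for i in range(1000))

def benchmark(fn, repeats=100):
    """Best-of-N wall time in milliseconds.

    Taking the minimum rather than the mean filters out scheduler
    and cache noise, which matters when you're chasing milliseconds.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings) * 1000.0

print(f"best of 100 runs: {benchmark(run_query):.3f} ms")
```

Run the same loop before and after each change to the query (or the dataset size) and compare the numbers.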
That found it for me, and I got the query down to about 10 milliseconds. That's a big improvement, and enough to make it viable for production. Most importantly, it moved the workload from about 50:50 between database and application to 1:4 database to application - a single database server could keep four application servers busy. Or if I deploy on DigitalOcean, one $40 database server could provide for 16 $5 application servers.
Then I took a closer look at the application to find out where the time was spent (it's safe to assume MySQL itself is pretty well optimised at this point). I checked out my ORM* but that was only taking 0.4 milliseconds on that 25 (now 10) millisecond query.
Some more poking found that a lot of time was being spent inside the MySQL client library. But that's written in C, so there's not much I can do to speed it up, is there?
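That kind of poking around can be done with the standard-library profiler. This sketch profiles a made-up workload rather than the real client library; `pack_rows` is an invented stand-in for result-packing work:

```python
import cProfile
import io
import pstats

def pack_rows(n):
    # Invented stand-in for client-library work (encoding result rows).
    return [str(i).encode() for i in range(n)]

def workload():
    for _ in range(50):
        pack_rows(1000)

profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Show the functions where the most cumulative time was spent.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```

If most of the cumulative time lands inside the database driver's functions rather than your own, you've found your hotspot.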
As a test, I ran my benchmark under CPython (interpreted) instead of PyPy (compiled).
It was twice as fast.
The interpreter was twice as fast as the compiler.
Why? Because CPython has a much easier time binding to C libraries than does PyPy. The PyPy compiler does a lot of internal optimisations that mean it needs a translation layer for older-style C libraries. And that translation layer isn't very fast.
There's an alternative database library written in Python. Under the Python interpreter that's a really bad idea - C is much much faster - but under the PyPy compiler it's not so silly.
So I swapped out mysqlclient and imported PyMySQL and my performance instantly doubled.
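The swap itself is small because both libraries implement the same DB-API 2.0 interface, and PyMySQL can even register itself under the MySQLdb name so existing imports keep working. A minimal sketch, guarded so it degrades gracefully where PyMySQL isn't installed:

```python
try:
    import pymysql
    # PyMySQL speaks the same DB-API 2.0 protocol as mysqlclient, so
    # registering it as MySQLdb lets existing code run unchanged.
    pymysql.install_as_MySQLdb()
except ImportError:
    pymysql = None  # PyMySQL not available in this environment
```

After that, any code doing `import MySQLdb` (directly or via an ORM) picks up the pure-Python driver, which PyPy's JIT can optimise rather than crossing the C translation layer on every call.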
So the heart of the platform is running five times faster now than it was yesterday.**
There's still a scaling problem that will crop up as the database grows, but (a) I'm testing with 20 million posts right now and it's fine so far, and (b) I know how to solve that one.
I'm doing some more testing with a skewed dataset, where 1% of the channels (a channel is a blog or forum or whatever) have 50% of the posts, but right now it's looking good.
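A skewed dataset like that can be generated with weighted sampling. The channel counts and parameters here are invented for illustration, not the platform's actual test data:

```python
import random

def skewed_channel_ids(n_channels=1000, n_posts=100_000,
                       hot_fraction=0.01, hot_share=0.5, seed=42):
    """Assign posts to channels so the 'hot' 1% of channels
    receive ~50% of all posts (a rough power-law stand-in)."""
    rng = random.Random(seed)
    hot = max(1, int(n_channels * hot_fraction))
    # Per-channel weight for hot vs cold channels.
    hot_weight = hot_share / hot
    cold_weight = (1.0 - hot_share) / (n_channels - hot)
    weights = [hot_weight] * hot + [cold_weight] * (n_channels - hot)
    return rng.choices(range(n_channels), weights=weights, k=n_posts)

posts = skewed_channel_ids()
hot_posts = sum(1 for c in posts if c < 10)  # the 10 hot channels
print(f"hot 1% of channels hold {hot_posts / len(posts):.0%} of posts")
```

Feeding IDs drawn this way into the test database gives you the worst-case hot channels without having to wait for real traffic to produce them.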
So now I can spend a couple of hours trying to fix our existing platform, which I've been neglecting for three months.
Update: Oops. Ran out of disk space. Stupid search index.
I also tested out the brand new Ubuntu 18.04, but the outcome there was that it needs another three months in the oven before I'd use it for production.
* I wrote a little database abstraction layer called Mirai, because the usual libraries like SQLAlchemy do a lot more than I need and are rather slow. I wanted something simple and very fast. It's really more of a ROM than an ORM, since it translates relational query data to nested object structures, but inserts are mostly manual.
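The "ROM" idea - folding flat join rows back into nested object structures - can be sketched like this. The field names and row shape are invented for illustration, not Mirai's actual API:

```python
def nest_rows(rows, parent_key="post_id"):
    """Group flat post+comment join rows into nested post dicts.

    Each input row is one row of a joined query; the comment columns
    are folded into a 'comments' list under their parent post.
    """
    posts = {}
    for row in rows:
        pid = row[parent_key]
        post = posts.setdefault(pid, {
            "id": pid,
            "title": row["title"],
            "comments": [],
        })
        # A post with no comments joins against NULL comment columns.
        if row.get("comment_id") is not None:
            post["comments"].append({
                "id": row["comment_id"],
                "body": row["comment_body"],
            })
    return list(posts.values())

rows = [
    {"post_id": 1, "title": "Hello", "comment_id": 10, "comment_body": "Hi!"},
    {"post_id": 1, "title": "Hello", "comment_id": 11, "comment_body": "Yo."},
    {"post_id": 2, "title": "Second", "comment_id": None, "comment_body": None},
]
print(nest_rows(rows))
```

Reading in this direction is mechanical, which is why it can be both simple and fast; writing nested structures back out is where a full ORM earns its complexity, and why inserts stay manual here.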
** And 10,000 times faster than the day before, but never mind that...
Posted by: Pixy Misa at
Hello Blog, How Are You?
Why yes, I did get suspended on Twitter. How did you know?
The mob got me. Was having a ~~nice civil discussion~~ blazing row over a California State Senate bill legislating Orwell's Ministry of Truth and I ~~slipped up~~ ~~spoke my mind~~ reported scientifically established fact and called one of the fascists retarded:
Two things to note: First, it's transparently obvious that I didn't break those rules. Second, those rules are retarded. Third, so is the goddamn California state senate.
TITLE 14.5. False Information Strategic Plans
(a) Any person who operates a social media Internet Web site with physical presence in California shall develop a strategic plan to verify news stories shared on its Internet Web site.
(b) The strategic plan shall include, but is not limited to, all of the following:
(1) A plan to mitigate the spread of false information through news stories.
(2) The utilization of fact-checkers to verify news stories.
(3) Providing outreach to social media users regarding news stories containing false information.
(4) Placing a warning on a news story containing false information.
(c) As used in this section, "social media" means an electronic service or account, or electronic content, including, but not limited to, videos, still photographs, blogs, video blogs, podcasts, instant and text messages, email, online services or accounts, or Internet Web site profiles or locations.
I don't know if I have ever seen a more blatant First Amendment violation, but these retards are perfectly fine with handing this power to the government even when Donald Trump - who is Literally Hitler - is in power.
Update: Twitter has now locked all my alternate accounts and won't even let me read tweets any more, though the message clearly says I can read tweets.
Update 2: I'm back, and I've saved a copy of all my tweets. Not that anyone would ever want them, but anyway...
Posted by: Pixy Misa at