Monday, December 30

Geek

The Best Timeline

So I had another idea for optimising my timeline query. 

It was taking 1.3 seconds with 5 million messages in the system, which is obviously crazy.  The stack version takes around 10 milliseconds, but does require the system to pre-build all those stacks, meaning extra database operations, extra I/O, extra storage, extra complexity...

But I know that MySQL, particularly with TokuDB, can scan the database in primary key order really, really fast.  So what if I find a way to ensure that it just does that scan while (a) still applying the privacy filters, (b) only showing your friends' posts, (c) doing all the joins and subselects, and (d) still stopping at 20 (or 50 or whatever) messages?

Is that even possible?

    20 rows in set, 1 warning (0.00 sec)

Apparently it is.

Needs a little refinement, so I'm going to up the test dataset to 10 million and then give that a shot.  Where it will fall down is if you are following very few people or people who haven't posted recently, but I can set a threshold to only scan so many thousand records if that becomes a problem.

The stack solution is still the way to go long term, but having a fallback that is also reasonably fast is very much a good thing.

Update: Refinements done and working like a charm on 10 million messages.

What this query does is this:
  • Find the most recent messages
  • From your friends
  • Who you haven't muted
  • And are in channels that you have access to
  • And get their user details
  • And the channel details
  • And the parent message if it's a reply
  • And the shared content if it's a share (like a retweet)
  • And whether you've liked it
  • Or bookmarked it
  • Or reacted to it in another way
  • And whether the poster is following you
  • And the details of when you followed them
  • And whether either party is blocking the other (which in this system just prevents interaction, not viewing content)
  • And also whether either party has the other muted
  • And if it's a poll, whether you have voted and which option you voted for
It can do this in under 10 milliseconds for a database of 10 million messages (which is admittedly pretty small on the Twitter scale of things) on a single $24 virtual server (which is very small) just based on the raw database with no extra tables or indirection.

Yay.

Now on to the UI!

Posted by: Pixy Misa at 05:54 PM | No Comments | Add Comment | Trackbacks (Suck)
Post contains 413 words, total size 3 kb.




Apple pies are delicious. But never mind apple pies. What colour is a green orange?




49kb generated in CPU 0.0401, elapsed 0.1355 seconds.
56 queries taking 0.1275 seconds, 346 records returned.
Powered by Minx 1.1.6c-pink.