
Tuesday, December 31

Geek

Daily News Stuff 31 December 2019

Last Day Of The Second Last Year Of The Second Decade Of The First Century Of The Third Millennium Edition

Tech News



Happy New Year Videos



Disclaimer: There was no year zero.

Posted by: Pixy Misa at 09:44 PM

Monday, December 30

Geek

Daily News Stuff 30 December 2019

Intermittently Retromingent Edition

Tech News



Disclaimer: I say potato, you say potato, potato, potato, potato, potato, now I'm hungry.

Posted by: Pixy Misa at 09:07 PM

Geek

The Best Timeline

So I had another idea for optimising my timeline query. 

It was taking 1.3 seconds with 5 million messages in the system, which is obviously crazy.  The stack version takes around 10 milliseconds, but does require the system to pre-build all those stacks, meaning extra database operations, extra I/O, extra storage, extra complexity...

But I know that MySQL, particularly with TokuDB, can scan the database in primary key order really, really fast.  So what if I find a way to ensure that it just does that scan while (a) still applying the privacy filters, (b) only showing your friends' posts, (c) doing all the joins and subselects, and (d) still stopping at 20 (or 50 or whatever) messages?

Is that even possible?

    20 rows in set, 1 warning (0.00 sec)

Apparently it is.

Needs a little refinement, so I'm going to up the test dataset to 10 million and then give that a shot.  Where it will fall down is if you are following very few people or people who haven't posted recently, but I can set a threshold to only scan so many thousand records if that becomes a problem.
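A rough sketch of the idea in Python, not the real SQL: walk messages newest-first in primary key order, apply the filters inline, stop as soon as the page is full, and give up once a scan budget is spent. The names `Message`, `scan_timeline`, and `max_scan`, and the toy data, are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Message:
    id: int          # primary key; a higher ID means a newer message
    user_id: int
    channel_id: int


def scan_timeline(
    messages_newest_first: Iterable[Message],
    friends: Set[int],
    muted: Set[int],
    channels: Set[int],
    limit: int = 20,
    max_scan: int = 10_000,
) -> List[Message]:
    """Keep the first `limit` messages that pass the filters,
    giving up after `max_scan` rows have been examined."""
    page: List[Message] = []
    for scanned, msg in enumerate(messages_newest_first, start=1):
        if scanned > max_scan:
            break  # threshold: viewer follows few people, or inactive ones
        if msg.user_id not in friends or msg.user_id in muted:
            continue
        if msg.channel_id not in channels:
            continue  # privacy filter: viewer has no access to this channel
        page.append(msg)
        if len(page) == limit:
            break  # early stop: no sort and no full scan needed
    return page


# Toy data (hypothetical): 1,000 messages from 10 users across 3 channels.
all_messages = [Message(i, i % 10, i % 3) for i in range(1000, 0, -1)]
page = scan_timeline(all_messages, friends={1, 2, 3}, muted={2},
                     channels={0, 1}, limit=5)
print([m.id for m in page])
```

The scan budget is what covers the weak case: when nothing matches, the loop stops after `max_scan` rows rather than reading the whole table.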

The stack solution is still the way to go long term, but having a fallback that is also reasonably fast is very much a good thing.

Update: Refinements done and working like a charm on 10 million messages.

What this query does is this:
  • Find the most recent messages
  • From your friends
  • Who you haven't muted
  • And are in channels that you have access to
  • And get their user details
  • And the channel details
  • And the parent message if it's a reply
  • And the shared content if it's a share (like a retweet)
  • And whether you've liked it
  • Or bookmarked it
  • Or reacted to it in another way
  • And whether the poster is following you
  • And the details of when you followed them
  • And whether either party is blocking the other (which in this system just prevents interaction, not viewing content)
  • And also whether either party has the other muted
  • And if it's a poll, whether you have voted and which option you voted for
It can do this in under 10 milliseconds for a database of 10 million messages (which is admittedly pretty small on the Twitter scale of things) on a single $24 virtual server (which is very small) just based on the raw database with no extra tables or indirection.

Yay.

Now on to the UI!

Posted by: Pixy Misa at 05:54 PM

Sunday, December 29

Geek

Daily News Stuff 29 December 2019

Only Two More Shopping Days Until New Year's Edition

Tech News


New System Notes

Doing API testing on the new system today. Request routing, logins, sessions, cookies, automatic compression, all that good stuff.

The query that I was worried would slow down as the database grows - the standard user timeline - does indeed slow down as the database grows. I built in an engine to take care of that, and today I wrote the necessary query to use that engine.

Since that new query currently cuts the time to build a timeline from 0.03s to a reported 0.00s, I'm adding more data to my test system so I can measure it properly.

Oh, and you can search within your timeline. Twitter lets you do that now too, though.

Update: Stack engine vs. timeline engine:

500 timeline requests in 69.248s, avg request 139.1ms
500 stack requests in 3.963s, avg request 7.9ms


With a small database the timeline query was running fine. But if the system had taken off it would have been Fail Whale Squared. (I think this type of query caused about 90% of Twitter's problems in the early days.)

Stack requests automatically remember the last N items in your timeline so they don't have to mess around finding them again.
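As a minimal sketch of what "remember the last N items" could look like, here is an in-memory version in Python. It assumes new message IDs are pushed onto each follower's stack when a post is made; the real engine lives in the database, and `TimelineStacks`, `on_post`, and `top` are hypothetical names.

```python
from collections import defaultdict, deque
from itertools import islice
from typing import Deque, Dict, Iterable, List


class TimelineStacks:
    """Per-user stacks of recent message IDs, newest first."""

    def __init__(self, size: int = 1000) -> None:
        # deque(maxlen=size) silently drops the oldest ID once a stack is full.
        self._stacks: Dict[int, Deque[int]] = defaultdict(
            lambda: deque(maxlen=size)
        )

    def on_post(self, message_id: int, follower_ids: Iterable[int]) -> None:
        """Push a new message ID onto the stack of every follower."""
        for user_id in follower_ids:
            self._stacks[user_id].appendleft(message_id)

    def top(self, user_id: int, count: int = 20) -> List[int]:
        """Read the newest `count` IDs in stack order; no sorting involved."""
        return list(islice(self._stacks[user_id], count))


stacks = TimelineStacks(size=3)
for message_id in (1, 2, 3, 4):
    stacks.on_post(message_id, follower_ids=[10])
stacks.on_post(5, follower_ids=[11])
print(stacks.top(10), stacks.top(11))
```

A real version would still need the same privacy checks as the standard timeline query, either when pushing or when reading.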

The other major mode is the channel request, which is used for blogs and forums and things like that. Those have no problems:

500 channel requests in 3.042s, avg request 6.1ms

That's the API request time, by the way, not just the database request, though for the timeline the database request is the overwhelming majority of the time.

I knew about this before but hadn't done the optimisation, because having a standard query let me enforce privacy checks in a single central location. Now I have three versions of the query and have to make sure the privacy checks are applied to each one.

Now I'm wondering if I can fix up that timeline query to make it run faster, because that could be really useful...

Update: Hah! Yes, that works. If I need to rebuild a user's stack, I can find the IDs of the last thousand posts that should appear in their timeline and shove them into their stack in 60 milliseconds flat. Then database queries within the stack take about 4ms.

The idea there is that if you don't log in for a while the system will stop updating your stack to save resources, but when you do log in I want it brought up to date quickly enough that you don't really notice it. 60ms is fine.
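The stop-updating-then-rebuild idea might be sketched like this in Python. The idle limit, the `UserStack` type, and the `find_recent_ids` callback (standing in for the query that finds the last thousand timeline IDs) are all assumptions, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

REBUILD_SIZE = 1000  # "the last thousand posts"


@dataclass
class UserStack:
    message_ids: List[int] = field(default_factory=list)  # newest first
    last_login: float = 0.0
    stale: bool = False


def push(stack: UserStack, message_id: int, now: float, idle_limit: float) -> None:
    """Add a new message ID, unless the user has been away too long."""
    if now - stack.last_login > idle_limit:
        stack.stale = True  # stop spending writes on an idle user
        return
    stack.message_ids.insert(0, message_id)
    del stack.message_ids[REBUILD_SIZE:]  # keep only the newest entries


def login(stack: UserStack, now: float,
          find_recent_ids: Callable[[int], List[int]]) -> None:
    """Bring a stale stack up to date in one bulk rebuild."""
    if stack.stale:
        stack.message_ids = find_recent_ids(REBUILD_SIZE)
        stack.stale = False
    stack.last_login = now


stack = UserStack()
push(stack, 1, now=10, idle_limit=100)    # user is active: stored
push(stack, 2, now=500, idle_limit=100)   # user is idle: skipped, marked stale
login(stack, now=600, find_recent_ids=lambda n: [3, 2, 1][:n])
push(stack, 4, now=650, idle_limit=100)   # active again: stored
print(stack.message_ids)
```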

The main message query has five joins and ten subselects, which is great when the optimisation is just right because it gets everything the API needs in one go. When the optimisation is not just right, though, things go south in a hurry.

The stack works great because it means the main query never has to sort - to get the top 20 messages it just reads the top 20 stack records in index order and does five one-to-one joins.


Disclaimer: I tried to recite "How Doth the Little Busy Bee" but it came out all different.

Posted by: Pixy Misa at 10:47 PM

Geek

Daily News Stuff 28 December 2019

Blargh Part Two Edition

Tech News

  • Had to restart Minx just now, and then block yet another misbehaving web crawler.

    I have more sympathy for J. Jonah Jameson every day.

  • Cloudflare's warrant canary is an ex-parrot.  (TechDirt)

    Three of the seven statements included in the canary at the beginning of the year are now gone.

  • There will be a second season of The Mandalorian.  (Tech Crunch)

    Disney having destroyed the Star Wars franchise in the cinemas and de-canonised the Expanded Universe, it's all they have left right now.

  • Zen 3 will have a 17% IPC gain and a 50% floating point gain over Zen 2 unless it doesn't.  (WCCFTech)

    That's a lot, but AMD has been saying for a while that Zen 3 will be a significant update.

  • Sonos devices feature a recycling mode that lets you retire your old equipment in a safe and environmentally friendly way - by irrevocably destroying it.



    Sustainability is non-negotiable, says Sonos.  You can't have it.

  • Oh look another horrible security hole in NPM.  (Snyk.io)

    The article is from September and I believe this has since been fixed and replaced with seven brand new security holes.

  • My test environment for the new system is on a private container in a virtual server behind two firewalls.  The container is accessible via SSH from my home IP address on a non-standard port by forwarding the SSH connection over an internal SSH tunnel on the container host.

    Now that I'm testing the app and API (and not just the code they're built on) I need access to the web services, so I have an SSH tunnel from WSL - a Linux virtual server running on Windows - running over the SSH tunnel on the KVM VPS to connect localhost:8080 on my PC to localhost:8080 on the container.

    This actually works.

  • MSI has announced a mini-LED laptop to be introduced at CES.  (ZDNet)

    To be clear, at 17" it's not mini at all, and it has an LCD display, not LED.  Apart from that, though, it will deliver HDR 1000 and 100% DCI-P3 at 4K, which is as good as it gets right now.

  • Plant-based burgers will make men grow boobs - says Livestock News.  (Ars Technica)

    The story (which seems to be popping up everywhere) is that Burger King's Impossible Whopper contains 18 million times as much estrogen as a regular Whopper.

    In fact it contains zero estrogen, because it's plant-based and plants don't produce estrogen.  What it contains are isoflavones, plant molecular analogs of estrogen that can bind to estrogen receptors but as far as any actual research has been able to determine, do absolutely nothing in humans.

  • RethinkDB 2.4 is out.  (RethinkDB)

    RethinkDB is an interesting MongoDB-style document database that offered very flexible changefeeds (a.k.a. tail cursors or notification streams) years before anyone else had them.

    The company behind it couldn't get enough funding - MongoDB sucked all the air out of the room there - and was forced to shut down, but the open-source repositories have been handed over to a volunteer effort and the project is back on its feet.


Disclaimer: SSH tunnels are magic, but there is a limit.

Posted by: Pixy Misa at 12:00 AM

Friday, December 27

Geek

Daily News Stuff 27 December 2019

Where Does The Time Go Edition

Tech News

  • China's latest CPUs have caught up with AMD's Excavator.  (Tom's Hardware)

    Current mobile chips are already faster than Excavator.

  • Graphene batteries aren't coming in 2020, and Huawei's P40 Pro won't have one unless it does.  (WCCFTech)

    This is getting confusing, guys.

  • Europe is making life difficult for internet companies with a corporate presence within the EU.  You can however just choose not to have a corporate presence there.

    India is planning to close that loophole.  (Tech Crunch)

    If you have more than five million users and want to operate in India, you have to provide hostages.  And deploy a Class 2 AI to ban speech the Indian government might object to, before they object to it.

    The article discusses Wikipedia because they have hundreds of millions of users and would clearly fall under the planned legislation, but operate on a - relatively - tiny budget.

  • In letter to dinosaurs, YouTube describes the Chicxulub Impactor as "regrettable".  (CoinDesk)

  • Sony may have had to retreat from some markets but their digital sensor division is going gangbusters.  (Bloomberg)

    They make the cameras for a lot of phones, and their factories are running 24x365 and still not keeping up.

  • Set JavaScript on fire and set that damn chart on fire too.  (StateofJS)

  • Statler and Waldorf discuss the latest UX trends.  (grumpy.website)

    Yes, the site is actually called grumpy.website, and it delivers.

  • I wonder how fast uWSGI RPC is compared to web requests...

    Update: uWSGI RPC doesn't seem to work.  Not sure what I'm doing wrong, but I don't want to rely on something this fiddly.

    Swapped in ZMQ and not only did it work first time, but I'm getting <500µs for a 100-byte JSON request with a 10K JSON response, and <900µs with a 100K response.  That's with the standard JSON encoder, not the custom one with the fancy date/time support.

    Let's see....  The smart encoder can spit out 10K in 120µs and 100K in 1.3ms.  That depends a lot on the balance of fiddly stuff (dates, decimals) and easy stuff (text) but it's not bad; if someone uses the smart JSON option it's not going to make the system collapse.  Anyway, it's fast enough to flood a gigabit internet uplink with JSON, which is all I ask.
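The custom encoder mentioned in that last item isn't shown, so this is only a stand-in for the "fiddly stuff": a minimal sketch using the standard library's `json.dumps` with a `default` hook for dates and decimals. The names `smart_default` and `smart_dumps` and the sample document are assumptions.

```python
import json
from datetime import date, datetime, timezone
from decimal import Decimal


def smart_default(obj):
    """Fallback for types the standard JSON encoder rejects."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()   # ISO 8601 text
    if isinstance(obj, Decimal):
        return str(obj)          # a string keeps the full precision
    raise TypeError(f"Cannot serialise {type(obj).__name__}")


def smart_dumps(payload) -> str:
    """Encode a payload, handling dates and decimals via smart_default."""
    return json.dumps(payload, default=smart_default)


doc = {
    "posted": datetime(2019, 12, 27, 21, 18, tzinfo=timezone.utc),
    "price": Decimal("24.00"),
    "text": "hello",
}
encoded = smart_dumps(doc)
print(encoded)
```

The hook only fires for values the stock encoder can't handle, which fits the observation above that the cost depends on the balance of dates and decimals versus plain text.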


Disclaimer: Personally I reckon it's worth it.

Posted by: Pixy Misa at 09:18 PM

Geek

uWSGI Returns

Was having a lot of trouble getting uWSGI to install for PyPy 3 - which is what the whole of the new system is using.

After working my way through this GitHub issue I have it all working (at least it appears so thus far).  But it's rather annoying in that Python 2 is going away in a couple of months and the maintainers of uWSGI don't seem to care enough to copy a known solution into their codebase.

I fussed around with manually building uWSGI, but it turns out you don't need to do that.  Just install PyPy 2 alongside your PyPy 3 and use pip install uwsgi.  That will give you uWSGI built with the PyPy plugin, which works with either version.

You then need to use --pypy-home to point to your PyPy 3 directory and --pypy-lib to point to your PyPy .so file.

It still won't actually work, though, because the setup script is written in Python 2.  So grab this alternate setup script and specify it with --pypy-setup.

And now it works.

There's a thread with all the details on GitHub.

It's a bit disappointing that a complete solution exists but hasn't been pulled into the codebase, but at least a complete solution exists.

Also, having done all that, it's slower than with PyPy 2.  About 1.6ms for a proxy request vs. 1.2ms under PyPy 2.

Also also, the startup command is now full of nonsense:

    /opt/pypy2/bin/uwsgi --pypy-lib /opt/pypy3/bin/libpypy3-c.so --pypy-home /opt/pypy3 --pypy-setup tools/pypy_setup.py --master --http-socket 127.0.0.1:8080 --threads 100 --pypy-wsgi-file proxyf.py

But we can fix that.  Link the uwsgi binary into /usr/local/bin, create an ini file:

    [uwsgi]
    http-socket = 127.0.0.1:8080
    stats = 127.0.0.1:8081
    master = true
    processes = 1
    threads = 100
    pypy-home = /opt/pypy3
    pypy-lib = /opt/pypy3/bin/libpypy3-c.so
    pypy-setup = tools/pypy_setup.py
    pypy-wsgi-file = proxyf.py

And now it's just uwsgi proxyf.ini
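The contents of proxyf.py aren't shown here, so as a placeholder, this is a minimal WSGI file of the kind that `pypy-wsgi-file` could point at. It assumes uWSGI's usual convention of looking for a callable named `application`; the response body and the `call_app` helper (which just exercises the app without a server) are made up for illustration.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Smallest useful WSGI app: reply 200 with a plain-text body."""
    body = b"ok\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, path="/"):
    """Invoke a WSGI app directly, without uWSGI, and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(application)
print(status, body)
```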

Posted by: Pixy Misa at 03:59 PM

Thursday, December 26

Geek

Daily News Stuff 26 December 2019

Taking Time Out To Digest Edition

Tech News

  • The Threadripper 3970X is a 32 core 280W monster.  But how does it perform if you dial the power back a little?  (WCCFTech)

    Thirdripper has configurable TDP, the same as Ryzen 3000.  So this doesn't even require manual undervolting; just fire up Ryzen Master and pick your TDP target.

    At 180W it still delivers 90% of full multi-core performance, and at 140W it manages 82%.  That gives us a good idea of the performance of the upcoming 3990X - double the cores, double the power budget back up to 280W, and it will be 64% faster than the 3970X.  Of course, it's not as simple as that, because memory bandwidth won't double, but memory bus power consumption won't double either, and cache will double...

    Cut to 95W, the 3970X is clearly struggling and achieves only 48% of its full potential.  At that point it is actually slower than the Intel 10980XE.

  • Samsung is investing $116 billion into its chip manufacturing division.  (The Verge)

    That's quite a lot.  TSMC's ascendance has effectively been bankrolled by Apple, and Samsung are their only real competition at this point.

  • Cloudflare has taken on the role of supporting CDNJS.  (Cloudflare)

    While Cloudflare were providing the CDN for CDNJS, the project was run until now entirely by volunteers.  When some automated scripts broke last month and the volunteers didn't have the resources to fix it right away, the future of CDNJS seemed in doubt.

    So this is, on the whole, a good thing.

  • TokuDB doesn't support foreign key constraints.  Doesn't even save your definitions, it just drops them straight onto the floor.  That's not a critical issue here because I control the application code as well as the database definitions, but it's a bit limiting.

  • YouTube's latest target is cat videos.  I'm not even kidding.
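The 3990X estimate in the Threadripper item above is simple arithmetic, worked through here in Python. It uses only the percentages quoted in that item and assumes two 32-core halves at 140W each scale linearly, which (as noted there) ignores memory bandwidth and cache effects. The function names are hypothetical.

```python
# Multi-core performance of the 3970X relative to stock, by TDP in watts.
# Figures are the ones quoted in the item above.
relative_perf = {280: 1.00, 180: 0.90, 140: 0.82, 95: 0.48}


def perf_per_watt(tdp: int) -> float:
    """Relative performance delivered per watt at a given TDP."""
    return relative_perf[tdp] / tdp


def doubled_estimate(tdp_per_half: int) -> float:
    """Naive estimate for twice the cores, each half running at tdp_per_half."""
    return 2 * relative_perf[tdp_per_half]


estimate = doubled_estimate(140)                        # total budget: 280W
gain_percent = round((estimate - 1.0) * 100)            # versus a stock 3970X
efficiency_ratio = perf_per_watt(140) / perf_per_watt(280)
print(estimate, gain_percent, round(efficiency_ratio, 2))
```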



Movie of the Day



Disclaimer: Pop a Poppler in your mouth,
When you come to Fishy Joe's,
What they're made of is a mystery,
Where they come from, no one knows.
You can pick 'em, you can lick 'em,
You can chew 'em, you can stick 'em,
If you promise not to sue us,
You can shove one up your nose.

Posted by: Pixy Misa at 10:40 PM

Wednesday, December 25

Geek

Daily News Stuff 25 December 2019

What Day Edition

Tech News

  • Ponte Vecchio, Rambo Cache, and Gelato: The crisis inside Intel's codename division.  (AnandTech)

    Or possibly an in-depth analysis of Intel's upcoming high-performance computing platform.  One or the other.

  • A Twitter bug allowed hackers to match your account to your phone number.  (Tech Crunch)

    Of course, Twitter has made it a habit to lock existing accounts for no reason and demand your phone number to unlock them, just as Facebook demands photos of yourself.  Otherwise Twitter wouldn't have the phone numbers for hackers to steal in the first place.

  • How not to migrate customer data.  (Increment)

    Given the size of the project - 2500 man-years - and the requirement for a hard cutover, it was pretty much guaranteed to go wrong.  Which it did.

  • A handy Python cheatsheet.

    Print it out and use it as a pillow.

  • Hmm.  WebNX have 2288G and 3800X servers with NVMe SSDs.  ReliableSite also have the 3800X and I already have an account with them, but they don't use ECC RAM in their Ryzen servers and WebNX do.

    Anyway, won't need one for another couple of months yet.

    Even the 3800X will deliver 50% better single-threaded and nearly 3x the multithreaded performance of our current servers.  (CPU Benchmark)

    And that's the new low end.

    The WebNX servers are available with Intel enterprise 1.2TB NVMe SSDs.  I was thinking "Isn't that expensive for a server that only costs $125 per month?"

    Turns out the answer is no, not especially.

    Not sure if that's the model they're using - it's a couple of years old now and no longer in production - but it could well be given that a company like that would be looking for hardware that is reliable and cheap, rather than new and shiny.

    And it delivers 2.6GB per second on reads and 30 µs random write latency, so I certainly wouldn't be complaining if that's what I got.

  • YouTube's latest target is cryptocurrency videos.  (Daily Hodl)

    Now certainly a large percentage of cryptocurrency content is nonsense, but no more than for the rest of YouTube.  And purging informational videos en masse as "dangerous or harmful content" is just...  Exactly what we've come to expect from the idiots running YouTube.





  • Speaking of terrible messes, Ethereum 2.0 might be arriving early next year.  (CoinDesk)

    It should be faster than current Ethereum, though not as fast as planned.  It probably won't be able to interoperate with current Ethereum.  It will be more expensive to run complex contracts that read data from the blockchain.

    And it won't offer generic atomic transactions, which could well be a disaster.

    Basically, Ethereum 2.0 divides the network into shards - 64 of them - each representing a single atomic distributed database.  If your application is on one shard and you want to interact with an application on another shard, it's time for you to roll your own two-phase commit implementation. 

    In a language that is basically awful, has all sorts of arbitrary limitations, and costs you real money for every instruction you execute.

  • Ruby 2.7 is out.  (Ruby-lang)

    I don't really use Ruby, but I've always liked it and would have felt quite at home if Python hadn't shown up first.

  • Redis 6.0 is at RC1 (release candidate one).  (GitHub)

    They don't expect to release the final version for three months or more; they go through an extended public test cycle with every new release to shake loose every bug they can.

    This release notably adds support for local caching: Your Redis client can cache data itself, directly in memory in the right format for your programming language, and Redis will notify it of cache invalidations.  This can speed up cache lookups by 10x or more at the cost of a little extra RAM; no extra application code required.

    I'm really looking forward to this.  It's not every day someone hands you a free 10x speedup.

  • Merry Christmas everyone!
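To make the Redis 6.0 local-caching item above concrete, here is a toy in-process simulation of the idea in Python. It does not use the Redis protocol or any Redis client library; `ToyServer` and `CachingClient` are hypothetical stand-ins showing only the shape of it: the client keeps values in local memory, and the server tells it when a cached key changes.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class ToyServer:
    """Stand-in for the cache server: stores values and tracks readers."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.readers: Dict[str, Set["CachingClient"]] = defaultdict(set)
        self.reads = 0  # counts round trips that actually reach the server

    def get(self, key: str, client: "CachingClient") -> Optional[str]:
        self.reads += 1
        self.readers[key].add(client)  # remember who has cached this key
        return self.data.get(key)

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        # Notify every client holding a local copy that it is now invalid.
        for client in self.readers.pop(key, set()):
            client.invalidate(key)


class CachingClient:
    """Client that serves repeat reads from local memory."""

    def __init__(self, server: ToyServer) -> None:
        self.server = server
        self.local: Dict[str, Optional[str]] = {}

    def get(self, key: str) -> Optional[str]:
        if key not in self.local:
            self.local[key] = self.server.get(key, self)
        return self.local[key]

    def invalidate(self, key: str) -> None:
        self.local.pop(key, None)


server = ToyServer()
client = CachingClient(server)
server.set("greeting", "hello")
first = client.get("greeting")    # goes to the server
second = client.get("greeting")   # served from local memory
server.set("greeting", "goodbye") # triggers an invalidation
third = client.get("greeting")    # goes to the server again
print(first, second, third, server.reads)
```

The speedup comes from the second kind of read: it never leaves the process, so there is no network round trip and no decoding step.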

Video of the Day



Disclaimer: Not particularly looking forward to 2020.

Posted by: Pixy Misa at 10:39 PM

Tuesday, December 24

Geek

Daily News Stuff 24 December 2019

We Heard You Like Proxies Edition

Tech News



Disclaimer: Not that anyone else seems to know how to fix bugs anymore.

Posted by: Pixy Misa at 11:48 PM

<< Page 1 of 4 >>