Friday, September 02
Couching
I spent the last couple of weeks dead in a ditch (I spend entirely too much time dead in ditches) but this evening I crawled out and picked up my CouchDB library again and worked out which was the right version and renamed it and put it into Mercurial and all that good stuff.
And went back to the docs and worked out why my multiget method didn't work, and duh it was because I was using the wrong parameter, and I fixed it and it worked and I added a rangeget method and it worked too.
Slowly.
I haven't defined my own views yet (that's next) so I'm using the default _all_docs view, which allows you to fetch multiple records from a key range or an arbitrary set of keys with a single request.
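For reference, here is a minimal sketch of the two kinds of _all_docs request described above. It only builds the requests rather than sending them, and it is not the code from my library: the function names, the BASE URL and the database name are assumptions for illustration. The `keys` request body and the `startkey`, `endkey` and `include_docs` query parameters are the standard CouchDB ones.

```python
import json
from urllib.parse import urlencode

BASE = "http://localhost:5984"  # assumption: a local CouchDB on its default port


def multiget_request(db, keys, include_docs=True):
    """Describe a POST to _all_docs that fetches an arbitrary set of keys."""
    query = urlencode({"include_docs": json.dumps(include_docs)})
    url = f"{BASE}/{db}/_all_docs?{query}"
    body = json.dumps({"keys": list(keys)})
    return "POST", url, body


def rangeget_request(db, startkey, endkey, include_docs=True):
    """Describe a GET on _all_docs that fetches the keys from startkey to endkey."""
    # CouchDB expects startkey/endkey as JSON values, so strings keep their quotes.
    query = urlencode({
        "startkey": json.dumps(startkey),
        "endkey": json.dumps(endkey),
        "include_docs": json.dumps(include_docs),
    })
    return "GET", f"{BASE}/{db}/_all_docs?{query}", None
```

Either tuple can be handed to whatever HTTP client you like; the point is that both cases are a single round trip regardless of how many records come back.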
But while I can write data at around 6000 records per second (single-threaded, using 100-record inserts), reads are significantly slower, only around 1500 per second (using 100-record queries). Multi-threading got me to twice that, but no higher - though that may be my test environment. Whatever, 1500 records per second for reads is slow. 6000 records per second for inserts is fine, but that read performance is just lousy. I stopped work on Pita because it was unacceptably slow - but it was a damn sight faster than that.
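The 100-record inserts go through CouchDB's _bulk_docs endpoint, which takes a JSON body of the form {"docs": [...]}. A rough sketch of the batching side, assuming the records are already a list of dicts (the helper names here are made up for illustration):

```python
import json


def batches(records, size=100):
    """Yield successive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def bulk_docs_bodies(records, size=100):
    """Build one _bulk_docs request body per batch of records."""
    return [json.dumps({"docs": batch}) for batch in batches(records, size)]
```

Each body would then be POSTed to /dbname/_bulk_docs, so 6000 records per second at this batch size is about 60 requests per second.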
I don't currently know exactly why it's so slow, but I can see that it's 98% CouchDB and only 2% my Python library. Either the default view is intrinsically slow, or CouchDB itself is. This isn't a latency issue; CouchDB is running at close to 100% CPU here - more when I run the benchmark multi-threaded.
Or I'm doing something seriously dumb. Except that performance is the same whether I test with 2000 records or 50,000, so it's not clear what that dumb thing might be.
Update: If I don't set include_docs, it runs twice as fast. That's not good. If it had run twenty times faster, then I'd know not to use include_docs that way; I'd do something different. If it had run twenty percent faster, I'd have concluded that the problem was something I was doing wrong. But twice as fast suggests that the problem is inherently with CouchDB's performance. Not good at all.
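To make the difference concrete: without include_docs, each row from _all_docs carries only the id, key and revision; with it, each row also carries the full document under "doc". A small sketch of pulling documents out of either kind of response. The sample responses are hand-written to match the documented row shape, not captured from my test run, and the function name is made up.

```python
def docs_from_rows(response):
    """Return the full documents from an _all_docs response, where present."""
    docs = []
    for row in response["rows"]:
        # Rows for missing keys carry an "error" field and no document;
        # rows fetched without include_docs have no "doc" field at all.
        if row.get("doc") is not None:
            docs.append(row["doc"])
    return docs


# Illustrative responses only.
without_docs = {"total_rows": 2, "offset": 0, "rows": [
    {"id": "a", "key": "a", "value": {"rev": "1-abc"}},
    {"id": "b", "key": "b", "value": {"rev": "1-def"}},
]}
with_docs = {"total_rows": 2, "offset": 0, "rows": [
    {"id": "a", "key": "a", "value": {"rev": "1-abc"},
     "doc": {"_id": "a", "_rev": "1-abc", "n": 1}},
    {"key": "zzz", "error": "not_found"},
]}
```

So the faster query isn't a substitute: it tells you which records exist, but you'd still have to go back for the data.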
Update: So it's slow, but does it scale?
Just by chance, I happen to have a 40-processor Intel E7 Xeon system sitting here doing not very much (well, it's doing real-time social network influence analysis at a rate of 5000 records per second, but for such a large server, that's not much). Since I'd very much like to get CouchDB working for that project too, it was logical to fire up a test.
Results: Around 90,000 record inserts per second, around 14,000 record reads by key list (multiget), around 30,000 record reads by key range (rangeget). That's with a single CouchDB instance and 40 different databases, which was the easiest thing to test quickly. CouchDB databases are pretty much the equivalent of tables in MySQL (not exactly; they're a lot more flexible) so having 40 of them is not at all unreasonable.
This is only a very small, quick test; it's no indication of how CouchDB would scale for large datasets, but it can certainly scale to multiple CPUs - CouchDB was using about 20 cores during the benchmark.
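The shape of the multi-client test is roughly as follows. This is a sketch, not the actual benchmark script: the threads-per-client layout is an assumption, and fake_work stands in for "insert or read some records against database number N".

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_clients(work, n_clients):
    """Run work(client_index) in n_clients threads.

    Returns (total records, elapsed seconds, records per second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        counts = list(pool.map(work, range(n_clients)))
    elapsed = time.perf_counter() - start
    total = sum(counts)
    return total, elapsed, total / elapsed


def fake_work(client_index):
    """Hypothetical stand-in for one client's batch of CouchDB requests."""
    time.sleep(0.01)  # pretend to wait on the server
    return 100        # number of records this client handled
```

Calling run_clients(fake_work, 40) gives the aggregate rate across all clients, which is the kind of number quoted above.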
Writes scaled from 5000 records per second (in batches of 100) with one client, to 90,000 per second with 40 clients. That's not bad at all.
Reads scaled from... Oops, I've lost those numbers, back in a bit.
...
From 950 to 14,000 multiget, 1500 to 30,000 rangeget.
So yeah, it scales. Not quite linearly, but quite well.
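Putting numbers on "not quite linearly", using the figures above (the function name is just for illustration):

```python
def speedup(single, parallel):
    """Ratio of the 40-client rate to the single-client rate."""
    return parallel / single


# (single-client rate, 40-client rate) in records per second, from this post.
results = {
    "insert": (5000, 90000),
    "multiget": (950, 14000),
    "rangeget": (1500, 30000),
}
CLIENTS = 40

for name, (single, parallel) in results.items():
    s = speedup(single, parallel)
    print(f"{name}: {s:.1f}x speedup, {s / CLIENTS:.0%} of linear for {CLIENTS} clients")
```

That works out to 18x for inserts, about 14.7x for multiget and 20x for rangeget. Measured against the roughly 20 cores CouchDB was actually using rather than the 40 clients, that looks a good deal closer to linear.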
Now, if I can just get that single-threaded performance up a bit, I'll be happy again. Otherwise I'll need to scrap the plans for the E3 Xeons and go back to waiting for Bulldot.
Posted by: Pixy Misa at 10:47 PM
1
Bulldot?
Posted by: Phil Fraering at Friday, September 09 2011 10:48 AM (rVfMa)
Posted by: Pixy Misa at Friday, September 09 2011 12:49 PM (PiXy!)
3
I thought Godozer's invasion of Manhattan was thwarted back in the mid 80's.
Posted by: Phil Fraering at Wednesday, September 14 2011 10:35 PM (rVfMa)