Sunday, April 26
Needs For Speeds
Testing various libraries and patterns on Python 2.7.9 and PyPy 2.5.1
Test | Python | PyPy | Gain |
Loop | 0.27 | 0.017 | 1488% |
Strlist | 0.217 | 0.056 | 288% |
Scan | 0.293 | 0.003 | 9667% |
Lambda | 0.093 | 0.002 | 4550% |
Pystache | 0.213 | 0.047 | 353% |
Markdown | 0.05 | 0.082 | -39% |
ToJSON | 0.03 | 0.028 | 7% |
FromJSON | 0.047 | 0.028 | 68% |
ToMsgPack | 0.023 | 0.012 | 92% |
FromMsgPack | 0.02 | 0.013 | 54% |
ToSnappy | 0.027 | 0.032 | -16% |
FromSnappy | 0.027 | 0.024 | 13% |
ToBunch | 0.18 | 0.016 | 1025% |
FromBunch | 0.187 | 0.016 | 1069% |
CacheSet | 0.067 | 0.046 | 46% |
CacheGet | 0.037 | 0.069 | -46% |
CacheMiss | 0.017 | 0.015 | 13% |
CacheFast | 0.09 | 0.067 | 34% |
CachePack | 0.527 | 0.162 | 225% |
PixyMarks | 13.16 | 40.60 | 209% |
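The Gain column appears to be the ratio of the two timings expressed as a percentage improvement, so a negative value means PyPy was slower. A minimal sketch of that arithmetic, assuming this is how the column was derived (the function name `gain` is mine and is not part of the benchmark script):

```python
def gain(baseline, other):
    """Percentage by which `baseline` exceeds `other`, rounded to a whole number.

    For timings (lower is better) pass the CPython time first and the PyPy
    time second, so a positive result means PyPy is faster.
    """
    return int(round((baseline / other - 1.0) * 100))


# Rows taken from the table above.
loop_gain = gain(0.27, 0.017)      # Loop
markdown_gain = gain(0.05, 0.082)  # Markdown: PyPy is slower here
# PixyMarks is a score (higher is better), so the ratio is taken the other way round.
pixymarks_gain = gain(40.60, 13.16)
```

These three calls reproduce the 1488%, -39% and 209% shown in the table.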
Notes
- The benchmark script runs all the tests once to warm things up, then runs them three times and takes the mean. The warm-up matters for PyPy, because it takes some time for the JIT compiler to engage. The PixyMark score is simply the inverse of the geometric mean of the individual test times.
Tests were run on a virtual machine on what I believe to be a Xeon E3 1230, though it might be a 1225 v2 or v3.
- The Python Markdown library is very slow. The best alternative appears to be Hoep, which is a wrapper for the Hoedown library, which is a fork of the Sundown library, which is a fork of the unfortunately named Upskirt library. (The author of which is not a native English speaker, and probably had not previously run into the SJW crowd.)
Hoep is slower for some reason in PyPy than CPython, but still plenty fast.
- cPickle is an order of magnitude slower than a good JSON or MsgPack codec.
- The built-in JSON module in CPython is the slowest Python JSON codec. The built-in JSON module in PyPy appears to be the fastest. For CPython I used uJSON, which seems to be the best option if you're not using PyPy.
- CPython is very good at appending to strings. PyPy, IronPython (Python for .Net) and Jython (Python for Java) are uniformly terrible at this. This is due to a clever memory allocation optimisation that is tied closely to CPython's garbage collection mechanism, and isn't available in the other implementations.
I removed the test from my benchmark because for large strings it's so slow that it overwhelms everything else. Instead, append to a list and join it when you're done, or something along those lines.
- I generally see about a 6x speedup from PyPy. In these benchmarks I've been focusing on getting the best possible speed for various functions, using C libraries wherever possible. A C library called from Python runs at exactly the same speed as a C library called from PyPy, so this has inherently reduced the relative benefits of PyPy. PyPy is still about 3x faster, though; in other words, migrating to PyPy effectively turns a five-year-old mid-range CPU into 8GHz next-gen unobtainium.
- That is, if you are very careful about selecting your libraries. There's an alternate Snappy compression library available; it's about the same speed under CPython, but 30x slower under PyPy due to inefficiencies in PyPy's ctypes binding.
- uWSGI is pretty neat. The cache tests are run using uWSGI's cache2 module; it's the fastest caching mechanism I've seen for Python so far. Faster than the native caching decorators I've tested - and it's shared across multiple processes. (It can also be shared across multiple servers, but that is certain to be slower, unless you have some seriously fancy networking hardware.)
One note, though: the uWSGI cache2 Python API is not binary-safe. You need to JSON-encode or Base64-encode values before storing them, or something along those lines.
- The Bleach package - a handy HTML sanitiser - is so slow that it's useless for web output - you have to sanitise on input, which means that you either lose the original text or have to store both. Unless, that is, you have a caching mechanism with a sub-microsecond latency.
- The Bunch package on the other hand - which lets you use object notation on Python dictionaries, so you can say customer.address rather than customer['address'] - is really fast. I've been using it a lot recently and knew it was fast, but 1.6us to wrap a 30-element dictionary under PyPy is a pretty solid result.
- As an aside, if you can retrieve, uncompress, unpack, and wrap a record with 30 fields in 8us, it's worth thinking about caching database records. Except then you have to worry about cache invalidation. Except - if you're using MongoDB, you can tail the oplog to automatically invalidate cached records. And if you're using uWSGI, you can trivially fork that off as a worker process.
Which means that if you have, say, a blogging platform with a template engine that frequently needs to look up related records (like the author or category for a post) this becomes easy, fast, and almost perfectly consistent.
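The oplog-driven invalidation described in the last note can be sketched roughly as follows. This is only an illustration under stated assumptions: the entries are assumed to be oplog-style dictionaries with `op`, `ns`, `o` and `o2` fields (my understanding of the format, not something taken from the benchmark script), a plain dict stands in for the shared uWSGI cache, and `invalidate_from_oplog` and the `namespace:_id` key scheme are hypothetical names of my own. In practice the entries would come from a tailable cursor on the oplog rather than a list.

```python
def invalidate_from_oplog(entries, cache, namespace):
    """Drop cached records for documents that an entry updated or deleted.

    entries   -- iterable of oplog-style dicts (assumed shape, see above)
    cache     -- dict-like cache keyed by '<namespace>:<_id>'
    namespace -- 'database.collection' string to watch
    Returns the number of cache entries removed.
    """
    removed = 0
    for entry in entries:
        if entry.get('ns') != namespace:
            continue
        op = entry.get('op')
        if op == 'u':
            doc_id = entry['o2']['_id']  # assumed: updates carry the target _id in o2
        elif op == 'd':
            doc_id = entry['o']['_id']   # assumed: deletes carry it in o
        else:
            continue                     # inserts and no-ops leave the cache alone
        if cache.pop('%s:%s' % (namespace, doc_id), None) is not None:
            removed += 1
    return removed


# Hypothetical data: three cached records and three oplog-style entries.
cache = {
    'blog.posts:1': 'post one',
    'blog.posts:2': 'post two',
    'blog.authors:1': 'author one',
}
entries = [
    {'op': 'u', 'ns': 'blog.posts', 'o2': {'_id': 1}, 'o': {'$set': {'title': 'x'}}},
    {'op': 'i', 'ns': 'blog.posts', 'o': {'_id': 3}},
    {'op': 'd', 'ns': 'blog.authors', 'o': {'_id': 1}},
]
removed = invalidate_from_oplog(entries, cache, 'blog.posts')
```

Only the updated post is evicted here; the author record survives because the call watches a single namespace, so a real worker would run one pass per cached collection or match on a set of namespaces.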
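import base64
import time

import bunch
import hoep
import msgpack
import pystache
import snappy

# Prefer ujson where it is installed; fall back to the built-in json module.
try:
    import ujson as json
except ImportError:
    import json

# uwsgi can only be imported when running under uWSGI; skip the cache tests otherwise.
try:
    import uwsgi
except ImportError:
    uwsgi = None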
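# Each test returns a pair of counts; their product is the number of
# operations used to work out the per-loop time.

def getdata(M=30):
    # Build a sample record with M string fields.
    data = {}
    for i in range(M):
        data[i] = '.' * i
    return data


def loop(N=10000000, M=1):
    # Bare loop overhead.
    d = 0
    for i in xrange(N):
        d += 1
    return N, M


def strlist(N=10000, M=100):
    # Build a string by appending to a list and joining it.
    for i in xrange(N):
        e = []
        for j in xrange(M):
            e.append('.' * 1000)
        e = ''.join(e)
    return N, M


def scan(N=1000, M=100):
    # Search a 10,000-character string for a character it does not contain.
    d = 0
    for i in xrange(N):
        e = '.' * 10000
        for j in xrange(M):
            d += 1
            f = e.find(',')
    return N, M


def lambs(N=1000, M=1000):
    # Create a lambda and call it repeatedly.
    for i in xrange(N):
        f = lambda i: i * i
        for j in xrange(M):
            d = f(i * j)
    return N, M


def stache(N=100, M=100):
    # Render a Mustache template over a list of M items.
    data = {'items': [{'a': 123, 'b': 456, 'c': 789, 'd': 0}] * M}
    template = '{{#items}}{{a}}{{b}}{{c}}{{d}}{{/items}}'
    for i in xrange(N):
        text = pystache.render(template, data)
    return N, M


def markd(N=10000, M=10):
    # Render a short Markdown snippet with Hoep.
    text = u'widgie widgie *widgie* **widgie** widgie ' * M
    for i in xrange(N):
        html = hoep.render(text)
    return N, 1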
# uWSGI cache tests; these only run when the script is loaded under uWSGI.

def cacheset(N=100000, M=500):
    # Write N distinct keys, then check the last one.
    x = '.' * M
    for i in range(N):
        uwsgi.cache_update('%s:%s' % (i, M), x)
    assert uwsgi.cache_get('%s:%s' % (i, M)) == x
    return N, 1


def cacheget(N=100000, M=500):
    # Read the same key N times.
    x = '.' * M
    uwsgi.cache_update('cachehit', x)
    for i in range(N):
        y = uwsgi.cache_get('cachehit')
    assert y == x
    return N, 1


def cachemiss(N=100000, M=500):
    # Look up a key that was never set.
    for i in range(N):
        y = uwsgi.cache_get('cachemiss')
    assert y is None
    return N, 1


def cachepack(N=10000, M=30):
    # Full round trip: MsgPack, Snappy and Base64 on the way in, reversed on the way out.
    data = bunch.bunchify(getdata(M))
    for i in range(N):
        x = base64.b64encode(snappy.compress(msgpack.dumps(bunch.bunchify(data))))
        uwsgi.cache_update('%s:%s' % (i, M), x)
        y = uwsgi.cache_get('%s:%s' % (i, M))
        cdata = bunch.bunchify(msgpack.loads(snappy.decompress(base64.b64decode(y))))
    assert len(cdata) == len(data)
    return N, 1


def cachefast(N=10000, M=30):
    # Lighter round trip: JSON only.
    data = getdata(M)
    for i in range(N):
        x = json.dumps(data)
        uwsgi.cache_update('%s:%s' % (i, M), x)
        y = uwsgi.cache_get('%s:%s' % (i, M))
        cdata = json.loads(y)
    assert len(cdata) == len(data)
    return N, 1
# Codec tests: each loop times one direction only, and the assert after
# the loop sanity-checks the last result.

def tojson(N=10000, M=30):
    data = getdata(M)
    for i in range(N):
        x = json.dumps(data)
    assert len(json.loads(x)) == len(data)
    return N, 1


def fromjson(N=10000, M=30):
    data = getdata(M)
    jdata = json.dumps(data)
    for i in range(N):
        x = json.loads(jdata)
    assert len(x) == len(data)
    return N, 1


def tomsg(N=10000, M=30):
    data = getdata(M)
    for i in range(N):
        x = msgpack.dumps(data)
    assert len(msgpack.loads(x)) == len(data)
    return N, 1


def frommsg(N=10000, M=30):
    data = getdata(M)
    jdata = msgpack.dumps(data)
    for i in range(N):
        x = msgpack.loads(jdata)
    assert len(x) == len(data)
    return N, 1


# For Snappy, we want python-snappy with the CFFI interface
# yum install snappy-devel first

def tosnappy(N=10000, M=30):
    data = getdata(M)
    jdata = msgpack.dumps(data)
    for i in range(N):
        cdata = snappy.compress(jdata)
    assert len(snappy.decompress(cdata)) == len(jdata)
    return N, 1


def fromsnappy(N=10000, M=30):
    data = getdata(M)
    jdata = msgpack.dumps(data)
    cdata = snappy.compress(jdata)
    for i in range(N):
        udata = snappy.decompress(cdata)
    assert udata == jdata
    return N, 1


def tobunch(N=10000, M=30):
    data = getdata(M)
    for i in range(N):
        b = bunch.bunchify(data)
    assert len(b) == len(data)
    return N, 1


def frombunch(N=10000, M=30):
    data = getdata(M)
    b = bunch.bunchify(data)
    for i in range(N):
        d = bunch.unbunchify(b)
    assert len(d) == len(b)
    return N, 1
def run(p, label, loops=10):
    # Time `loops` calls of test function p and report the mean.
    t0 = time.clock()
    for i in xrange(loops):
        m, n = p()
    t = (time.clock() - t0) / loops
    pp(t, label, m, n)
    return t


def pp(t, label, m, n, quiet=False):
    # Print the mean time, plus microseconds per operation when counts are given.
    if m and n:
        print '%s: %3.3f (%0.1fus/loop)' % (label, t, t * 1000000.0 / (m * n))
    else:
        print '%s: %3.3f' % (label, t)


def runall(loops=3):
    text = 'Running tests %s times' % loops
    print(text)
    print('=' * len(text))
    t = 0
    t += run(loop, 'Loop', loops)
    t += run(strlist, 'Strlist', loops)
    t += run(scan, 'Scan', loops)
    t += run(lambs, 'Lambdas', loops)
    t += run(stache, 'Pystache', loops)
    t += run(markd, 'Markdown', loops)
    t += run(tojson, 'ToJSON', loops)
    t += run(fromjson, 'FromJSON', loops)
    t += run(tomsg, 'ToMessagePack', loops)
    t += run(frommsg, 'FromMessagePack', loops)
    t += run(tosnappy, 'ToSnappy', loops)
    t += run(fromsnappy, 'FromSnappy', loops)
    t += run(tobunch, 'ToBunch', loops)
    t += run(frombunch, 'FromBunch', loops)
    if uwsgi:
        t += run(cacheset, 'CacheSet', loops)
        t += run(cacheget, 'CacheGet', loops)
        t += run(cachemiss, 'CacheMiss', loops)
        t += run(cachefast, 'CacheFast', loops)
        t += run(cachepack, 'CachePack', loops)
    pp(t, 'Total', 0, 0)
    print


# The first pass warms things up; the second is the measured run.
runall(1)
runall(3)
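The script above targets Python 2 and relies on `time.clock`, which was removed in Python 3.8. For anyone repeating the exercise on a newer interpreter, here is a minimal sketch of the same warm-up-then-mean pattern together with the PixyMark calculation described in the notes. It assumes `time.perf_counter` as the timer, and the names `bench` and `pixymark` are mine rather than the original script's.

```python
import math
import time


def bench(workload, loops=3):
    """Run workload once untimed, then return the mean seconds per timed run."""
    workload()  # warm-up pass, gives a JIT time to engage
    start = time.perf_counter()
    for _ in range(loops):
        workload()
    return (time.perf_counter() - start) / loops


def pixymark(times):
    """Inverse of the geometric mean of the per-test times (higher is better)."""
    mean_log = sum(math.log(t) for t in times) / len(times)
    return 1.0 / math.exp(mean_log)


# CPython column of the results table, Loop through CachePack.
cpython_times = [0.27, 0.217, 0.293, 0.093, 0.213, 0.05, 0.03, 0.047, 0.023,
                 0.02, 0.027, 0.027, 0.18, 0.187, 0.067, 0.037, 0.017, 0.09,
                 0.527]
score = pixymark(cpython_times)
```

Fed the rounded CPython timings from the table, `pixymark` comes out at roughly 13.16, matching the PixyMarks row.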
Posted by: Pixy Misa at 01:28 PM
Post contains 1403 words, total size 15 kb.
61kb generated in CPU 0.0246, elapsed 0.127 seconds.
56 queries taking 0.1175 seconds, 347 records returned.
Powered by Minx 1.1.6c-pink.