Saturday, February 13

Geek

Syntactistaticality

Now where the heck was I, before being buried under an avalanche of poorly-considered Atom feeds and Chinese replica watch spam?

Ah, right.

We can't do a full Progress-style where clause in Python, unfortunately.  Or not without more trickery than I intend to apply; someone did make a working goto - never mind that, a working comefrom - but I'm not inclined to go to that sort of length.

So.  I want the first 20 posts in a given folder of a given blog, sorted by date order (descending, of course).  Pythonically.  No SQL.  Let's see:
db = Pita.Connect(host, user, password, database)
posts = db.views.posts
I've connected to the Pita server and have a view open.  Now:
for post in posts(folder=f,order='date-',limit=20):
    ...
That's not bad.  With Python's named parameters, you can use any field in a flat record structure, so:
authors = db.views.authors
a = authors.find(name='Pixy Misa')
for post in posts(author=a.id, tag='databases',order='score-',limit=20):
    ...
If we have a nested structure, though, it doesn't work.  Python doesn't let you say:
for post in posts(author.name = 'Pixy Misa')
even if we have the code to automatically resolve the relation.  It's not valid syntax.  So that's one place where it breaks down.  Another place is ranges; we can say
for author in authors(country = 'Australia')
to get a list of authors who live in Australia, but we can't say
for author in authors('Andorra' < country < 'Azerbaijan')
even though that is a valid Python expression.  It will get evaluated, and we'll just pass either True or False to authors() (or throw an exception), and it just won't work.
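Both failures are easy to reproduce in plain Python, with no Pita involved. In this small sketch, `authors` is a hypothetical stand-in that simply reports what it was handed; it is not the real view object:

```python
def authors(*args, **kwargs):
    """Hypothetical stand-in for a view: report what the call actually received."""
    return args, kwargs

country = 'Australia'

# The chained comparison is evaluated first; the callee only ever sees a bool.
received_args, received_kwargs = authors('Andorra' < country < 'Azerbaijan')

# A dotted name is not allowed as a keyword argument, so this never compiles.
try:
    compile("posts(author.name='Pixy Misa')", '<example>', 'eval')
    dotted_keyword_ok = True
except SyntaxError:
    dotted_keyword_ok = False
```

Here `received_args` ends up as a one-element tuple holding a boolean, and the range itself is gone by the time the function runs.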

Now, the design of Pita is that it's primarily a document database with advisory schemas.  It's not schemaless like many of the key-value stores, and it's not fixed-schema like most traditional relational databases.  Each view has a schema, which specifies what fields should be there, and if they are, what type they should be.  Fields can be missing, in which case the schema may specify a default value.  And you can stick in whatever additional data you want, so long as the schema doesn't specifically conflict with that.
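As a rough illustration of what "advisory" could mean in practice, here is a minimal sketch. The schema layout, the `apply_schema` name, and the use of `None` to mean "no default" are all assumptions made for the example, not Pita's actual format:

```python
# Hypothetical advisory schema: field name -> (expected type, default or None).
AUTHOR_SCHEMA = {
    'name': (str, None),
    'country': (str, 'Unknown'),
}

def apply_schema(record, schema):
    """Return a copy of record with defaults filled in; reject only type conflicts."""
    result = dict(record)  # extra fields not in the schema pass through untouched
    for field, (expected_type, default) in schema.items():
        if field not in result:
            if default is not None:
                result[field] = default
        elif not isinstance(result[field], expected_type):
            raise TypeError(f'{field} should be {expected_type.__name__}')
    return result

doc = apply_schema({'name': 'Pixy Misa', 'favourite_colour': 'pink'}, AUTHOR_SCHEMA)
```

The missing `country` picks up its default, the unscheduled `favourite_colour` is kept, and only a value of the wrong type for a declared field is refused.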

What this means is that we can know that country is a string, and if we do an equality comparison between a string and a list, we mean that we want to know if the string is in the list.  So we can also do this:
for author in authors(country = ['Australia', 'New Zealand', 'Canada'])
to get authors from any of those countries.
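The matching rule might look something like this on the inside. It's a sketch over in-memory dicts, and `field_matches` and `select` are names made up for the example; a real server would presumably do this against an index:

```python
def field_matches(value, wanted):
    """Match a scalar field against a single value or a list of alternatives."""
    if isinstance(wanted, (list, tuple, set)):
        return value in wanted
    return value == wanted

def select(records, **criteria):
    """Lazily yield the records that satisfy every keyword criterion."""
    for record in records:
        if all(field_matches(record.get(field), wanted)
               for field, wanted in criteria.items()):
            yield record

people = [
    {'name': 'A', 'country': 'Australia'},
    {'name': 'B', 'country': 'Japan'},
    {'name': 'C', 'country': 'Canada'},
]
matched = [p['name'] for p in
           select(people, country=['Australia', 'New Zealand', 'Canada'])]
```

This only works because the schema says `country` is a scalar; for a field that is itself a list, equality against a list would have to mean something else.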

By returning a lazy, iterable result object that also supports slicing (a bare generator can't be subscripted), we can efficiently replace this:
for post in posts(blog=b,tag='databases',order='score-',limit=20)
with the more Pythonic
for post in posts(blog=b,tag='databases',order='score-')[:20]
Slicing (as it is called) is very general in Python and very useful, so adapting it to database selects will come naturally.
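One way to get slice syntax without fetching everything is to wrap the row source in an object with `__getitem__` and let `itertools.islice` do the work. The `Query` class below is a hypothetical sketch of that idea, not Pita's result type:

```python
from itertools import islice

class Query:
    """Hypothetical lazy result set: iterable, and sliceable without fetching everything."""

    def __init__(self, source):
        self._source = source  # any callable returning a fresh iterator of rows

    def __iter__(self):
        return self._source()

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError('this sketch only supports slices')
        # islice stops pulling rows once the slice is satisfied
        # (it rejects negative indices, which would need the full result).
        return Query(lambda: islice(self._source(), key.start, key.stop, key.step))

fetched = []

def rows():
    for n in range(1000):
        fetched.append(n)  # record how many rows were really produced
        yield n

first_three = list(Query(rows)[:3])
```

The `fetched` list is only there to show that the slice stops the underlying generator early instead of walking all thousand rows.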

But what about range searches?  There's no obvious Pythonic syntax for this, at least, not one that works.  Here are a few possibilities:
for house in houses(price = '<100000'):
    ...
for house in houses(price = ['>50000','<100000']):
    ...
We know price is of type money, so we look at that string, and the leading < means it's a range match.  Goody!  It doesn't work so well - or at all, for that matter - for string fields, because we could perfectly well be looking for those exact strings.
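Here's a sketch of how that string-sniffing might be done when the schema supplies the field type. Everything in it is an assumption for illustration: the function names, using `int` as a stand-in for the money type, and treating a list of range strings as "all of these must hold":

```python
import operator

# Longest prefixes first so '<=' is not mistaken for '<'.
_PREFIXES = [('<=', operator.le), ('>=', operator.ge),
             ('<', operator.lt), ('>', operator.gt)]

def parse_condition(spec, field_type):
    """Turn '<100000' into a predicate; only safe for non-string field types."""
    if isinstance(spec, str) and field_type is not str:
        for prefix, compare in _PREFIXES:
            if spec.startswith(prefix):
                bound = field_type(spec[len(prefix):])
                return lambda value: compare(value, bound)
    # Anything else (including every string field) is an exact match.
    return lambda value: value == spec

def matches_all(value, specs, field_type):
    """A list of range specs means every condition must hold."""
    return all(parse_condition(s, field_type)(value) for s in specs)
```

Note that for a string field the spec `'<100000'` falls straight through to the exact-match branch, which is exactly the ambiguity described above.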

We could have an explicit range function:
for house in houses.price.range(50000,100000):
    ...
That's not too bad either; it's pretty clear syntactically and semantically, and it requires no parsing.  Doesn't let you differentiate between > and >= though - and you can't do a range match on more than one field.  (You can't effectively use a binary tree for such a search anyway.)  We can still slide in our other parameters like so:
for house in houses.price.range(50000,100000,suburb='Wondabyne'):
    ...
But (again due to the strictures of Python), they must come last.

Since we're building a generator, it should be possible to do this LINQish trick:
for house in houses(suburb='Wondabyne').price.range(50000,100000):
    ...
The first operation produces a view that knows to search on the suburb field for Wondabyne.  This derived view has exactly the same attributes as the original view, and price is one of those attributes, so we can use the range selector on price just like before.
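To make the derived-view idea concrete, here is a minimal in-memory sketch. The `View` and `FieldSelector` classes are hypothetical, the range is assumed to be inclusive at both ends, and a real implementation would presumably build up a server-side query rather than filter dicts:

```python
class FieldSelector:
    """Returned by view.<field>; its methods each produce a new, narrower view."""

    def __init__(self, view, field):
        self._view, self._field = view, field

    def range(self, low, high):
        # Inclusive at both ends in this sketch.
        return self._view._derive(lambda r: low <= r[self._field] <= high)


class View:
    """Hypothetical in-memory view; every filter returns a new View."""

    def __init__(self, records, filters=()):
        self._records, self._filters = records, tuple(filters)

    def _derive(self, predicate):
        return View(self._records, self._filters + (predicate,))

    def __call__(self, **criteria):
        view = self
        for field, wanted in criteria.items():
            view = view._derive(lambda r, f=field, w=wanted: r[f] == w)
        return view

    def __getattr__(self, field):
        # Only reached for names that are not real attributes, i.e. field names.
        if field.startswith('_'):
            raise AttributeError(field)
        return FieldSelector(self, field)

    def __iter__(self):
        return (r for r in self._records if all(p(r) for p in self._filters))


houses = View([
    {'suburb': 'Wondabyne', 'price': 80000},
    {'suburb': 'Wondabyne', 'price': 200000},
    {'suburb': 'Woy Woy', 'price': 90000},
])
cheap = list(houses(suburb='Wondabyne').price.range(50000, 100000))
```

Because each step hands back a fresh `View`, the original `houses` is never modified, and nothing is filtered until something finally iterates over the result.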

We should be able to keep doing that sort of thing, until we get something like:
for house in houses.suburb('Wondabyne').bedrooms.range(3,5).bathrooms.min(2).price.max(150000).order('price+'):
    ...
But it's not terribly dynamic.  So, next stop, dynamism.

Posted by: Pixy Misa at 03:13 AM