Tuesday, February 09
I'd forgotten about this one: mxBeeBase. Open source B+tree Python library.
Perfect!
So that's R-trees, Quadtrees and B+trees done. Within hours of deciding I was going to write my own database I already have all the indexing dealt with.
Oh, and Pita is going to be open source too. BSD license, probably, unless I need to use something GPL.
Posted by: Pixy Misa at 02:31 AM
Well, that's annoying. I just wrote one of these to speed up the Minx template parser, but this one is probably faster (it's a Cython module) and certainly better documented.
The functionality is identical as far as I can see, but I called mine a DictTree.
Actually, no, mine has one extra feature: it allows you to reference a value using mapping-style lookup (tags['a.b']) that would be a subtree in an attribute-style lookup. If tags.a.b.c is set, tags.a.b is necessarily the subtree that contains c; try anything else and you blow yourself out of the water. But tags['a.b'] can be anything you like.
I need to do that because I designed the Minx template language that way. (Oops.) It lets you reference, for example, post.date as a date value, and also post.date.month to find just the month of the date of the post. You can't do that with dicts; you could probably do it with a smarter class, but bang would go my generality.
Since that trick is used on both dates and strings, I'd need to make all my dates and strings into custom classes to make the attribute syntax work directly, and that's just too messy to contemplate.
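The behaviour described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual DictTree from Minx: the class name is borrowed from the post, and storing everything in one flat dict keyed by dotted names is my own assumption. Attribute access prefers a subtree; mapping access prefers a value stored under the exact dotted key.

```python
class DictTree:
    """Values addressable as tags.a.b.c or tags['a.b.c'] (hypothetical sketch)."""

    def __init__(self, data=None, prefix=""):
        # Bypass our own __setattr__ so these become real instance attributes.
        object.__setattr__(self, "_data", {} if data is None else data)  # shared flat storage
        object.__setattr__(self, "_prefix", prefix)                        # e.g. "post.date."

    def _has_children(self, full_key):
        # Linear scan: does any stored key live underneath full_key?
        start = full_key + "."
        return any(key.startswith(start) for key in self._data)

    def _subtree(self, full_key):
        return DictTree(self._data, full_key + ".")

    def __setitem__(self, key, value):
        self._data[self._prefix + key] = value

    def __setattr__(self, name, value):
        self[name] = value

    def __getitem__(self, key):
        # Mapping-style lookup: an exact dotted key wins, even if it also has children.
        full_key = self._prefix + key
        if full_key in self._data:
            return self._data[full_key]
        if self._has_children(full_key):
            return self._subtree(full_key)
        raise KeyError(key)

    def __getattr__(self, name):
        # Only called when normal lookup fails. Underscore names are reserved for internals.
        if name.startswith("_"):
            raise AttributeError(name)
        # Attribute-style lookup: a subtree wins over a value with the same name.
        full_key = self._prefix + name
        if self._has_children(full_key):
            return self._subtree(full_key)
        if full_key in self._data:
            return self._data[full_key]
        raise AttributeError(name)


tags = DictTree()
tags["post.date"] = "2010-02-09"
tags["post.date.month"] = 2
print(tags["post.date"])               # 2010-02-09
print(tags.post.date.month)            # 2
print(type(tags.post.date).__name__)   # DictTree
```

With this design, tags.post.date can only ever be the subtree, which is exactly the limitation the post describes; the date value itself stays reachable as tags['post.date'] or tags.post['date']. The child check is a linear scan, which is fine for a template's handful of tags but not for large trees.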
Posted by: Pixy Misa at 02:09 AM
Monday, February 08
Or, The Screw It, I'm Writing A Database Post
Okay, I've had it up to here with databases that suck.*
So I'm writing my own.** It is called, for obvious reasons, Pita.
The plan is to steal as much of the low ground as possible. Anything hard, it just won't do.
First, it won't be written in C. It will be written in Python. Yes, performance will suffer, but Python can actually deliver surprisingly well, and it has some very well-optimised and well-tested libraries available. My Python benchmark script for MongoDB*** was moving as much as 90MB of data per second, which would just about saturate a gigabit Ethernet link or a SATA disk.
The design goals are as follows:
- Doesn't lose your data.
- Doesn't lose your data, ever.
- Doesn't have to go offline for schema changes, including adding or removing indexes.
- Doesn't have a query language. This may end up making complex queries more complex. That's okay, because it makes simple CRUD operations dead easy.
- Doesn't have any avoidable hard-coded limits that aren't insanely large. Whatever bit-size seems reasonable for a counter, I'll double it (if it's not already an arbitrary precision value).
- Doesn't do random I/Os except for indexes and one (of seven) table types. Well, two, I guess.
- Doesn't do row-level locking.
- Doesn't do multi-table or multi-record transactions, let alone two-phase commit. Look, if you're a bank, you already have the money, go buy DB2 and stop bothering me.
- Doesn't do joins... Well, it sort of does. We'll get to that.
- Doesn't guarantee high performance, full consistency, versioning, and multi-master replication on any single table type. It will have all of those, but you can only choose one from column A and one from column B.
- Doesn't scale to ridiculous sizes. If at any point, though, it looks like it can't at least scale to pretty darn big, I'll drop it.
- Doesn't ever do random access of random-length data to disk. Um, except for indexes, where the random-length data will be divvied up into fixed-length blocks anyway. And for the initial development versions, indexes are likely to be memory-based unless I find some nice (simple, fast, efficient) on-disk indexing libraries.
- Doesn't lose your data. I mean it. Even when it loses your data, it doesn't lose your data.
- Supports multiple table types, each optimised for a different task. So even though there's no one table type that does everything, there should be something good enough for most cases.
  Specifically:
  - Store - a log-structured, versioned, indexed document store. Documents are never deleted or changed. All changes are added as a new version of the document at the end of the table. Everything is stored in JSON or YAML.
    Advantages - no random I/Os for data, roll back to any point in time, back up simply by copying the data segments. The entire table is in text, so even if everything goes splatooie, it's easy to write a program to parse it and salvage your data.
    Disadvantages - if your records change frequently, it will eat disk space like candy. You can do an online file pack to purge old versions of documents, but that's a chore. Restore from backup would require an index rebuild.
  - Fixed - a fixed-length, indexed, random-access table with change log. Pretty much the diametric opposite of Store.
    Advantages - fast access to data because the position on disk is a function of the record number, and it's all binary so there's no parsing required. Every record is versioned, so by copying the data file plus the change log you can roll forward to a consistent state. You can't necessarily roll back, though.
    Disadvantages - fixed length.
  - Table - a combination of a Store for the documents and a Fixed for numeric fields, dates, booleans and any other fixed-length data.
    Advantages - all of the advantages of Store and Fixed together. The default on-disk table type, hence the name. Supports full roll-forward and roll-back recovery.
    Disadvantages - the fixed fields are written twice on a new major document version, to both the Store and the Fixed files. On the other hand, changes that touch only the fixed fields don't require an update of the Store, significantly reducing storage requirements when you have documents with a few numeric fields that change frequently. Similarly, reading a document requires reading two files, and you can't readily get the fixed values for minor versions (where the Store wasn't updated).
  - Array - a non-versioned, indexed, in-memory document store, with snapshot plus log persistence.
    Advantages - should be very, very fast, since it's entirely held in memory. Also very flexible for the same reason. Roll-forward recovery, though no roll-back. Backups are still as simple as copying the files on disk.
    Disadvantages - if your server crashes, or even if you need to do a normal reboot, you can't use the table until it has reloaded itself from disk, resynced, and rebuilt the indexes. Thus primarily suited for frequently-accessed but relatively small tables.
  - Cache - a non-versioned, indexed, in-memory document store with a fixed size and LRU policy.
    Advantages - it's a memory-based cache with the exact same semantics as your database.
    Disadvantages - I'd be surprised if it gets within a factor of 3 of the speed of dedicated caches like memcached.
  - Queue - a disk- or memory-based document queue, i.e. first-in first-out. Disk-based queues use segmented files and a transaction log for recovery and efficient space reclamation.
    Advantages - it's a queue with the same semantics as your database. Well, kind of. I don't know that I'll actually support all the fancy stuff. Does only sequential disk I/O.
    Disadvantages - won't have some of the fancy features of something like ActiveMQ. However, probably won't arbitrarily run out of memory and wedge itself. At least if it crashes outright you can restart it.
  - Stack - a disk- or memory-based document stack, i.e. last-in first-out. All the reads and writes alike happen at the end of the file.
    Advantages - it's a stack.
    Disadvantages - has to lock the file for every operation to prevent screwups, so won't be super-efficient.
- Support multiple data types that (mostly) map closely to Python's own:
  - Int
  - Char
  - Date
  - Time
  - Float
  - String
  - Money
  - Number
  - Geometry
  - Point
  - Line
  - Square
  - Rectangle
  - Circle
  - Ellipse
  - Text
  - Auto
  - Logical
  - Encrypted
  - Binary
- Support multiple data structures within documents that closely map to Python's own:
  - Map
  - Set
  - Bag
  - List
  - Array
  - Variant
  - Reference
- Support multiple index types and modes:
  - B-tree/B+-tree (of some sort) for primary keys, unique indexes, and general purpose indexy stuff.
  - R-tree or Quadtree for GIS stuff.
  - Full-text index, which will probably start out as a hacky B-tree of some sort.
  - Indexing of structures (lists, maps etc) within documents.
  - Partial indexes.
- Triggers and stored procedures. The embedded Lua interpreter I'm putting into the next version of Minx will do nicely.
- Embedded database. Don't need a full-fledged dataserver? Just run the whole thing within your app. The code will be split into a database library and a dataserver that runs on top.
- Replication - for Store, Queue and Stack, a choice of multi-master replication with eventual consistency or master-slave. For Fixed, Table, and Array, master-slave replication. For Cache, no replication. (It's a cache!)
- Sharding - for Store (probably) and Cache, easy sharding across servers. For other table types (probably), no sharding.
- Uses JSON or YAML everywhere, for data storage, data logs, config files, schema files, APIs and anywhere else a standard format is required.
  Advantages - no XML.
  Disadvantages - none.
- Pure-ish Python. The plan is to write it all in Python, with some optimisations in place for Cython.
  You can do that - it's kind of neat. The exact same code can run interpreted with regular Python, JIT-compiled with Psyco, binary-compiled with Cython, on the JVM with Jython, or on .NET with IronPython. And that's the plan: to make it run everywhere, but include optimisations for regular Python on Linux on x86 or x86_64. And avoid those horrifying string concatenations if I can.
  One catch I know already - for the Array snapshots, I'm planning to use the Unix fork semantics, which are copy-on-write, i.e. you get a static snapshot of all your data structures at an amortised cost so that you can easily write it back to disk while online. Windows doesn't offer the same fork semantics, so there snapshots would stall the database, or at least the table. Still, with commodity hard drives achieving peaks of over 100MB/second and modest RAID arrays reaching a few hundred MB/second, even writing a few GB of data to disk once a day shouldn't take too long.
- Designed to take advantage of SSDs and HDDs. Put the random I/O load on your SSDs and the sequential-update bulk data on your HDDs. Or put everything on SSDs; that works too. I'm not going to bother trying to make random I/O work super-efficiently on HDDs; that's simply a losing game. For small databases just use Array tables and load everything into memory; for larger databases buy yourself a nice Intel X25-E.
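As a rough illustration of the Store table type described in the list above, here is a minimal in-process sketch, assuming one JSON document per line and an in-memory index from each key to the byte offsets of its versions. The class and method names are hypothetical, and it leaves out everything hard (concurrency, segments, packing, YAML).

```python
import io
import json


class Store:
    """Append-only, versioned document store: a toy sketch of the idea, not Pita."""

    def __init__(self, f):
        self.f = f          # binary file-like object, e.g. open(path, "a+b") or io.BytesIO()
        self.index = {}     # key -> [byte offset of version 1, version 2, ...]
        self._rebuild_index()

    def _rebuild_index(self):
        # Recovery/restore path: scan the whole log once and rebuild the index.
        self.f.seek(0)
        offset = 0
        for line in self.f:
            record = json.loads(line)
            self.index.setdefault(record["key"], []).append(offset)
            offset += len(line)

    def put(self, key, doc):
        # Never overwrite: every change becomes a new version at the end of the log.
        self.f.seek(0, io.SEEK_END)
        offset = self.f.tell()
        version = len(self.index.get(key, [])) + 1
        line = json.dumps({"key": key, "version": version, "doc": doc}) + "\n"
        self.f.write(line.encode("utf-8"))
        self.f.flush()
        self.index.setdefault(key, []).append(offset)
        return version

    def get(self, key, version=None):
        # Latest version by default; any older version can still be read back.
        offsets = self.index[key]
        if version is None:
            version = len(offsets)
        if not 1 <= version <= len(offsets):
            raise KeyError((key, version))
        self.f.seek(offsets[version - 1])
        return json.loads(self.f.readline())["doc"]


log = io.BytesIO()
store = Store(log)
store.put("post:1", {"title": "Pita", "status": "draft"})
store.put("post:1", {"title": "Pita", "status": "published"})
print(store.get("post:1"))             # {'title': 'Pita', 'status': 'published'}
print(store.get("post:1", version=1))  # {'title': 'Pita', 'status': 'draft'}
print(Store(log).get("post:1", 1))     # {'title': 'Pita', 'status': 'draft'} (index rebuilt from the log)
```

Reading an old version is just a seek to a remembered offset, and reopening the log rebuilds the index from the text alone, which is the salvage-your-data property the Store is after. The cost is visible too: every put() writes a full copy of the document.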
Update: Found Python implementations of B+Trees, R*-Trees, and a B-Tree based sorted list and dict module. That'll save some time!
* That is, do not meet my current requirements. Or in some cases, actually spread pneumonic plague. YMMV.
** Maybe.
*** The MongoDB server is written in C++, but the benchmark program is Python, and it ships a whole lot of data back and forth. Which tells me that a Python program can ship a whole lot of data back and forth. The program can create 32,000 1K or 22,000 2K records per second, and read 50,000 1K or 32,000 2K records per second. The 90MB per second was achieved with 10K records.
Posted by: Pixy Misa at 09:13 PM
Sunday, February 07
I was planning to spend the weekend working with MongoDB, but those plans evaporated when it crashed and destroyed my test database. So instead I dug out my toy Python benchmark and ran it on Eineus. And just for fun, did the same in Psyco and Cython and Jython. Results are... Mixed. Yeah, that's a good word, particularly since the IronPython benchmark is still running.
| System | CPU | Clock | Python | Loop | String | Scan | Total |
|---|---|---|---|---|---|---|---|
| Eineus | Phenom II 945 | 3.0GHz | 2.6.4/32 | 0.950 | 1.483 | 0.437 | 2.870 |
| Eineus | Phenom II 945 | 3.0GHz | 2.6.4/Psyco | 0.013 | 0.180 | 0.477 | 0.670 |
| Eineus | Phenom II 945 | 3.0GHz | 2.6.4/Cython | 0.000 | 84.750 | 0.490 | 85.240 |
| Eineus | Phenom II 945 | 3.0GHz | 2.5.1/Jython | 0.682 | 499.936 | 0.758 | 501.376 |
| Nagi | Phenom 9750 | 2.4GHz* | 2.6/IronPython/32 | 0.544 | 3502.652 | 1.541 | 3504.739 |
| Nagi | Phenom 9750 | 2.4GHz* | 2.6/IronPython/64 | 0.916 | 5399.020 | 1.264 | 5401.202 |
| Miyabi | Phenom II 945 | 3.0GHz | 2.6.4/64 | 0.637 | 1.003 | 0.530 | 2.170 |
| Akane | Opteron | 2.0GHz | 2.5 | 1.887 | 2.733 | 0.880 | 5.500 |
* Normalised to 3.0GHz for ease of comparison.
I'll paste in the IronPython results if it ever finishes. (Update: Done now.)
Some notes:
64-bit Python is now a good bit faster than 32-bit for many cases. It's actually a bit slower in string scanning; I don't know why.
A 3GHz Phenom II running Python 2.6 is 2x faster than a 2GHz Opteron running Python 2.5 from three years ago. Someone's been doing some good work, either the Python people or AMD or the GNU compiler team.
CPython (the standard Python) has some really neat string optimisations that I depend on in Minx. These flow through nicely to Psyco, but are conspicuously absent from Cython, Jython, and IronPython, which are 60, 400, and a couple of thousand times slower for heavy string concatenation (as I said, that benchmark hasn't finished yet...) It's certainly possible to avoid that idiom, and instead, for example, create a list of substrings and then join them all in one operation.
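The two idioms can be compared directly with the standard library's timeit module. This is an illustrative sketch, not the benchmark from the table above; the function names and sizes are arbitrary. On CPython the repeated += is often fast because the interpreter can sometimes resize the string in place; an implementation without that trick may copy the whole string on every step, which makes the loop quadratic. Building a list and joining once is linear everywhere.

```python
import timeit


def build_by_concat(n):
    # Repeated += on a str: fast on CPython thanks to an in-place resize trick,
    # but potentially quadratic on implementations that copy every time.
    s = ""
    for i in range(n):
        s += "line %d\n" % i
    return s


def build_by_join(n):
    # Collect the pieces, then join once: linear time on any implementation.
    parts = []
    for i in range(n):
        parts.append("line %d\n" % i)
    return "".join(parts)


if __name__ == "__main__":
    assert build_by_concat(1000) == build_by_join(1000)
    for fn in (build_by_concat, build_by_join):
        seconds = timeit.timeit(lambda: fn(50_000), number=5)
        print("%-16s %.3f s" % (fn.__name__, seconds))
```

Running the same file under each interpreter shows how much of the gap in the String column comes from this one idiom.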
Apart from that, Jython seems to perform fairly well; certainly, if you need to run heavily multi-threaded Python code and can avoid doing millions of concatenations of large strings, Jython could be a winner. The standard CPython interpreter can only run one thread of Python code at a time (because of its global interpreter lock), though other threads can be handling I/O or library functions. The Jython runtime is fully multithreaded, so if you have a multi-threaded application and more than two CPUs - which I do - then Jython can provide an overall performance boost even if the single-thread performance declines somewhat. (And it's actually faster on one of the tests, so depending on your code you might win both ways.)
As for IronPython, well, the string concatenation results are just terrible. Looping is comparable with Python or Jython (but far behind Psyco or Cython), and string scanning is the slowest of the lot, though only by a factor of 2, not 2000. It should be fast enough for most tasks as long as you really avoid concatenating large strings. I wonder what list performance is like - I'll have to add a test for that.
Posted by: Pixy Misa at 11:09 PM
Anyone?
Quick recap of databases that suck - or at least, suck for my purposes - and some that I'm still investigating.
SQL
- MySQL - Lacks intersection and except, lacks array support, has only so-so full-text indexing, and offers either concurrency or full-text indexes and GIS, but not both.
- PostgreSQL - Provides arrays, concurrency and full-text indexes and GIS, but a lot of this is lost in a twisty maze of plugins and thoroughly non-standard operators. And the full-text indexing sucks.
- Ingres - Ingres is free these days (if you don't need a support contract). It's a good, solid database, but doesn't actually offer anything I can't get from MySQL with InnoDB.
- Firebird - Doesn't seem to offer anything more than MySQL or PostgreSQL or Ingres. Which doesn't mean that it's bad, but doesn't really help me either.
- SQL Server - Needs Windows, which is worth an automatic 6 demerits, even though I can get enterprise-level Windows and SQL Server products for free. (My company is a Microsoft BizSpark member.) Full-text and GIS, intersect and except are all there, but still no arrays.
- IBM DB2 - Costs too much.
- Oracle - Costs way too much.
- Progress / OpenEdge - Solid database, lovely 4GL, but still, the last time I looked at it (2006), mired in 16-bitness (!), and the 4GL is too slow for anything complicated. Also expensive, with a screwed-up pricing model. Would use it if I could.
NoSQL
- Redis - Nice feature set, and looks very useful for small systems, but the current version is strictly memory-based. (It's persistent through snapshots and logging, but the whole database must fit in memory.) The developers are working on this, though. The API could do with a tidy-up too; it has different calls for the same operation on different data structures.
- MongoDB - Very nice feature set. It's a document database, but it stores the documents in a JSON-like structure (called BSON) and can nest documents arbitrarily and inspect the fields within a document and build indexes on them. But its memory handling is lousy; while it's not explicitly memory-based, I wouldn't want to run it on anything but a dedicated physical server with more memory than my total database size. I could just throw money at it and put another 24GB of RAM in the server (far cheaper than a commercial RDBMS license) which would last us for a while, but I have serious doubts about its robustness as well.
- CouchDB - Written in Erlang, which is always a warning sign. Erlang programmers seem to care about performance and reliability far more than they care about making a product that anyone would want to use. In this case, instead of MongoDB's elegant query-by-example (with extensions), I write map/reduce functions in JavaScript and send them to the server. In what universe is that an improvement on SQL? On the plus side, it apparently has replication. On the minus side, it's an Apache project, and I have yet to meet an Apache project that didn't suck in some major way.
- HBase - Looks good if you have billions of very regular rows (which I do at my day job, but not here). Nothing wrong with it, but not a good fit.
- Project Voldemort - Pure evil. No, wait. This one came out of LinkedIn. It's one of the recent flock of inherently scalable (automatic sharding and multi-master replication) key/value databases. In their own words, "[it] is basically just a big, distributed, persistent, fault-tolerant hash table." That's a very useful thing, but I need defined ordering (multiple defined orderings for the same dataset, in fact), which a hash table can't give me.
- Cassandra - This is Facebook's distributed hash table thingy (it's like the old days when every server company designed their own CPU). May have some vague concept of ordering, so I'll take a closer look.
- Jackrabbit - It's a Java/XML datastore from the Apache Foundation. Uh-uh. I've used ActiveMQ, guys. You can't fool me twice. I'd sooner chew rusty nails.
- Riak - Bleah. Another key/value/map/reduce thing. In their own words:
A "map phase" is essentially just a function ("F") and an argument ("A") that is defined as part of a series of phases making up a given map/reduce query. The phase will receive a stream of inputs ("I"), each of which consists of a key identifying a Riak object and an optional additional data element to accompany that object. As each input is received by the phase, a node that already contains the document ("D") corresponding to "I" will run F(D,A) and stream along the results to the next phase. The point here is that your function can be executed over many data items, but instead of collecting all of the data items in one place it will execute wherever the data is already placed.
Clear? Right. Not interested at all.
- LightCloud - A distributed key-value store from Plurk. On the plus side, it's written in Python, supports Tokyo Tyrant and/or Redis for a back end, and "plurk" is fun to say. On the downside, it seems to be just a key/value database and not all that fast; it doesn't seem to expose the more interesting features of Tokyo Cabinet or Redis. It does at least have some update-in-place operations.
- Berkeley DB - An oldie but a goodie. An embedded, transactional database. You can shove pretty much anything into it; it doesn't care. No query language, but does have indexes. One problem is this clause from the license:
Redistributions in any form must be accompanied by information on how to obtain complete source code for the DB software and any accompanying software that uses the DB software. The source code must either be included in the distribution or be available for no more than the cost of distribution plus a nominal fee, and must be freely redistributable under reasonable conditions.
Any code that uses the DB software? I assume they mean direct code embedding/linking, but that's pretty broad. And it's really just a library, albeit a good one; it could serve as the basis for a database server, but it isn't one by itself.
- Metakit - Metakit is a column-oriented database library, with a very nice, clean Python interface. For example, to display all posts by user 'Pixy Misa', you could simply write:
```python
from __future__ import print_function  # same output on Python 2 and 3
# posts is a Metakit view (table) of blog posts
for post in posts.select(user='Pixy Misa'):
    print(post.title, post.date)
```
The problem is, it doesn't scale. I tried using it for the first pass at Minx, about four years ago, and it broke long before it reached our current database size. Like MongoDB, nice semantics, not so great on the implementation.
- Tokyo Cabinet / Tokyo Tyrant / Tokyo Dystopia, VertexDB - Tokyo Cabinet is a database library similar to Berkeley DB, but licensed under the no-worries LGPL. Tyrant is a "lightweight" database server built on Cabinet, Dystopia a full-text search engine built on Cabinet, and VertexDB a graph database built on Cabinet. I haven't explored these in depth yet because the standard Tokyo Cabinet distribution doesn't include Python libraries (Perl, Ruby, Java and Lua, but no Python?), but there are third-party libraries available.
- Xapian and Omega - Xapian is a full-text search library, and Omega a search engine built on Xapian. In fact, Xapian is more than that; it can do range searches on strings, numbers, and dates as well, and can store arbitrary documents. It's quite good for searches, but not really suited to general database work.
Posted by: Pixy Misa at 01:17 AM
Saturday, February 06
MongoDB ran out of memory and crashed during benchmarking. I headed off to look for the appropriate parameters to tune its memory consumption, and discovered that there aren't any.
MongoDB uses memory-mapped files for storage - as far as I can tell, it maps them in, and then puts all its structures in them, directly to memory, relying on the operating system to handle paging. On OpenVZ, that approach seems unlikely to work. And without at least a synchronous recovery log, it seems destined to destroy your database sooner or later anyway.
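For comparison, a synchronous recovery log usually means writing each change to an append-only file and forcing it to disk before acknowledging it, so the data can be rebuilt by replaying the log after a crash. Here is a minimal Python sketch, assuming a JSON-lines log of key/value changes; the class, function and format are hypothetical, and this is not how MongoDB works.

```python
import json
import os


class RecoveryLog:
    """Append-only log where append() returns only once the change is on disk."""

    def __init__(self, path):
        self.path = path
        self.f = open(path, "ab")

    def append(self, key, value):
        line = json.dumps({"key": key, "value": value}) + "\n"
        self.f.write(line.encode("utf-8"))
        self.f.flush()              # move Python's buffer into the OS
        os.fsync(self.f.fileno())   # ask the OS to push it to stable storage

    def close(self):
        self.f.close()


def replay(path):
    """Rebuild the latest state after a crash by re-applying every logged change."""
    state = {}
    if not os.path.exists(path):
        return state
    with open(path, "rb") as f:
        for line in f:
            try:
                change = json.loads(line)
            except ValueError:
                break               # torn final record from a crash mid-write: stop here
            state[change["key"]] = change["value"]
    return state
```

A database would call append() before touching its in-memory or memory-mapped structures. The per-write fsync is what makes this slow, which is why real systems tend to batch several changes per flush.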
So, nice features, shame about the functionality.
Sigh.
Posted by: Pixy Misa at 02:41 PM
One of the most intractable problems I have with Minx stems from its inherent many-to-many structure.
Minx supports many sites.
Each site can have many folders.
Each folder can contain many threads.
Each thread can appear in many folders (even on different sites).
Each thread can have many items (posts, comments, and various other less used thingies).
What this means is that to display the 20 most recent comments on your blog, I have to - at least in theory - perform a four-way join, sort, and select. I actually play some tricks to reduce it to a three-way join on a subset of the data, but once you start to page through the comments the tricks begin to break down. Not enough that it's noticeably slow at present, but enough that it won't scale to really large numbers of users or really large sites.
I call it the grandparent problem. If you're looking for one record - an individual comment - no problem, it's O(log n). If you're looking for the children of a record - comments on a post, comments by a particular user - no problem, it's O(n + log n), where n is the number of children you want. But if you're looking for grandchildren of a record, it's O(n log n), and that n no longer bears any relation to the number of records you actually want; you have to do a huge join, then sort the results, then select the handful you actually want.
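To make the cost concrete, here is a toy version of the naive grandchildren query. The tables, column layout and function name are hypothetical simplifications of the structure above (sites, folders, a many-to-many thread/folder link, and items), and the linear scans stand in for what would be index lookups in a real database. The point is the last step: every item on the site is collected and sorted, even though only `limit` of them are wanted.

```python
# Hypothetical, heavily simplified Minx-like tables.
folders = [(1, "ace"), (2, "ace"), (3, "mu")]           # (folder_id, site_id)
thread_folders = [(10, 1), (10, 2), (11, 2), (12, 3)]  # (thread_id, folder_id)
items = [                                              # (item_id, thread_id, posted_at)
    (100, 10, 5), (101, 11, 9), (102, 12, 7), (103, 10, 8), (104, 11, 1),
]


def recent_items_naive(site_id, limit=20):
    # Join 1: site -> its folders.
    site_folders = {f for f, s in folders if s == site_id}
    # Join 2: folders -> threads; the set removes threads filed in several folders.
    site_threads = {t for t, f in thread_folders if f in site_folders}
    # Join 3: threads -> items. This touches every item on the site (n of them).
    candidates = [item for item in items if item[1] in site_threads]
    # Sort all n candidates (O(n log n)) just to keep the newest `limit`.
    candidates.sort(key=lambda item: item[2], reverse=True)
    return candidates[:limit]


print(recent_items_naive("ace", limit=2))  # [(101, 11, 9), (103, 10, 8)]
```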
MongoDB has a set of features that, put together, look like they solve exactly this problem.
First, you can have arrays in your records. So, where I currently create duplicate thread records to place a thread in multiple folders (categories, for example), I can just add the category IDs to the array.
By itself that wouldn't be so useful, were it not for feature two: You can index arrays. So I can create an index on the category array and post time, and simply adding and removing category IDs from that array will make the post show correctly in your folders with no performance hit. In fact, it's more efficient (both in space and time) than the current technique.
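The effect of indexing an array can be imitated with a sorted list that holds one entry per array element. This is a toy model under my own assumptions (integer category IDs, numeric post times, a plain sorted list where a real database would use a B-tree), not MongoDB's implementation. Filing a thread in another folder becomes one index insert, and listing a folder becomes a range scan rather than a join.

```python
import bisect


class MultikeyIndex:
    """Toy index on (category_id, post_time) with one entry per array element."""

    def __init__(self):
        self.entries = []  # sorted list of (category_id, post_time, thread_id)

    def add(self, thread_id, categories, post_time):
        # A thread filed in k categories gets k index entries.
        for cat in categories:
            bisect.insort(self.entries, (cat, post_time, thread_id))

    def remove(self, thread_id, categories, post_time):
        for cat in categories:
            entry = (cat, post_time, thread_id)
            i = bisect.bisect_left(self.entries, entry)
            if i < len(self.entries) and self.entries[i] == entry:
                del self.entries[i]

    def newest_in(self, cat, limit=20):
        # Entries for one category are contiguous and already ordered by post_time.
        lo = bisect.bisect_left(self.entries, (cat,))
        hi = bisect.bisect_right(self.entries, (cat, float("inf")))
        start = max(lo, hi - limit)
        return [thread_id for _, _, thread_id in reversed(self.entries[start:hi])]


idx = MultikeyIndex()
idx.add(1, [7, 8], 100)   # thread 1 filed under categories 7 and 8
idx.add(2, [8], 200)
idx.add(3, [7], 300)
print(idx.newest_in(7))   # [3, 1]
idx.add(3, [8], 300)      # also file thread 3 under category 8
print(idx.newest_in(8))   # [3, 2, 1]
idx.remove(1, [7], 100)   # take thread 1 out of category 7
print(idx.newest_in(7))   # [3]
```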
So far so good. Now for feature three: Arrays in records can contain not just single data values (like a category ID), but other records. So I can put the posts and comments inside the thread record, and when I fetch a thread, I can fetch the entire thread contents in one go.
Now that wouldn't be so useful either except for feature four: You can build an index on a field in a record in an array in a record in your table.
That is, you can shove all your comments straight into the thread record, and then pick them out 50 at a time for paged comments, or in an entirely different order - say, the last 10 comments posted on your blog, no matter what post they're on.
Magic!
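The fourth feature can be modelled the same way: comments live inside their thread document, and a separate sorted index on the comments' posting time points back to (thread, position). Again this is a hypothetical sketch with made-up names, assuming comments are only ever appended (never deleted or reordered, since the index stores array positions). Fetching the latest comments across the whole blog walks the newest end of the index and touches only the comments it returns.

```python
import bisect

# Thread documents with their comments embedded.
threads = {
    1: {"title": "Pita", "comments": []},
    2: {"title": "MongoDB", "comments": []},
}
# Sorted (posted_at, thread_id, position): an index on comments.posted_at.
comment_index = []


def add_comment(thread_id, author, posted_at):
    comments = threads[thread_id]["comments"]
    comments.append({"author": author, "posted_at": posted_at})
    # Record where the new comment sits inside its thread document.
    bisect.insort(comment_index, (posted_at, thread_id, len(comments) - 1))


def latest_comments(n=10):
    # Take the newest n index entries and fetch just those comments, newest first.
    newest = comment_index[max(0, len(comment_index) - n):]
    return [threads[t]["comments"][pos] for _, t, pos in reversed(newest)]


add_comment(1, "alice", 10)
add_comment(2, "bob", 20)
add_comment(1, "carol", 30)
print([c["author"] for c in latest_comments(2)])  # ['carol', 'bob']
```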
The one thing that seems slightly tricky is that MongoDB is document based. You don't read fields from the database, you read documents. You can store documents one inside another (comments inside threads, for example), and then you can get one or more of those comments without reading the whole thing. But if you have information in the thread record itself, you can only get at it by reading the whole thread, comments and all. For an Ace of Spades post with 1000+ comments, that would burn up all the performance I just gained.
There are some ways around that with a little bit of data duplication and other hackery, though it would be nice if MongoDB let you simply select a subset of fields to be returned. It already has ways of updating individual fields inside a document, so something like that might already be on the way.
Anyway, that's where I'll be this weekend.
Update: MongoEngine provides a rather nice ORM - um, ODM - for Python and MongoDB.
Posted by: Pixy Misa at 02:50 AM
Ten little virtual servers are we,
Freshly created with OpenVZ,
Ten little servers running ~~FreeBSD~~ CentOS 5.4,
Ten little virtual servers.
Everything is installed from source,
Automated by script of course,
Enough packages to choke a baby horse,
Ten little virtual servers.
Ten little virtual servers swiftly,
Updated to run MongoDB,
No more MySQL for this Pixy!
Ten little virtual servers,
Ten little virtual servers.
Posted by: Pixy Misa at 01:15 AM
Wednesday, February 03
I've long been in favour of Australia adopting the American Bill of Rights unaltered.*
How long I will be permitted to state these views publicly is now in question, following this bit of complete insanity from South Australia:
South Australia has become one of the few states in the world to censor the internet.
The new law, which came into force on January 6, requires anyone making an online comment about next month's state election to publish their real name and postcode.
It could also apply to election comment made on social networking sites such as Facebook and Twitter.
The law ... also requires media organisations to keep a person's real name and full address on file for six months, and they face fines of $5000 if they do not hand over this information to the Electoral Commissioner.
This little abomination of human rights is the work of South Australian Attorney-General Michael Atkinson, whose worst crime against humanity prior to this was his single-handed prevention of an adult classification for computer and video games in Australia.
No computer games can be sold in Australia unless they are classified by the censors - the Office of Film and Literature Classification - and since there is no adult category, that means that no games can be sold unless they fit into the MA15+ category - i.e. suitable for 15-year-olds. Any change to this legislation has to be approved by all the state Attorneys General, and Mr Atkinson is the sole holdout.
The reasons he gives for this range from the inane to the dishonest, but so do the OFLC's reasons for blocking games. Fallout 3, for example, was banned in Australia because it includes the use of morphine as a painkiller.**
Read that again. No need to bang your head against your desk; I'll do that for you.
That's hardly the worst or most recent offense of the OFLC, either. Just recently, they decided to criminalise the depiction of adult women with insufficiently large breasts.*** In their defense:
We're all taking this too far, says Australian censorship blog Somebody Think of the Children. While it's true that the law does ban women who "look younger" than 18 from appearing in adult publications and films, images of small breasts alone are not "automatically" considered "illegal." For instance, "it’s highly unlikely that a naked photograph of a 30-, 40- or 50-year-old woman with small breasts" would ever be banned.
Images of small breasts alone are not automatically considered illegal. Australian censorship blog Somebody Think of the Children, you should not automatically be considered insane and committed to an asylum.
And all this is on top of Senator Conroy, the worst Minister for Communications in Australia's history - and given some of the prior incumbents, that's saying something - and his ongoing crusade to destroy the Internet in order to save it. Australia already has a secret blacklist of web sites that are illegal to visit; dissemination of the list or linking to those web sites is also illegal and subject to a fine of $11,000 per day (and per Senator Conroy, possible criminal charges).
Currently, this secret censorship has no teeth other than the stifling of discussion, though, because there is no actual filter on Australia's internet connections. The sites are banned, but you can only be fined after the fact.
What Senator Conroy is planning, in the name of protecting the children, of course - he has a regular habit of accusing his detractors of being in favour of child pornography - is installing mandatory filtering at all of Australia's internet providers that blocks all requests to the banned list of sites, and extending the list to cover thousands more. And introducing a second, even more extensive filter with an opt-out clause, which, in recent controlled trials with a list of 10,000 banned sites, had a false-positive rate of just 3%.****
This filter works on HTTP requests on port 80.
So it does nothing to control the spread of Senator Conroy's chosen enemy via other channels, does nothing to check encrypted connections of any sort, and imposes a secret regime of censorship on the entire country.
It can, as it is currently planned, be trivially bypassed via any encrypted proxy or VPN; we can safely assume that those will be next on Senator Conroy's little list.
We need to get rid of the whole present totalitarian mentality at the next election. No Australian should even think of voting for any candidate supporting such attacks on fundamental human rights. And then we need to adopt the Bill of Rights for our own.
Update:
Attorney-General Michael Atkinson has made a "humiliating" backdown and announced he will retrospectively repeal his law censoring internet comment on the state election.
After a furious reaction on AdelaideNow to The Advertiser's exclusive report on the new laws, Mr Atkinson at 10pm released this statement: "From the feedback we've received through AdelaideNow, the blogging generation believes that the law supported by all MPs and all political parties is unduly restrictive. I have listened.
"I will immediately after the election move to repeal the law retrospectively."
You will move to repeal the law after the election? You assume too much, Mr Atkinson.
* Many people here look askance at the Second Amendment, but I see no reason we shouldn't adopt it along with the others.
** Fallout 3 was eventually released here; I understand they changed the name of the drug.
*** You may bang your head against your desk now.
**** If you haven't studied enough statistics to grasp the significance of this, let me explain. Assume there are ten million web sites in the world (there are far more than that). Assume 10,000 of those are blocked intentionally. With that rate of false positives, 300,000 sites will be blocked unintentionally, that is, there will be 30 times as many errors as there are correct answers.
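The footnote's arithmetic can be checked with a few lines of Python. This is a sketch under the footnote's own assumptions (ten million sites, 10,000 intentional blocks, a 3% false-positive rate); it additionally assumes the rate applies only to sites that are not on the list and that every listed site is blocked. Applying 3% to all ten million, as the footnote does for round numbers, gives exactly 300,000; excluding the listed sites barely changes the answer.

```python
def false_positive_load(total_sites, listed_sites, fp_rate):
    """Estimate how many sites a filter blocks by mistake.

    Assumes the false-positive rate applies to every site NOT on the
    blacklist, and that every listed site is (correctly) blocked.
    """
    unlisted = total_sites - listed_sites
    wrongly_blocked = unlisted * fp_rate
    # Mistaken blocks per intentional block
    errors_per_hit = wrongly_blocked / listed_sites
    return wrongly_blocked, errors_per_hit


wrong, ratio = false_positive_load(10_000_000, 10_000, 0.03)
print(f"{wrong:,.0f} sites blocked by mistake, {ratio:.1f} errors per intended block")
```

Roughly 299,700 mistaken blocks against 10,000 intended ones: about 30 errors for every correct answer, even before counting the far larger real number of web sites.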
Posted by: Pixy Misa at
12:16 AM
| Comments (10)
| Add Comment
| Trackbacks (Suck)
Post contains 987 words, total size 7 kb.
Tuesday, February 02
Civilizations Wars is like a cross between Phage Wars (population and combat mechanics) and Gemcraft (main map, skill levels and unlockable game modes). In itself, that's no bad thing: Phage Wars has a neat combat system and Gemcraft is a solid tower-defence game with a great campaign and skill structure around it.
But Civilizations Wars doesn't look like Phage Wars or Gemcraft. It looks like this:

Opening credits... In a Flash game?

Awesome opening credits at that.

When three tribes go to war...

Yay turtles!

Replaying the first map, mopping up the last of the Roman-type tribe. You can see a bunch of my Golems at left, being utterly ineffectual.

My Egypt-type armies are on the move!

It smells like... victory!
It's a fun mini-4X game with some really cool artwork. Definitely worth taking a look.
* Well, that's what it sounds like they're saying.
Posted by: Pixy Misa at
12:34 AM
| No Comments
| Add Comment
| Trackbacks (Suck)
Post contains 146 words, total size 2 kb.
Monday, February 01
You search for ingredients, and find: The Right Stuff (2), Potion Base (3), Power Flower (17), Nightshade (9), Apple-y Goodness (3), Glowing Goo (4), Smokeblossom (7), Tasty Twig (17), Malted Pill (5), Twigpile (5), Powerpack (8), Ashen Film (8), Bitter Powder (4), Dayshade (2), Ebony Sand (2), Monochrome Flower (4), Milkshake (5), Stark Moonlight (5)! +110 Stamina
Only two Dayshades?
To explain: This is from Billy vs. Snakeman, that silly ninja game I mentioned a while back. What you see here is the result of an unleashed Geothermal Acclimator, combined with a White Eye Sannin, a Forest Trail, a Zodiac Zoo, a Fruits Basket, a Quiet Glade, and a MegaTrail Mix. (I lucked out in the Arena and Dark Hour was active and I won 25 out of 28 battles, which put me just over the 10,000 Reputation I needed for the MegaTrail Mix.) I think that's about the best one-shot ingredient-collecting combination you can get.
Posted by: Pixy Misa at
09:26 PM
| Comments (2)
| Add Comment
| Trackbacks (Suck)
Post contains 163 words, total size 1 kb.
Sunday, January 31
Ho hum, yet another game download service. I already have Steam for the new stuff, GOG for the old stuff, and Impulse for the sprawling intergalactic battlefleet stuff.
Waitaminute. They have A-Train 8!
But it's $30!
But it's A-Train 8!
But it's yet another download service I need to sign up on!
But it's A-Train 8!
But A-Train 9 is due out in a couple of weeks - in Japan anyway - and it looks freakin' awesome! (Bandwidth warning! Link takes you to a page full of 4-megapixel images of really cool toy cities!)
But it's A-Train... You know, that does look pretty amazing.
Posted by: Pixy Misa at
05:16 AM
| Comments (2)
| Add Comment
| Trackbacks (Suck)
Post contains 105 words, total size 1 kb.
Brought to you by the Wistful Ferrets.
Posted by: Pixy Misa at
03:46 AM
| No Comments
| Add Comment
| Trackbacks (Suck)
Post contains 9 words, total size 1 kb.
Visionaries, or just stating the bleedin' obvious?
Posted by: Pixy Misa at
01:04 AM
| No Comments
| Add Comment
| Trackbacks (Suck)
Post contains 9 words, total size 1 kb.
Wednesday, January 27
The first episode was a bit uncertain, but now, three episodes in, this one is developing into a true delight.
Think Bottle Fairies meets Potemayo meets Gakuen Alice, and you're pretty close.
Posted by: Pixy Misa at
01:56 AM
| Comments (7)
| Add Comment
| Trackbacks (Suck)
Post contains 34 words, total size 1 kb.
Saturday, January 23
Lots more pudding.
Posted by: Pixy Misa at
01:50 PM
| Comments (1)
| Add Comment
| Trackbacks (Suck)
Post contains 5 words, total size 1 kb.
Thursday, January 21
The Toshiba T115 and T135 are awesome little notebooks. The T115 has an 11-inch screen and a single-core Core 2 CPU; the T135 a 13-inch screen and a dual-core Core 2. Apart from that they're basically identical; 1366 x 768 LED-backlit screen, real keyboard, wireless b/g/n, 2GB to 4GB RAM depending on the model, 250GB or 320GB disk likewise. Not much bigger than a netbook, but a lot more powerful. The T115 is $461 on Amazon, the T135 starts at around $600.
You can't get them in Australia.
What you can get is the T110 and the T130. Which have the same specs, but retail for $999 and $1299 respectively.
The Aussie dollar is currently at 91 US cents. You do the math.
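Doing the math, as a minimal Python sketch: it converts the US prices quoted above into Australian dollars at the 91-cent rate and compares them with Toshiba Australia's prices. It deliberately ignores sales taxes, GST and shipping, so treat the ratios as a rough indication rather than a like-for-like retail comparison.

```python
AUD_IN_USD = 0.91  # exchange rate quoted in the post: A$1 buys US$0.91


def usd_to_aud(usd):
    """Convert a US dollar price to Australian dollars."""
    return usd / AUD_IN_USD


def local_markup(aud_retail, usd_retail):
    """Ratio of the Australian retail price to the converted US price."""
    return aud_retail / usd_to_aud(usd_retail)


# T115 (US$461 on Amazon) vs the equivalent T110 (A$999)
print(f"T115 at US$461 is about A${usd_to_aud(461):.2f}")
print(f"T110 at A$999 costs {local_markup(999, 461):.2f}x as much")
# T135 (from about US$600) vs the equivalent T130 (A$1299)
print(f"T130 at A$1299 costs {local_markup(1299, 600):.2f}x as much")
```

Both local models come out at nearly twice the converted US price.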
You'd think that with such a huge disparity in pricing, someone would step in and import the US model.... And that's exactly what has happened. The T115 is $699 locally (including sales tax) and the T135 starts at $899 depending on the model. They'd probably be even cheaper except that the importer honours the warranty, including shipping costs to and from the US repair centre.
Pure mercantilism, but it's Toshiba's own fault for leaving the door wide open.
Update: Aha! There's an Athlon Neo X2 version of the T115: dual-core 1.5GHz with Radeon 3200 graphics. I ran Haruhi with Vista with full Aero effects on a 3200 motherboard for a year - albeit with a 2.6GHz CPU - and that's quite a capable combination. Cost is $765. Battery life will probably be noticeably shorter than for the single-core models, but my current notebook's battery is shot and only gets about half an hour on a charge, so it's bound to be an improvement.
Update: Rats, they piddled in my cornflakes. It doesn't have Wireless-N. Only one of the 11" models does, in fact. But then my current notebook doesn't have Wireless-N either.
Update: Huh. When did Australia enter the 21st century?
All goods (except for tobacco products and alcoholic beverages) may be imported duty and tax free if their value is $1,000 or less.
Update: That model is $498 on Amazon. So $765 locally is a little steep, but still a lot cheaper than Toshiba Australia, who don't offer that model at all. They list a 6-hour battery life compared to 8 hours for the slower Intel single-core with its slower Intel graphics; I think that's a reasonable trade-off.
Posted by: Pixy Misa at
01:29 AM
| Comments (9)
| Add Comment
| Trackbacks (Suck)
Post contains 408 words, total size 3 kb.
Sunday, January 17
Posted by: Pixy Misa at
06:31 PM
| No Comments
| Add Comment
| Trackbacks (Suck)
Post contains 5 words, total size 1 kb.
Saturday, January 02
Today's offers on Steam: Mass Effect for $5 (excellent game, though it suffers from the usual Bioware Empty Universe Syndrome), and all 11 games in the recent Sam and Max series for $15.
I don't really need another copy of Mass Effect, but I'd been thinking of picking up the Sam and Max games, so I did.
Posted by: Pixy Misa at
03:44 AM
| Comments (11)
| Add Comment
| Trackbacks (Suck)
Post contains 58 words, total size 1 kb.
Friday, January 01
The only thing more ignominious than injuring oneself in an attempt to squish a bug is injuring oneself in a failed attempt to squish a bug.
Fortunately I got the little blighter, and so was spared the added insult of him taunting me from a safe distance while I glued my thumbnail back on.*
* Actually the damage isn't that severe, but it hurt like the dickens for the first thirty seconds.
Posted by: Pixy Misa at
10:43 PM
| Comments (7)
| Add Comment
| Trackbacks (Suck)
Post contains 75 words, total size 1 kb.
I'm spending way too much money on Steam this holiday season, but since I didn't buy myself anything else for Christmas (apart from some triple-cream King Island Brie, and that was on special... oh, and Ponyo, but that's a medical expense* and was also on special), and since the total amount is equal to maybe three new-release games at retail price, it's not a real problem.
Though I have now messed up four times and bought a game and then followed up by buying a pack that includes that game. Which has cost me, let's see, $16.50. (Actually one of those was buying a pack that was included in a larger pack...)
But then, to pick one example, for the cost of one new release game (US$75, where a new game typically costs A$80-100), I'm getting 9 games and expansion packs that I actually want, and a further 10 that could be interesting and generally received good reviews. The most expensive single game so far has cost me $13.59; the cheapest $1.99. (Well, $1.49, but that was one of the ones I ended up buying twice.)
When I will actually find time to play any of these is a different matter; I hardly have time right now for Bubble Tanks Tower Defense or Billy vs. Snakeman. But I now have tactical and/or strategic wargames covering every epoch from ancient Greece and Rome through Medieval, Renaissance and Enlightenment Europe to WWII and WWIII (had the Cold War run hot), plus a whole slew of real-time and turn-based fantasy and science-fiction wargames, for a near limitless supply of virtual baddies to virtually disassemble.**
No Crusades or Mongol Hordes, and no Civil War or WWI, though they seem to be the only major gaps post-Bronze-Age.
Mostly RTS or tactical, but also a scattering of RPG, a couple of FPS, and a selection of indie/casual games.
Now I just need to download them all. Never mind playing them, it's going to take a week just to download them.
Update: Added them all up. 237GB.
* Ghibli films are cheap, keep me sane, and have few harmful side-effects. I really should be able to claim them on my insurance...
** What was the last major wargame I played? C&C Generals maybe? In that game you could play as America, China, or as the filthy terrorists. I played mostly as America, occasionally as China. I don't really see much appeal in playing as the terrorists, or as Nazi Germany, as at least three of my new purchases allow. I want to squish the bad guys.
Posted by: Pixy Misa at
05:34 PM
| Comments (14)
| Add Comment
| Trackbacks (Suck)
Post contains 431 words, total size 3 kb.
Sunday, December 27
Little Warden Pixy, running through the Deep Roads,
Picking up the darkspawn and bopping them on the head!
When down came the demon godmother...
Posted by: Pixy Misa at
06:50 PM
| Comments (9)
| Add Comment
| Trackbacks (Suck)
Post contains 27 words, total size 1 kb.
Went out shopping today, found zero copies of Ponyo on the shelves.* There was a gap in the P's at JB Hifi, though.
I did find volumes 5-8 of the Midori no Hibi manga, so I snaffled those. (I already had 1-4, then it went out of print. Looks like it's been reprinted, yay!)
And picked up a few things on the Steam sale:
Some of those (Space Trader in particular) didn't get great reviews, and others (Prince of Persia, Beyond Good and Evil) are a few years old, but when you are paying as little as $1.49 a game, it can't hurt to take a few chances.
Braid - 2.49 USD
World of Goo - 4.99 USD
Indigo Prophecy - 3.39 USD
Audiosurf - 2.50 USD
Rome: Total War Gold - 2.49 USD
Beyond Good and Evil - 4.99 USD
Prince of Persia: Sands of Time - 4.99 USD
Subtotal - 25.84 USD

Lumines Base + Advance Pack - 2.99 USD
Evil Genius - 1.99 USD
Assassin's Creed - 9.99 USD
STALKER: Shadow of Chernobyl (AU) - 1.49 USD
Space Trader - 1.99 USD
Heroes of Might and Magic 5 - 4.99 USD
Subtotal - 23.44 USD
Update: Poop. Wasted $5 buying the Heroes of Might and Magic 5 pack just now, instead of just getting the expansions as I'd intended. I realised about half a second after I clicked Confirm. I will now go and kill something virtual to make myself feel better.
* Maybe because it's not due out until Tuesday...
Posted by: Pixy Misa at
04:57 PM
| Comments (4)
| Add Comment
| Trackbacks (Suck)
Post contains 226 words, total size 7 kb.
Thursday, December 24
Why is Ponyo being released on DVD the week after Christmas?!
Posted by: Pixy Misa at
02:51 PM
| Comments (10)
| Add Comment
| Trackbacks (Suck)
Post contains 13 words, total size 1 kb.
Wednesday, December 23
I was looking at server pricing to see what it would cost to buy some sizeable web/database servers outright. Answer is: Not that much. A Dell dual quad-core Nehalem server with 72GB of RAM runs about $5600. But the drive prices!
1TB 7.2K RPM SATA 3.5" Hot Plug Hard Drive [$579]
That's some markup you've got there, Mr Dell. A 1TB Seagate desktop drive costs $90; the server version (which is the same hardware, but different firmware) costs $160. Dell's storage prices are pretty much a deal breaker. But at some point I ended up on Dell's Australian site, where the exact same server gives me this option:
1TB 7.2K RPM SATA 3.5" Hot Plug Hard Drive [$311.30]
Wait, what? I know that the Aussie dollar has strengthened against the US dollar of late, but not that much.
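A quick Python sketch of just how lopsided those two quotes are. It assumes the 91-US-cent rate mentioned in the January 21 post; the actual rate in late December may have been somewhat lower, which would only widen the gap. The $160 server-drive figure is the one given above.

```python
AUD_IN_USD = 0.91  # assumption: the rate quoted in the January 21 post


def aud_to_usd(aud):
    """Convert an Australian dollar price to US dollars."""
    return aud * AUD_IN_USD


us_price = 579.00      # Dell US site, per drive
au_price = 311.30      # Dell Australia site, per drive, in AUD
server_drive = 160.00  # same drive with server firmware, bought separately

au_in_usd = aud_to_usd(au_price)
print(f"Australian option is about US${au_in_usd:.2f}")
print(f"US option costs {us_price / au_in_usd:.2f}x the Australian one")
print(f"US option costs {us_price / server_drive:.2f}x the standalone server drive")
```

On these assumptions the US configurator charges roughly twice what the Australian one does for the identical part.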
Posted by: Pixy Misa at
02:30 PM
| Comments (4)
| Add Comment
| Trackbacks (Suck)
Post contains 139 words, total size 1 kb.
Powered by Minx 1.1.2-pink.