Sunday, May 12

Geek

Why Does Julia Use * For String Concatenation?

I quote:
I think the main reason is that algebras over fields are commutative with respect to + but not necessarily for *. String concatenation is definitely not commutative. I can't remember whether this was part of the original motivation, but I also tend to think of this in the terms that string concatenation is a lot like taking the outer product, you're getting an object with the combined dimensionality of both factors.
This is difficult to argue with, but the answer is still not what I'd prefer.

I was writing a longer post on Julia last night, but the editor ate it.

Julia is an interesting new programming language (it first appeared last year) that attempts to provide the mathematical power and sheer computational efficiency of Fortran while offering programmers the expressiveness, flexibility, and clarity of Ruby. The result is 70% awesome, 10% odd, and 20% not there yet (since it's so new).

Here's a snippet:

cd("data") do
    open("outfile", "w") do f 
        write(f, data)
    end 
end

In these five short lines we already see two important things:
  1. Readable code. No curly brackets and semicolons, those codependent blights on programmer productivity, and no bizarre unfamiliar syntax like Smalltalk's.

  2. Something like Ruby's blocks or Python's with - the cd function creates a context, as does the open function, and those contexts apply to the enclosed statements and are automatically cleaned up at the end. So the program changes its working directory to the data directory only for the code within the do...end block, without the programmer needing to worry about any details.
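For comparison, here is roughly how the same nested-context idea looks with Python's with. This is only a sketch: the cd helper, write_in_dir, and demo are names I made up for illustration, not part of Python's standard library or of Julia.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def cd(path):
    """Temporarily change the working directory (hypothetical helper)."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)


def write_in_dir(directory, filename, data):
    """Write data to filename inside directory, then return to where we were."""
    with cd(directory):
        with open(filename, "w") as f:
            f.write(data)


def demo():
    """Write a file into a throwaway directory and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        write_in_dir(tmp, "outfile", "hello")
        with open(os.path.join(tmp, "outfile")) as f:
            return f.read()
```

Both contexts clean up after themselves: the file is closed and the working directory is restored once the with blocks exit, which is the behaviour the Julia do...end blocks give you.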

While Julia's syntax closely resembles sane, healthy, modern languages like Ruby and Python, it veers off in some details because it was designed by mathematicians rather than computer scientists. Thus you get the self-consistent if somewhat weird decision to use * for string concatenation.

More significantly, while Julia is object-oriented, it is not class-based (as most object-oriented languages are). It uses multiple dispatch based on the arguments to a function rather than binding functions to an object.

That is, where in Python you might define an image object and a resize method, and call it like this:

im = image("kitty-ears.jpg")
im.resize(x=500)

In Julia you'd define the data structure of the object, and then define a set of functions that act upon that object. There might already be a resize function that acts on arrays or vectors or memory-mapped files (or a hundred other things), but when you call

resize(im; x=500)

Julia knows that you mean the image resize function, because it knows that the variable im is an image. Values cannot change their type, so whenever the Julia compiler can infer the types of the arguments it can bind to the right version of the function at compile time, unlike Python or Ruby, where dispatch is always dynamic.
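You can get a taste of this style in Python with functools.singledispatch, with the caveat that it picks an implementation from the type of the first argument only, and does so at run time; Julia dispatches on all the arguments. The Image class and both resize implementations below are invented for illustration.

```python
from functools import singledispatch


class Image:
    """Hypothetical image type that only records its dimensions."""

    def __init__(self, width, height):
        self.width = width
        self.height = height


@singledispatch
def resize(obj, x):
    # Fallback when no implementation is registered for this type.
    raise TypeError(f"resize is not defined for {type(obj).__name__}")


@resize.register
def _(obj: list, x):
    # For lists: truncate to x items, or pad with None up to x items.
    return obj[:x] + [None] * max(0, x - len(obj))


@resize.register
def _(obj: Image, x):
    # For images: set the width to x and keep the aspect ratio.
    return Image(x, round(obj.height * x / obj.width))
```

Calling resize(im, x=500) on an Image picks the image version, and resize([1, 2, 3], x=2) picks the list version, without either type knowing about the function.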

And that matters because it means that Julia is about 20x faster than Python.*

What Julia doesn't (currently) provide is multi-threaded programming. It supports coroutines, called tasks, that allow you to write your code in a logically multi-threaded way. And it supports message-passing multi-processing, so you can spin up multiple instances of your application on different CPUs and easily dispatch tasks to other workers and receive the results when they're done. But you can't have multiple processes sharing a common native data structure.
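If "logically multi-threaded" sounds vague, here is a minimal sketch of the idea using Python generators as stand-ins for Julia's tasks. It is not how Julia implements them; worker and run_round_robin are hypothetical names. Each task runs until it yields, then a simple scheduler hands control to the next one, all on a single thread.

```python
from collections import deque


def worker(name, steps):
    """A task that does `steps` units of work, yielding control after each."""
    for i in range(steps):
        yield f"{name}:{i}"


def run_round_robin(tasks):
    """Run generator-based tasks on one thread, one step at a time."""
    queue = deque(tasks)
    trace = []
    while queue:
        task = queue.popleft()
        try:
            trace.append(next(task))
        except StopIteration:
            continue  # Task finished, so it is not put back on the queue.
        queue.append(task)
    return trace
```

The tasks interleave, but only one of them is ever running at a time, so there is no speedup from extra cores; that part is what the message-passing workers are for.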

But then, neither (really) can Python or Ruby. Both support threading, but both have global interpreter locks - the infamous GIL in Python - which mean that only one thread is working on native code or data structures at a time. The lock is released while a thread is doing I/O, and inside some C libraries, so you do get a speedup in real-world applications. So if all three languages are effectively still single-threaded, with threads largely a programming convenience, you'd go with the one that's 20x faster, yes?

Yes, except that Python has a huge and wonderful standard library and an even huger and wonderfuller ecosystem of third-party packages.

Except except:
  1. You can embed Python in Julia, with two-way transfer of data and functions, using PyCall. You can just plain import your Python modules into a Julia program and use them:

    @pyimport pylab
    x = linspace(0,2*pi,1000); y = sin(3*x + 4*cos(2*x));
    pylab.plot(x, y; color="red", linewidth=2.0, linestyle="--")
    pylab.show()

    It's almost-but-not-quite Python on the Julia side (see the ; between the positional and named arguments in the pylab.plot call) but calling existing Python code from Julia is almost perfectly transparent.

  2. You can access MongoDB from Julia. MongoDB isn't perfect** but it's the swiss-army chainsaw*** of NoSQL, at least since they fixed it so that it doesn't crash and destroy all your data every time a gnat sneezes.****

  3. You can pretend it's Ruby and slap together web apps like there's no tomorrow using Morsel:

    using Morsel

    app = Morsel.app()

    route(app, GET | POST | PUT, "/") do req, res
        "This is the root"
    end

    get(app, "/about") do req, res
        "This app is running on Morsel"
    end

    start(app, 8000)

  4. There's support for ZeroMQ, the lightweight queueing... Thing.

  5. Also, Curl.

That's important, because those five items cover everything I need for both my day job and my off-hours programming. If I can still use all my existing code and write new code that runs 10-20x faster (I use Psyco, the precursor to PyPy, here at mee.nu, which delivers a real-world speedup of very close to 2x, so only 10x there) in a language that doesn't make me want to shoot myself, that makes me a happy bunny.

Full support for multi-threaded programming would make it even better, but since I don't really have that now, it's not a show-stopper. For mee.nu, I run five instances of Minx behind a load-balancing proxy, though we rarely need the performance. Julia provides plenty of ways to use multi-processor machines, just not that particular way.

And if it really delivers 10x the performance in practice, that's like getting 10-way multi-threading with zero software overheads and zero extra hardware.

So a cautious thumbs up so far from me.

* Or about 4x faster than PyPy, the Python JIT compiler. But since you can easily embed Python code in Julia (the PyCall package provides this) and PyPy still has a number of incompatibilities with common Python packages, there's an argument that Julia is a better way to go even for Python programmers.

** Indeed, while it supports atomic updates, it doesn't support transactions across multiple records, so some would argue that it's not a database at all. My definition of a database is that it lets you find what you want in better than linear time even if you don't know what you're looking for - i.e. it provides some sort of secondary index. By that definition, MongoDB is a database. And Redis and RethinkDB and Aerospike aren't. Which doesn't mean they're not useful - Redis is bleedin' wonderful! - it just means they're not databases. They're datathingies.

*** Joke stolen shamelessly.

**** Which to their credit they fixed four major releases ago. These days it's pretty robust.

Posted by: Pixy Misa at 10:38 PM

1 I'm not sure I'd call Julia object-oriented, I'm not even sure I'd call it object-based (but I've really only dipped my toes into the documentation). It seems like someone got tired of Matlab's syntactic and semantic warts and interpretive performance* and decided to clean it up. It's too bad SISAL never caught on.

* If the Matlab interpreter is where your code is spending most of its time, either you're doing it wrong** or you're using the wrong language...

** Ok, passing array arguments by value can be kind of expensive...

Posted by: Kayle at Monday, May 13 2013 04:24 PM (M7tH0)

2 I'm not sure I'd call Python sane or healthy.  In fact, I'm pretty sure I'd call it a lot of four-letter words.  Which, actually, I do, fairly often, to the annoyance of my co-workers.

I'm nearing the point where I'm going to tell my boss that Python can go stuff itself, I'm retreating to c++.

Posted by: dkallen99 at Tuesday, May 14 2013 02:27 AM (2lHZP)

3 C++?

It does depend on what you need to do, of course.  I have no love for C++, but there are plenty of tasks where it is the right choice and Python ain't.

Posted by: Pixy Misa at Tuesday, May 14 2013 02:44 AM (PiXy!)
