Thursday, January 31, 2008

My take on Arc

The nerdosphere is abuzz about Arc, Paul Graham's long-awaited new dialect of Lisp, so I figured I might as well pile on. For the benefit of those of you who read this blog for its political content, beware: this is going to be a seriously geeky post. You have been warned.

Some quick background: I am as big a fan of Lisp as you could ever hope to find. I've been using Lisp since 1979 (my first Lisp was P-Lisp on an Apple ][ ) and I used it almost exclusively for over twenty years until I lost my faith and switched, reluctantly, to Python (and I was not alone). Recently I have taken up Lisp again since the release of Clozure Common Lisp. I am proud of the fact that my login on Reddit, YC News and Arclanguage.org is Lisper.

Furthermore, I am a huge Paul Graham fan. I think his essays are brilliant. I think Y Combinator is brilliant (to the point where I'm seriously considering moving from LA to the Silicon Valley just so I can go hang out there). Paul is the kind of guy I wish I could be but can't.

And to round out the preliminaries and disclaimers, I am mindful of the fact that the current release of Arc is a first draft, and it's never possible to live up to the hype.

I think all the enthusiasm and buzz about Arc is wonderful, but I am concerned about what will happen if people start to think that there's no there there. If Arc doesn't save Lisp it's hard to imagine what would. And unfortunately, I think Arc has some quite serious problems.

The biggest problem with Arc is that it is at the moment not much more than a (very) thin layer on top of Scheme. Now, that would be OK if it were the right thin layer, but I don't think it is. Arc's design path is well-trod, and the pitfalls that lie upon it are mostly well known.

For example, Arc is 1) a Lisp-1 that 2) uses unhygienic macros and 3) does not have a module system. This is bound to lead to problems when programs get big, and not because people forget to put gensyms (which Arc dubs "uniqs") in the right place (although I predict that will be a problem too). The problem is that in a Lisp-1, local variable bindings can shadow global function names, so if you use a macro M that references a global function F in a context where F is shadowed, then M will fail. If you're lucky you'll get an error. If you're not lucky your program will just do some random weird thing. Hygienic macros were not invented just because some intellectuals in an ivory tower wanted to engage in some mathematical masturbation. This is a real problem, and the larger your code base, the more real it becomes.
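
To make this concrete, here is a sketch of the failure mode. (Caveat: I am writing Arc-ish syntax from the tutorial and the arc0 sources, so the details may be slightly off, but the shape of the problem is not.)

    (def report (x) (prn "value: " x))  ; a global function

    (mac log-twice (expr)               ; a macro that expands into calls to it
      `(do (report ,expr)
           (report ,expr)))

    (log-twice 42)                      ; works fine at top level

    (let report 99                      ; a local variable shadows the function...
      (log-twice 42))                   ; ...so the expansion (report 42) tries
                                        ; to call 99, which is not a function

In a Lisp-2 the function binding of report would be untouched, and with hygienic macros the expansion would refer to the report that was visible where the macro was defined. In Arc you just lose.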

I cite this problem first because macros are, according to Paul Graham, the raison d'être for Lisp. Macros are the reason for putting up with all those irritating parentheses. And macros in Arc are broken.

Unfortunately, it gets worse.

Arc is supposed to be a language for "exploratory programming" so it's supposed to save you from premature commitments. From the Arc tutorial:


Lists are useful in exploratory programming because they're so flexible. You don't have to commit in advance to exactly what a list represents. For example, you can use a list of two numbers to represent a point on a plane. Some would think it more proper to define a point object with two fields, x and y. But if you use lists to represent points, then when you expand your program to deal with n dimensions, all you have to do is make the new code default to zero for missing coordinates, and any remaining planar code will continue to work.


There are a number of problems with this. First, the kind of flexibility that Paul describes here is not unique to lists. You could accomplish exactly the same thing with, for example, a Python object with named slots (which is just a thin wrapper for an abstract associative map -- note that I did not say hash table here; more on this later). You could have 2-D points with X and Y slots, and 3-D points with X, Y and Z slots. You could even do the 2-D-points-return-zero-for-their-nonexistent-Z-slot trick by redefining the __getattr__ method of the 2-D point class. Python objects are every bit as flexible as lists, except that it's a lot easier to figure out that a <2-D point instance> is a two-dimensional point than that a list with two elements is one. And you can't even assume that a 2-D point will be a list of two numbers, because Paul goes on to suggest:


Or if you decide to expand in another direction and allow partially evaluated points, you can start using symbols representing variables as components of points, and once again, all the existing code will continue to work.


"All of your existing code will continue to work" only if your existing code isn't built with the tacit assumption that the coordinates of a point are numbers. If you've tried to do math on those coordinates then you're out of luck.

Which brings me to my next point...

Lisp lists are quite flexible, but they are not infinitely malleable. They are a "leaky abstraction." The fact that Lisp lists are linked lists (and not, for example, vectors, as Python lists are) is famously exposed by the fact that CDR is a primitive (and non-consing) operation. Make no mistake, linked lists are monstrously useful, but there are some things for which they are not well suited. In particular, Nth is an O(n) operation on linked lists, which means that if you want to do anything that involves random access to the elements of a list then your code will be slow. Paul recognizes this, and provides hash tables as a primitive data structure in Arc (the lack of which has been a notable shortfall of Scheme). But then he backpedals and advocates association lists as well:


This is called an association list, or alist for short. I once thought alists were just a hack, but there are many things you can do with them that you can't do with hash tables, including sort them, build them up incrementally in recursive functions, have several that share the same tail, and preserve old values.


First, hash tables can be sorted, at least in the sense that association lists can be sorted: just get a list of the keys and sort them (see the sketch below). Or create a sorted-hash-table that maintains an adjunct sorted list of keys. This is not rocket science. But that is not the problem.
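
A sketch in Arc itself (keys, sort, and callable hash tables are all in the arc0 release, if I'm reading arc.arc correctly; I am also assuming < orders symbols the way it orders strings):

    (= h (obj b 2 a 1 c 3))      ; a hash table
    (sort < (keys h))            ; => (a b c): the keys, sorted
    (each k (sort < (keys h))    ; walk the table in key order,
      (prn k " = " (h k)))       ; just like a sorted alist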

The problem is that the functions for accessing association lists are different from those used to access hash tables. That means that if you write code using one you cannot pass in the other, which completely undermines the whole idea of using Arc as an exploratory language. Arc forces you into a premature optimization here.
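
Concretely (a sketch; alref is the alist accessor defined in arc.arc):

    (= al '((x 1) (y 2)))   ; a point as an alist
    (= tb (obj x 1 y 2))    ; the same point as a hash table

    (alref al 'x)           ; => 1
    (tb 'x)                 ; => 1, but via different code

    (def x-coord (p) (alref p 'x))
    (x-coord al)            ; => 1
    (x-coord tb)            ; => nil -- silently wrong, not even an error

To switch a program from alists to hash tables you have to hunt down and rewrite every accessor, which is precisely the kind of plumbing an exploratory language is supposed to spare you.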

The Right Thing, if you want to support exploratory programming (which to me means not forcing programmers into premature commitments and optimizations), is to provide an abstract associative map whose underlying implementation can be changed. To make this work you have to commit to a protocol for associative maps that every implementation must adhere to. The trouble is that designing such a protocol is not easy. It involves compromises. For example, what should an implementation do if an attempt is made to dereference a non-existent key? Throw an exception? Return some unique canonical value? Return a user-supplied default? Each possibility has advantages and disadvantages. The heavy lifting in language design is making these kinds of choices despite the fact that there is no One Right Answer (and even when there is, it may not be readily apparent).
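
Even without an object system you can fake such a protocol in Arc with closures. Here is a minimal sketch (every name in it is mine, not Arc's):

    (def alist-map (al)             ; wrap an alist in the protocol
      (fn (op k)
        (case op
          get (alref al k)
          has (if (assoc k al) t))))

    (def table-map (tb)             ; wrap a hash table in the same protocol
      (fn (op k)
        (case op
          get (tb k)
          has (if (tb k) t))))

    (def get-x (m) (m 'get 'x))     ; client code commits only to the protocol

    (get-x (alist-map '((x 1) (y 2))))   ; => 1
    (get-x (table-map (obj x 1 y 2)))    ; => 1

But notice that even this toy forced a decision: a missing key comes back as nil, which conflates "absent" with "present and false". That is exactly the kind of choice a language designer is supposed to make once, for everybody, rather than leave to every library author.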

And that is my main gripe about Arc: it was so long in the making, and it set such lofty goals, and then it seems to pretty much punt on all the hard problems of language design.

Now, as I said, I am mindful of the fact that this is just a first draft. But some Big Decisions do seem to have been made. In particular, it seems a safe bet that Arc will not have an OO layer, which means no generic functions, no abstract data types, and hence no way to build reliable protocols of the sort that would be needed to eliminate the kinds of forced premature optimizations that Arc currently embraces. It also seems a safe bet that it will remain a Lisp-1 without hygienic macros (because Paul seems to regard both hygiene and multiple name spaces as Horrible Hacks). Whether it gets a module system remains to be seen, but it seems doubtful. If you think designing a protocol for abstract associative maps is hard, you ain't seen nothin' yet.

So there it is. Paul, if you're reading this, I'm sorry to harsh on your baby. I really hope Arc succeeds. But I gotta call 'em as I see 'em.

13 comments:

Chouser said...

Thank you for your informative article.

I do wonder if what PG meant by "sortable" assoc lists is "ordered". You can sort a list, but if you don't it keeps its own order, which can be meaningful. You can sort hash keys, but if you don't they're not kept in meaningful order.

I'm not sure that's what he meant, since he didn't actually say it.

But all your other comments about the access protocol, etc., seem pretty ... devastating.

danb said...

> If Arc doesn't save Lisp it's hard to imagine what would.


Qi. It seems much more innovative than Arc.

root said...

> In particular, it seems a safe bet that Arc will not have an OO layer, which means...

Common Lisp doesn't have an OO layer either; that didn't stop someone from making CLOS, though.

In response to one of the comments:

Arc isn't meant to be innovative; it's just meant to be fun to program in. Why should it have to be innovative? All it needs to do is address the problems that the author of the language perceived in other Lisps. I consider Arc an excellent middle ground for anyone who enjoys Lisp but doesn't like everything about either CL or Scheme.

RH Trial said...

>The Right Thing ... is to provide an abstract associative map whose underlying implementation can be changed.

You might be interested in Clojure, a Lisp for the JVM where sequences (first/rest), maps, vectors, invoke-ability, etc., are all abstractions that support multiple implementations.

Ron said...

> Common Lisp doesn't have an OO layer either; that didn't stop someone from making CLOS, though.

Yes, and the fact that CLOS is grafted on as an afterthought is the source of many of its shortcomings. Math functions are not generic. You can't inherit from built-in types like list. (You can in Python, and this turns out to be tremendously useful.)

> Why should it have to be innovative?

One of Arc's stated goals is to be "the 100 year language." I think meeting that goal is going to take some innovation.

root said...

>> Why should it have to be innovative?

> One of Arc's stated goals is to be "the 100 year language." I think meeting that goal is going to take some innovation.

Common Lisp has lasted a while and there wasn't anything very innovative about it. Zeta-Lisp and others had already implemented most of what was in the CL spec, though maybe not all in one language.

I do not believe longevity correlates directly with innovation. Google has lasted a while and it certainly wasn't the first search engine. C has lasted a while and I'd hardly have called that innovative, even for when it was released; people were already using structured programming. (At least I think so -- could be wrong here.) The important thing is doing what you were designed for really, really well. I believe Arc has room to grow in this area, but I believe it will head in the right direction with its next release. (The language is still in its infancy, mind you; this release is more of a beta than anything else, and the spec, as PG said, isn't even set in stone yet; anything can change.)

In the end, I believe I once heard a quote along the lines of "It's nice to be first, but it pays to be second."

Ron said...

> Common Lisp has lasted a while and there wasn't anything very innovative about it.

But Common Lisp is also not the 100-year language, at least not according to Paul Graham. (In fact, Paul has said that Common Lisp "sucks".) Paul's idea of the 100-year language involves more than mere longevity. If that's all there was to it then Fortran would be a serious contender too (and it *was* innovative in its day).

> C has lasted a while and I'd hardly have called that innovative

I think C was very innovative. It was the first (and arguably still the only) language to span the gap between high and low level. Before C, the only way to write an operating system was in assembler.

root said...

> But Common Lisp is also not the 100-year language, at least not according to Paul Graham. (In fact, Paul has said that Common Lisp "sucks".) Paul's idea of the 100-year language involves more than mere longevity. If that's all there was to it then Fortran would be a serious contender too (and it *was* innovative in its day).

Ah, he and I define it differently. I consider a 100-year language to be a language that lasts 100 years; of course, to last 100 years it has to have some sort of appeal that will make it last that long. As such it has to be a particularly good language (Java, for example, isn't a 100-year language), or else another language may wind up taking its place. Arc is not a 100-year language at this point, but it has potential.

Incidentally, under my definition, I believe CL to be a 100-year language (Fortran not so much, though it will always have its niche; I believe it was replaced by C).

>I think C was very innovative. It was the first (and arguably still the only) language to span the gap between high and low level. Before C, the only way to write an operating system was in assembler.

I see I was in error; I was under the impression that B, Forth, Fortran, or Algol could do that. I'm afraid I'm not too familiar with any of those languages, though, which led to my misconception.

oz said...

I submit that what you want from a programming language is not one that makes programs shorter, but one that makes programs easier to create.

I think the word "expressiveness" is often used. Some people in the programming-language community [who had gotten fed up with all the informal claims in this area] tried to identify and quantify this in the distant past. My favorite example is Felleisen's 1990 "On the Expressive Power of Programming Languages." There are some interesting conjectures in that paper that may be worth revisiting in light of PG's [ongoing] commentary...

Chris Barts said...

> I see I was in error; I was under the impression that B, Forth, Fortran, or Algol could do that. I'm afraid I'm not too familiar with any of those languages, though, which led to my misconception.

In fact, Burroughs used (a dialect of) Algol to implement the OS of the B5000 in 1961, whereas C was invented just about a decade later. C really wasn't all that innovative. It won for two real reasons:

1. It came with Unix. The best way to spread a language is to make it the default extension language of a successful product.

2. It fit the kind of hardware the world was moving towards. C works best on a byte-addressable system where a pointer to a byte fits in a single machine word and there are no mechanisms in hardware to enforce type-safety. It also helps if a character is the same thing as a byte. That more or less describes the hardware the world is using now that the 36-bit machines have died and x86 has shed its enforced segmentation. Even word-addressable RISC systems are close enough for C to be a good option on them. (It helps that after a certain point C was common enough that hardware designers explicitly made their architectures C-friendly.) The most common C-unfriendly architecture now is probably the JVM.

Unknown said...

I strive to become a Lisp aficionado and so am disheartened that someone could lose their faith in Lisp. Now that you have done considerable work in Python, do you still feel that way?

Jim said...

What if pg and rtm are performing a giant "head fake" on the Lisp community by releasing a version of Arc based on Scheme that they promise to break in fundamental ways?

If true, then the lack of hygienic macros, Unicode, etc. are just ways of coaxing people into the upgrade. I wouldn't put it past pg and rtm to build some "can't live without it" web thing in Arc, then charge for the upgrade, either.

I'm nearly certain that Arc will eventually be a proper Lisp.

Ron said...

First, Paul has specifically said that unhygienic macros were a deliberate decision because he thinks name capture is a purely academic concern that doesn't actually arise in practice. And the whole Unicode thing has been fixed (and even before that, I agreed with Paul that it was all much ado about nothing).

Second, I'd say Arc is already a "proper Lisp" (to the extent that I can glean the meaning of that phrase from your context). What it isn't, IMHO, is progress toward what I would consider a 100-year language. My fundamental objection to Arc is not really to Arc per se, but to the ostensible quality metric by which we are supposed to judge it, namely brevity. I say "ostensible" because all Arc programs can be made shorter in ways that clearly don't make them better (like eliminating argument lists and using implicitly named variables everywhere). So Paul's real quality metric is some tradeoff between brevity and other, unspecified qualities. Such a disconnect between rhetoric and reality is rarely a catalyst for progress.