Archive | Programming

!lilli!!lllliil!illilllillll!!i!lllll!!iillll…yes, syntax matters

14 Feb

Re Lisp without parens again, I dug up this old quote I clipped from a Reddit comment that sums up why Lisp’s parentheses are a real problem:

There is a reason all Lisp languages didn’t break through: the syntax is too monotonous. A program is written for humans to read, and humans are not too good with repetitive anything, including repetitive parentheses.

Another commenter writes:

People can go and love Lisp and [its] derivatives all they like, nobody I know finds deeply nested s-expressions very readable or writable. Syntax doesn’t matter much, but it does matter. I hacked on a fair amount of emacs customizations, but I always found it hard to follow the control flow.

People just don’t like parsing text like the title here for the same reason that they much prefer reading decimal or hex over binary. Sure, with time and experience, we can learn to deal with monotonous syntax, but given the choice, why should we put up with it? Some people get really good at sending and receiving Morse code, but Morse code is not only far less accessible than a keyboard, it’s unarguably less efficient: a proficient typist will always beat a proficient telegraph operator.

The parens of Lisp are repetitive noise which one can learn to cope with, sufficiently well for some tastes but not well enough for most programmers. Animvs solves this problem and goes a step further: by imposing a stricter indentation scheme and introducing first-class symbol highlighting, Animvs gives code a distinctive shape, one that indicates structure and can be reliably parsed at a glance with low mental overhead.

If it were a good idea, it would exist already

8 Feb

Re my previous post, I should acknowledge that ‘Lisp without parens’ is a very old idea. Old enough that any time it comes around again, long-time Lispers leap out from their parenthesis-girded fortresses to ridicule the idea. This raises a good question: if Lisp without parens is a good idea, why hasn’t it become a reality? I have three explanations:

  1. The Lisp-without-paren solutions of the past made the fatal mistake of trying to infuse Lisp with infix notation. See, for example, Dylan. This is just a bad idea, as it solves the too-many-parens problem but complicates (at best) Lisp’s homoiconicity, making macros much harder to write and thereby defeating Lisp’s one remaining unique feature.
  2. Indentation-sensitive syntax was an old idea long before Python, but until Python took off, everyone ‘knew’ it was a bad idea. (And in fact, some still insist that indentation-sensitive syntax doesn’t work.) It wasn’t until Python was well established that a few people began to suggest using indentation to leave Lisp’s parens inferred while keeping the S-expression structure intact (see the sketch after this list). So the idea of Lisp-without-parens is maybe 50 years old, but the idea of Lisp-without-parens-but-keeping-S-expressions is less than a decade old. As the Python example illustrates, sometimes good ideas just take time and a few false starts to become reality.
  3. A more general problem is that having one good idea often isn’t enough: existing technologies and their accompanying ecosystems have a lot of inertia, and the current set of users will resist the pain changes bring as long as the benefits are unclear or seemingly minor. The applicable lesson is that the first successful Lisp to get rid of parentheses will most likely include other compelling features.
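To make point 2 concrete, here’s a small illustration. The first form is ordinary Clojure; the second is my own hypothetical rendering of how indentation might infer the parens while preserving exactly the same S-expression structure (it is not Animvs’s actual syntax, which I haven’t specified here):

    ;; Standard Clojure, explicit parens:
    (defn classify [x]
      (if (pos? x)
        (if (even? x) :positive-even :positive-odd)
        :non-positive))

    ;; Hypothetical indentation-inferred equivalent (not real syntax):
    ;; each new line opens an implicit paren that closes after its
    ;; indented children.
    ;;
    ;; defn classify [x]
    ;;   if (pos? x)
    ;;     if (even? x) :positive-even :positive-odd
    ;;     :non-positive

Either way, the reader sees the same tree; only the explicit delimiters change.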

I’ll submit that Animvs avoids these problems. It cleans up the parens and indentation style, but keeps the syntax homoiconic and reductively simple (simpler, in fact, than any existing Lisp, what with their hacky reader macros polluting the nice clean symbols). It also introduces new ideas other than just a new syntax.
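To make the homoiconicity point concrete: in a Lisp, source code literally is the nested-list data that macros receive and return, which is why grafting infix notation onto the language makes macros harder to write. Here is a minimal sketch in plain Clojure (the unless macro is a classic textbook illustration, not an Animvs feature):

    ;; Code is data: a quoted form is just a list you can take apart.
    (def form '(+ 1 2))
    (first form) ;=> +
    (rest form)  ;=> (1 2)

    ;; A macro is an ordinary function over those lists, run before
    ;; evaluation. Infix syntax would break the one-to-one mapping
    ;; between what you write and the lists the macro manipulates.
    (defmacro unless [test then else]
      (list 'if test else then))

    (unless (pos? -1) :non-positive :positive) ;=> :non-positive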

Stop creating new languages

30 Aug

Every couple of months, an announcement for a new language pops up on ProgReddit or Hacker News. While some of these languages might have interesting ideas, their ideas rarely justify whole new languages, so mostly these languages seem like arbitrary remixes of existing ones. Consequently, these languages’ authors often come off a bit like crackpots: ‘Look, everyone! I’ve rearranged the bookshelves with my new classification system. Once you master it, you’ll find browsing of biographies 6% more efficient and reshelving of autobiographies 11% more efficient! *ahem* Once you master it.’

Some observers react to this steady nuisance of quixotic pet projects by dismissing the need for better languages entirely. This is sensible in the short term because new things in programming rarely constitute big enough improvements over the day’s status quo to justify the transition costs. In the long run, however, it’s myopic: the languages and tools of today are generally significantly superior to what we were using a generation ago, so it’s not unreasonable to expect further significant advances.

In one reading of the history, though, the improvements we’ve seen in the last twenty-odd years are entirely from the realization of old ideas (automatic garbage collection, full object-orientation, functional programming, etc.), and so it’s claimed that no one has had any really new ideas for decades now. There’s something to this observation, but we still shouldn’t reject new languages out of hand:

  • First, the original formulations of the old ur-ideas prompted many practical questions, and our answers to those questions remain sketchy, leaving open the possibility of more fundamental changes to come.
  • Second, while I think it very unlikely that, at this late date, someone will identify a new programming paradigm, it always seems naïve to declare the End of History and rule out any future potential for big, transformative ideas.
  • Third, and most importantly, I don’t believe languages must only advance on big ideas, for little details matter—they add up. Even if what most new languages largely do is just rearrange the furniture for the sake of aesthetics and minor efficiencies, after a few rounds of 5% improvement, you begin to see a real qualitative difference. Python, for instance, is semantically not all that different from Perl, but what a difference sane syntax makes.

So am I saying we should tolerate the crackpots? To a point. Any new language warrants major skepticism, no matter the source, but especially a language coming from an unknown. It’s for a good reason that we have a natural tendency to treat the opinions and ideas of established voices much more charitably—both in time and sympathy—than those of unknown quantities: without this bias, we’d waste a lot more time on crap than we do already, for it simply can take a lot of time and effort to discern the difference between a crackpot and someone worth listening to.

So as an unknown with something Important To Say, you must be very careful in how you present yourself and your ideas so as not to be dismissed as a crackpot. I have two pieces of advice. First off:

Don’t be a crackpot.

Obvious, perhaps, but it’s surprising how many people miss this one. Second:

Be as clear as possible.

Only when reading a name-brand author am I willing to accept that difficulties in comprehension are my own fault, not necessarily the author’s. Not so for an unknown. If James Joyce hadn’t written Dubliners, it’s doubtful anyone would ever have read Finnegans Wake, let alone called it brilliant.

In the particular case of introducing a new programming language, it’s especially critical to be very clear about the problems your language addresses. What’s the point of this thing? How, exactly, is it supposed to be better? Before I continue reading what you have to say, I want to know that you’re not just rearranging the furniture.1

So it’s with full awareness and trepidation that I admit that I, too, have tried my hand at designing a programming language. Following my own advice, I’ll try to be up front about what I’m pushing: a Lisp people will actually learn and use.2

Here’s what I want in a Lisp, in order of ambition:

  • Easy to learn: The standard dialects of Lisp tend to be taught ineffectually and to be unnecessarily confusing. (Yes, that includes Scheme.) I won’t go into details here, but suffice it to say that ease of learning really matters—not just for the sake of getting more people to use the language, but for the sake of getting those who use the language to truly understand it.
  • Readable syntax: As everyone knows, Lisp has a problem with parentheses. Proponents argue that you just get used to it, and this is true, but the preponderance of parentheses constitutes a lot of line noise that I believe hinders readability (and editability) even for experienced eyes. Additionally, some Lisp dialects get a bit too noisy with reader macros, such as having the apostrophe for quoting all over the place. Furthermore, I find that the irregularity of the standard indentation style of current Lisps is unnecessarily difficult for learners to grok and leaves too much to stylistic choice.
  • Syntax highlighting, code assist, and assisted refactoring: Programmers working in Java and C# have become accustomed to conveniences that keep their code neat, that provide quick access to documentation, and that free them from having to remember minute details such as type taxonomies, function signatures, and precise identifier names. Providing those same conveniences in a dynamic language is much more challenging and error-prone because something as simple as renaming an identifier often requires that the tools make risky assumptions about what’s going on at runtime. Up to now, solutions to this problem have relied upon very sophisticated code analysis that still doesn’t work right much of the time. I believe there’s a simpler solution.
  • Push-button debugging: Programmers working in Java and C# become accustomed to no-hassle debugging, where setting a breakpoint requires just a click and where the IDE takes you through the code as you step through. This level of ease is lacking in most other languages, but especially in Lisp, where macros complicate the process.
  • Embedded data: Lisp’s tree-based syntax makes it usable as a structured-data format, meaning we don’t have to punt data into a separate format, such as XML or JSON. Instead, data can be expressed in Lisp using an ordinary library rather than a special syntax that requires special processing and tools. This could spare us from perverse data languages, like XSLT, which inevitably contort into full-fledged—and crappy—programming languages. (A sketch of this idea follows this list.) The trouble is that standard Lisp syntax doesn’t work well for data dominated by text, i.e., documents. So, for instance, while you might use a current Lisp in place of JSON, you probably wouldn’t use one in place of HTML.
  • Embedded languages: While some languages arguably shouldn’t exist at all (some haters say this about Java, for instance), other languages, like C and C++, clearly exist for a reason. But the fact that these languages fill necessary semantic niches doesn’t mean that they need their own syntaxes: instead, the right dialect of Lisp could “host” the complete semantics of a foreign language as a library. Consider a C program, which is typically written as a mish-mash of C code, preprocessor directives, and build files (makefiles, etc.). We could create a Lisp library that allows us to write C semantics in Lisp and produces the same end product (executables and object code) but which would elegantly integrate the equivalent functionality of the preprocessor and build chain in a way that is cleaner, more flexible, and easier to learn. If a way can be found for Lisp syntax and macros to provide the ideal amount of syntactical concision for all possible languages, future language designers can forget about syntax and just focus on semantic innovations.3
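To make the embedded-data bullet concrete, here’s a small sketch in today’s Clojure. The config map is real, runnable code standing in for what might otherwise be a JSON file; the final quoted form is purely hypothetical, my own invented illustration of what C semantics hosted as Lisp data might look like (no such library exists):

    ;; Real Clojure: data one might otherwise put in JSON, written as
    ;; ordinary literals -- no separate format or parser needed.
    (def config
      {:server   {:host "localhost" :port 8080}
       :logging  {:level :warn :outputs ["stdout" "app.log"]}
       :features #{:search :comments}})

    ;; Because the "data file" is code, it can also compute:
    (def backup-port (inc (get-in config [:server :port])))

    ;; Purely hypothetical: C hosted as Lisp data. The c/defn form is
    ;; invented here for illustration.
    (def hosted-c-sketch
      '(c/defn add [(int a) (int b)]
         (return (+ a b))))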

Now, as it turns out, the Lisp I want in all other respects resembles Clojure, so really what I’m proposing is specifically a Clojure dialect. In fact, implementation of my dialect won’t require much more than swapping out Clojure’s reader, wrapping some of its macros and functions, adding one or two data types, and creating editor assistance.

I’m calling my Clojure dialect Animus. Animus is still very much in flux, but I describe it in its current form here. Also take a look at some experiments with various languages to see what they might look like embedded in Animus.

  1. Or at least, if you are just rearranging furniture, I’d much rather you be honest about it: if you yourself realize that that’s what you’re doing, then you at least have a chance of delivering an actual—if small—improvement to the status quo.
  2. This isn’t actually what I set out to design. When I first started thinking about a language a few years ago, my favorite language was Python, and I didn’t know Lisp, so for a long time I was simply thinking of ways to improve upon Python. At some point, I accepted the idea of prefix notation and macros, and things progressed from there.
  3. Haskell strikes me as a language that could greatly benefit from embedding in Lisp. The few times I’ve attempted to pick up Haskell, I’ve been offended by the ridiculous Perl-like syntax of ad hoc convenience piled upon ad hoc convenience. If there’s something worthy in Haskell’s semantic model, it’s obscured under a mess of syntax.

A beginner’s first programming language

27 May

I’ve finally put together and posted the video of part 1 of my introduction to programming. This first part introduces a simple programming language in about 70 minutes.

UPDATE: I’ve also now added the second part, which covers representing numbers as bits.

The remaining eight or so parts will have to wait until I devise a better process for turning my slides and narration into video.

Clojure introduction video; Pigeon on hiatus

19 Apr

I’ve created an hour-long introduction video to Clojure, a new dialect of Lisp.

I’ve also decided to leave unfinished my implementation of Pigeon, my pedagogic programming language, even though it would take a trivial amount of work to complete. I now think that it’s not really important for students to actually run Pigeon code; in fact, I think it best to discourage students from writing anything beyond a trivial amount of code in Pigeon lest they waste time rather than just moving on to later material.

Instead, my focus now is on creating a course of videos for total newbs to programming, totaling about 10 hours and running in this sequence:

  • A first language (a run through of Pigeon)
  • Numbers (how numbers are represented as bits)
  • Text and images (how they are represented as bits)
  • The system (hardware and OS basics)
  • Language and tools survey
  • Javascript (sans browser)
  • The internet and the web
  • HTML / CSS / Javascript (in browser)
  • The command line and Unix environment
  • C
  • Data structures and algorithms
  • OOP
  • Java
  • Encryption, security, and compression
  • GUI Toolkits (Swing?)
  • Version control
  • Databases

I’m processing the audio and video for the first two, but that leaves a lot of work as I only have sketches of the remaining parts.

The naturalistic (language) fallacy revisited

5 Jan

The comments to this y.combinator item add to what I said here. In particular, commenter apinstein writes:

I used to do a lot of AppleScript programming. When I initially learned about the “natural” syntax I thought “this is gonna be so easy!” But ultimately it works against you.

Computers are very precise beasts, and they need to know exactly what you want them to do. The looser the “syntax” gets, the more guesses the compiler has to make to come up with a set of precise instructions.

What I initially thought would be easy and liberating turned out to be a total PITA. AppleScript programming is horrible. Ultimately there is an underlying syntax, but it’s harder to remember because it’s less consistent (ie “natural”). I had to spend way too much time trying to understand what goofy “natural” grammar I had to use to get it to do what I wanted.

Even if you assume an “ideal AI,” I still don’t think that a natural language syntax is a good idea, since language itself has a lack of specificity that requires even an “ideal AI” to make guesses that could be logically wrong.

Teach the (other) controversy

31 May

My programming education began when I took a C language course at the local community college. I can still recall how strange I found the language’s rules about when I could and couldn’t use a variable (e.g. variables declared in one function can’t be read or modified in others), for it seemed to me this made writing programs far harder than it needed to be. Combine this confusion with the syntactical cruft of C and the fact that I took my instructor’s prohibition against global variables to mean never use globals (a rule I later learned real-world C programs of non-trivial size don’t actually follow), and the result was that I ended up totally paralyzed, baffled as to how programmers ever got anything to work. For these and a few other reasons, I basically abandoned programming entirely before taking it up again two years later, this time studying independently from books.

Somehow, some very basic ideas in programming just didn’t click upon my first learning attempt even though I now find these ideas very simple and clear. While my C instructor was mostly competent, he failed to focus on the vital ‘why’. Why does the language make me do this? Why is the syntax like this? Etc. Unfortunately, it’s too easy for learners to give up on ‘why’ because so few sources out there—teachers, books, blog posts—provide clear, accurate, complete answers to the ‘why’ questions (and far too many sources aren’t too hot on the ‘what’s or ‘how’s, either). Why do modern computers use 8-bit bytes? Why do we need to allocate memory? Why are exceptions expensive performance-wise? Many decent, working programmers out there simply have no idea how to answer questions like these. The really bad ones wouldn’t understand the answers or care if you tried educating them.

Recently, I’ve realized that the biggest, most common failing of programming education is the tendency to teach a technical matter as a solution in search of a problem—as a mechanism without a ‘why’. A great example is generics in Java, which are so convoluted that their explanation takes up a good quarter of a full treatment of the Java language. Absorbing all the subtle rules and asymmetries of Java generics typically distracts students from a critical understanding of why generics exist in the language in the first place. I would go so far as to say that the fact that Java got on fine for many years without generics is the first and most important thing a student should know about generics. Only after firmly establishing what perceived problems generics were meant to address should students learn what generics are and how they work, and then it’s critical that this be followed up by exposure to dissenting arguments against generics.

You might assume that confronting learners with controversy up front will lead to confusion, but on the contrary, it gives a clearer presentation because it is more honest. Lies, hype, and wishful thinking tend to be incoherent and therefore perhaps impossible to understand for anyone not already versed in the truth. Furthermore, teaching controversy has the benefit of putting students in a critical mindset: if even the dominant languages of the day harbor serious mistakes about which it’s OK to have your own opinion, then the whole basis of programming is neither set in stone nor out of reach, and maybe you can one day fully understand computing and have a hand in directing the course of its future development.

Piled Heap of Poo

24 May

Anti-PHP screed #34019. For those of us who’ve only glanced at PHP, it’s both interesting and disturbing.

My favorite bit, though, is a passing quote from a C course the author took: “German Umlaute don’t work in C, so don’t use them”.

Video of talk on Pigeon

18 May

Last month at LugRadio Live USA 2008 in San Francisco, I gave a talk discussing programming education and Pigeon, my learner’s programming language. Videos of all the talks at LugRadio Live are going up. Below is my talk, which you can also download. (I occasionally mumble a few key words. Sorry.)

As for Pigeon’s current status: I still haven’t bothered to put the finishing touches on it to make it actually usable, as I’m currently working on the material for students to learn Python after learning Pigeon. Until I give learners some plausible place to go after Pigeon, I figure I can put off finishing up Pigeon itself.

UML sucks

16 May

Everything said here.

I’ll just add that pictorial representation of code is fundamentally flawed because it inevitably means drawing a bunch of boxes and connective lines all over the place. Just as there’s no one true way to distribute your functions and classes in text, there is no true optimum 2D layout for boxes representing functions and classes. No algorithm does this layout well enough to not need human reworking. Every good diagram of any complexity you’ve ever seen, whether of code or of a subway system, has been heavily hand-massaged if not entirely hand-generated. Even if you had a decent algorithm for layout, the layout it computes might change radically each time you add a box or connection, which would be disorienting in the same way as if someone were to rearrange your workspace behind your back. A big part of what it means to be comfortable with a codebase is knowing where most things are. How familiar can you ever be if the codebase gets constantly scrambled? The only solution is to have the coder position all the boxes and draw the connections manually, but this will just mean that systems will get uglier and uglier as they grow and that refactoring the layout will grow more and more painful.

Text to some degree has these problems too, for you have to decide which file to put a function or class in, and you must then decide how to order the functions and classes of each file. But in text, you aren’t burdened with the stylistic choice of how to draw connections: everything has a proper name, and you make connections by using the proper name, end of story. Making connections pictorially is only simple if you stick to straight lines, which typically makes the diagram much harder to read than if the connections were cleverly grouped and routed along the cardinal directions. Text code with many connections is complicated but readable; diagrammed code with many connections is complicated but totally unreadable. The whole point of diagramming code is to visualize connections, to see the shape of a design, but past a rather low level of complexity, the visualization is too much to mentally process.

In sum, stick to code for code. Use diagrams only for broad communication, and use them only when they are understood to be lies.