Archive | May, 2007

What does learning to program involve?

28 May

If you’re not already an initiate to programming, you may not be clear what you’re getting into when you set out to learn to program, so here’s an overview.

Most obviously, you must learn a programming language, which comes down to learning four things:

  1. a syntax: The syntax of a language is the set of rules for how code of that language is written.

  2. a standard library: Programming in all languages is in some way modular, i.e. a piece of code can use other pieces of code. Rather than write all of a program’s code from scratch, programmers use pre-existing code from code libraries. Virtually every modern language has its own standard library, a library included as a default part of the language. There typically exist many other libraries for a language, but only the language’s standard library is universally known and used by all programmers of that language.

  3. code idioms: Given a syntax and a standard library, there then exists in a language an informal set of idioms (recurring patterns) that programmers write over and over again. For various reasons, these idioms can’t be expressed as library code, but while they are not built into the language or officially specified in the language specification, every good programmer uses established idioms wherever possible. An idiom finds favor because it has been proven over time to be the best and/or most obvious way of doing something. (I use the term ‘idiom’ to mean a ‘small-scale pattern’, something which is expressed in a single place in code. In contrast, the term ‘pattern’ is used to refer to ‘design patterns’, a specific set of large-scale design strategies used in programming that are generally not particular to any language. Design patterns are also a very good thing to learn, but they aren’t a high priority early in your education.) You won’t hear much talk about idioms among programmers because, in all, there just aren’t that many of them in most languages, and they’re fairly natural to pick up. (A short example of a library call and an idiom follows the note on tool costs below.)

  4. a set of tools: To write their programs, programmers use tools (programs):

    1. programmers do most of their work typing code in text-editor programs
    2. code typed by the programmer needs to be translated into a form runnable by the computer; this translation is performed by compiler, interpreter, assembler, and linker programs (which of these are used depends upon the programming language)
    3. debugger programs allow programmers to see exactly what happens when their programs run so they can ferret out bugs
    4. profiler programs help programmers measure the performance of their programs and see how long portions of code take to execute so that they can identify which portions of code might need to be optimized
    5. in some areas of programming, code generators can sometimes help speed up development by doing some of the coding work for you
    6. version control systems help programmers keep track of the changes they make to their code as they develop it, making it easier to coordinate with other programmers and to revert to earlier versions of their code
    7. programs which integrate some or all of the functionality listed above are called IDEs (Integrated Development Environments)

(By the way, don’t worry about tool costs: very high-quality tools are freely available for download for nearly all languages used today; in fact, with the notable exception of Microsoft languages—C# and Visual Basic—the preferred tools are the free and open source ones, while commercial tools are increasingly disfavored and are disappearing.)
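
To make the standard library and idiom points from the list above concrete, here is a minimal Python sketch (the word list is just made-up data): the first part leans on the standard library’s collections module, while the swap at the end is a pure idiom, a recurring pattern that isn’t library code at all.

from collections import Counter   # standard library: ships with every Python install

# Library code: count word frequencies without writing the bookkeeping yourself.
words = ['spam', 'eggs', 'spam', 'ham', 'spam']
print(Counter(words).most_common(1))   # [('spam', 3)]

# Idiom: swap two variables with tuple unpacking; no library provides this,
# it's simply the recurring pattern every experienced Python programmer writes.
a, b = 1, 2
a, b = b, a
print(a, b)   # 2 1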

So the question you probably have now is, ‘Which language should I learn?’ The correct answer is that you should learn more than one:

  • C: I start with C mainly because: 1) it is the lingua franca of programming; 2) learners can understand what’s really going on in the computer when they run their program written in C; and 3) most features of its syntax show up nearly verbatim in other major languages, such as Java. For purely pedagogical reasons, C is a language you really must know, even if you never do any serious programming in it—you’re a better programmer simply for knowing what programming in C is like. On the other hand, I strongly discourage you from attempting to write anything but trivial programs in C early in your learning phase, for it takes a good amount of programming experience and knowledge to write even mildly sophisticated end-user programs (such as programs with graphical user interfaces) in C. Once you attain a decent amount of comfort with basic C, you’re much better off focusing on some other language, such as Java or Python.
  • Java is steadily replacing C as the lingua franca of programming. It is an object-oriented language with a very large standard library, so writing programs is significantly easier to do in Java than in C. Java is the most common choice of a first language to teach students today because Java protects programmers from misusing memory. I think this choice is a mistake, however, because it’s important as a programmer to understand what exactly Java is doing for you, and you can’t really understand that unless you study C. Worse, Java has a rather complex set of advanced syntactical features, and the set of tools and libraries typically used by experienced Java programmers is large and complex. These facets make it hard for learners to get up to speed with real-world code examples. Despite these flaws, Java, unlike C, is a good language to attempt writing your first real programs in.
  • Python is a language with an exceptionally elegant syntax which avoids clutter and nasty ‘corner cases’ (meaning situations where the language has to have complex rules to cover all possible cases). Expressing your ideas in Python is arguably easier than in any other language out there, so writing useful programs in Python is as easy as it gets. On the downside, Python code runs very slowly compared to C and Java code, so if you want to be a ‘real programmer’ who can write processing-intensive programs, knowing Python alone may not be enough.

I consider these three languages the most essential to learn, not just because of their popularity but because of how well they suit their roles. The best order to tackle them in is either C then Java then Python or the other way around: if you’re impatient and want to write something neat fairly early in your education, start with Python; if you’re patient and insist on understanding how things really work, start with C.

Here are some other languages to be aware of:

  • C# (pronounced ‘C sharp’) is essentially Microsoft’s imitation of Java. Despite what the name implies, it is only tangentially related to C. While an improvement over Java in some ways, it’s not different enough from Java for it to matter much which of them you learn. (Many programmers will insist you stay away from Microsoft’s proprietary technologies, though there actually is an open source implementation of C# available called Mono.) I recommend Java because it’s the older, more established language with the broader reach and because better free tools are available for it. However, once you get comfortable with Java, taking a look at C# wouldn’t hurt because a lot of software companies out there are Microsoft shops, and most of them will be using C# by now.
  • C++ is essentially C with object-oriented programming features grafted on. The result is messy but not without its virtues. Because of its performance advantages, C++ is used for end-user applications with high performance requirements, notably games. However, if you’re learning programming to make games, you’re kidding yourself if you start with C++: it’ll be years before you develop the skills to make games complex enough to stress a modern system, so you should focus your learning on an easier-to-use language. As you develop better program-design skills, they will translate to C++ when the time comes; in the meantime, you should broaden your general programming knowledge.
  • Visual Basic: The basic problem with Visual Basic is that it’s really not that basic at all any more. In its origins, VB was a language meant to bring programming to the masses. Since then, the features of ‘real’ programming languages, like C++ and Java, have been grafted onto VB in ugly ways, resulting in a language which is just about as complex but not as elegant, and which is therefore not used much except for writing ‘in-house’ software (software written for the internal use of a company/organization). The joke goes that a serious programmer doesn’t dare publish programs written in VB to the wider public lest he get laughed at for using VB. Despite its unfair reputation as a toy language, VB can be used to do real work—there’s just little reason to choose it over Java and C# on the merits. (This judgment is possibly just aesthetic bias: I and most other programmers much prefer the look of Java and C# code to that of VB.)
  • Ruby has been stealing glory from Python lately, largely because of the popularity of ‘Ruby on Rails’, a framework for developing web applications (a framework is a glorified, overgrown library). In the end, the two languages are so similar that learning one will make learning the other quite simple. Ruby arguably has a few feature advantages, but Python takes the edge in elegance and carefully thought-out syntax. If you want to do web programming, I recommend the Django framework for Python as a replacement for Rails.
  • Perl is like Python’s deformed cousin. It’s a convoluted mess dearly loved by its partisans but inhospitable to first-time programmers. Currently, Perl is in limbo as the long-awaited version 6 has been years late in coming; meanwhile, Python and Ruby have been stealing Perl’s niche, and I predict Perl will never recover, eventually fading into perennial also-ran status.
  • PHP, due to the popularity of the web, is the first language of a surprising number of people these days. This is unfortunate because PHP has little reason for existing except to plug a hole that shouldn’t have existed in the first place. In almost all other respects, it’s an undistinguished amalgam of standard language features taken from other languages and inelegantly smashed together. Still, PHP is a popular choice for writing open source web applications, such as the blogging software that runs this site (WordPress) and the MediaWiki software that runs Wikipedia. I find this unfortunate, but PHP had the right features at the right time, and there’s no arguing with good timing.
  • Javascript, despite the name, is a totally distinct language from Java. A Javascript interpreter is embedded in today’s web browsers, so a webpage can include code that makes the page dynamic, e.g. Google Maps is a webpage with Javascript code that updates the page’s content as the user clicks buttons and drags the map. Javascript is a better language than many give it credit for, but it is only really used in the context of the browser, and quite honestly, Javascript is only used at all because it’s the only language the browser runs—you don’t have a choice of other languages if you want to add programmatic features to a webpage. Javascript would actually make a great first language except that it’s tied to the browser, so as a practical matter, learning Javascript requires learning all sorts of messy, browser-specific stuff that distracts from the essence of programming.

Of course, learning to program involves quite a bit more than just learning a language. Additionally, there’s:

  • terminology / way of speaking

  • best coding practices (especially how to write readable code)

  • design (even if you know a language inside out and write meticulously crafted code, you may still be clueless about writing programs in the large; without learning good design practices, your programs will become unreadable and unmodifiable rat’s nests as they grow in size)

  • a bit of computer science (especially fundamental data structures and algorithms)

  • optimization strategies

  • hardware concepts

  • operating system concepts

A note about this last bullet: whatever the language, the real difficulty of learning standard libraries (aside from the fact that many of them are very large) lies in the fact that their most important parts are heavily intertwined with concepts not particular to the language, e.g. every standard library includes code for reading and writing files, but files are really an operating system concept, not a language concept. Unfortunately, standard library documentation tends to assume the reader is already familiar with these underlying concepts because it assumes an audience of programmers experienced in other languages; when the documentation does make an attempt at elucidating these concepts, it tends to do a half-assed job; and even when the documentation explains these underlying concepts well, this is still a suboptimal way of learning because it confuses the concepts with the particulars of that one library and language. So you should try to learn OS concepts in their own right, independently of any language; there are a few good books about operating system design, in particular, Modern Operating Systems (2nd edition), by Andrew S. Tanenbaum and (for when you’re comfortable with C) Linux Kernel Development, by Robert Love.
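
As a small illustration of that point (the filename below is just a placeholder), consider Python: open() is a thin standard-library wrapper over operating system machinery, and the real work of file creation, buffering, and permission checking belongs to the OS, not to Python.

# Writing and then reading a file through the standard library; the file itself,
# its permissions, and its buffering are operating system concepts.
with open('notes.txt', 'w', encoding='utf-8') as f:
    f.write('files are an operating system concept\n')

with open('notes.txt', encoding='utf-8') as f:
    print(f.read())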

This same advice—’learn concepts, not libraries’—applies to a few other domains you’ll encounter, including networking, databases, and web standards, e.g. before devoting a lot of time to learning how to work with a database using Java’s standard libraries, learn about databases independently of any language.

I should end by mentioning ‘Pidgin’, my in-the-works language intended strictly for educational purposes. (Yes, the Gaim people are using the name ‘Pidgin’, but I came up with the name earlier, besides which, it’s a name most appropriate for a language.) My LearnProgramming.tv project has been on hiatus, partly because I’ve been reconsidering its approach ever since I had the epiphany that what’s really needed is a better language for beginners. There’s nothing to show yet, but implementation shouldn’t take long, as the syntax is extremely simple (it uses prefix notation for expressions) and I’m simply translating into Python. If you’re looking to learn to program, please leave a comment—I’m looking for some guinea pigs.

The United States is doomed

27 May

Glenn Greenwald

This unbelievably irrational, even stupid, concept has arisen and has now taken root — that to cut off funds for the war means that, one day, our troops are going to be in the middle of a vicious fire-fight and suddenly they will run out of bullets — or run out of gas or armor — because Nancy Pelosi refused to pay for the things they need to protect themselves, and so they are going to find themselves in the middle of the Iraq war with no supplies and no money to pay for what they need. That is just one of those grossly distorting, idiotic myths the media allows to become immovably lodged in our political discourse and which infects our political analysis and prevents any sort of rational examination of our options.

The only way Congress cutting off funds for the war would have a negative effect on troop security is if the President keeps the military in theatre past the time when the previously allotted funds run out. But so lodged is the conventional wisdom that were Congress to cut off funds and the President to keep the military in theatre anyway, Congress would get the blame for any consequences.

…[T]he notion that de-funding constitutes a failure to support the troops — in a way that, say, timetables do not — is just inane, not even in the realm of basic rationality or coherence.

And yet exactly this nonsensical notion was permitted not only to take hold, but to become unchallengeable conventional wisdom in our public debate over the war. The whole debate we just had was centrally premised on an idea that is not merely unpersuasive, but factually false, just ridiculous on its face. That a blatant myth could be outcome-determinative in such an important debate is a depressingly commonplace indictment of our dysfunctional media and political institutions.

And how about an indictment of the American public? Democrats are scared to challenge conventional wisdom because that makes them eggheads who think they’re smarter than Joe Six Pack. Were a Democratic politician to undergo a true Bulworth transformation, they wouldn’t just push against the boundaries set by the media, they would have the courage to call voters on their stupidity: ‘Fuck you, voters, for believing that. Fuck you for not paying attention.’ That’s the crass version, of course, but I’m serious: the only way to solve our problems is to elevate the debate, which means shaming the media and the public for keeping the debate mired in trivialities and for not entertaining arguments more than two clauses long.

The naturalistic (language) fallacy

26 May

Why natural language in programming languages is a fool’s game.

  1. The main virtue of a formal language is that it is free of the ambiguities found in natural human languages.
  2. While naturalistic language may give off signals of familiarity and thereby boost a programming language’s approachability, the gain is more than offset by the increased complexity which naturalism introduces, complexity which learners will very quickly be confronted with. Naturalism acts as a mask for language complexity, not a substitute for it.
  3. I for one am skeptical of the ‘cult of smartness’ found in programming culture, but even I will say that a learner new to programming just isn’t going to cut it if they can’t adapt to the concept of a formal language: if you’re hanging on to those bits of typical programming languages that happen to be English-like (e.g. control words like ‘if-else’), then you just aren’t getting it.
  4. The usual bits of English-like syntax found in our languages are often more misleading than helpful.

This last point can be illustrated with the keyword ‘while’. The keyword ‘while’ obviously was introduced with a sentence in mind like, ‘Do this stuff while this condition remains true.’ Problem is, this sentence, as understood by the typical English speaker who is not already versed in a C-like language, would not convey what ‘while’ actually does; the sentence only sounds good enough as a description of ‘while’ to programmers already familiar with the ‘while’ construct.

A more accurate sentence for describing ‘while’ would be, ‘If this condition is true, do this stuff, otherwise move on; after every time you do the stuff, test the condition again, doing the stuff if true or moving on, ad infinitum.’ Notice the absence of the word ‘while’. Also notice that the sentence which the originators of ‘while’ did have in mind is not what a newbie to the language is going to see—they will just see the word ‘while’.

Not only is ‘while’ not really evocative of its function, it easily misleads: a very common misconception learners get is that a ‘while’ loop will end the moment the condition is made untrue inside the loop, rather than when the block completes and the condition is tested again; many learners likely get this misconception because it is consistent with the sentence, ‘Do this stuff while this condition remains true.’ Learners would be better served were ‘while’ instead arbitrarily named ‘friedchicken’ (perhaps ‘kfc’ to save on typing).
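
Here is a minimal Python sketch of that misconception (the variable names are invented for illustration): the print after the condition is falsified still runs, because the condition is only re-tested at the top of the loop.

keep_going = True
count = 0
while keep_going:
    count += 1
    if count == 3:
        keep_going = False                # the condition is now false...
    print('still in the body:', count)    # ...but the rest of the body still runs
print('loop ended with count =', count)   # the loop only stops at the next re-test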

Now, of course, not all English-like elements are so poorly chosen—’if’ is pretty sensibly named, though I would have gone with ‘otherwise’ in place of ‘else’—but the acceptable cases are few enough that one can hardly say there is much benefit in attempting to conform to an ideal of naturalism. You’ll likely just do more harm than good.

Just what Aunt Tillie needs: Vi?!?

21 May

This last week I’ve been getting familiar with Vim. I’ve dabbled a few times in the past, but this time I’m finally feeling comfortable enough to stick with it. I was quite annoyed with having to hit ESC all the time, but a neat tip is to set this in your config file:

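" map every ordering of j, k, and l (typed together in insert mode) to Escape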
imap jkl <esc>
imap jlk <esc>
imap kjl <esc>
imap klj <esc>
imap ljk <esc>
imap lkj <esc>
set timeoutlen=500   " cuts down the pause time you'll see after typing j, k, and l

…so now you can get out of insert mode by smashing ‘j’, ‘k’, and ‘l’ simultaneously. This will, of course, conflict if you ever need to type one of these sequences, but those are surely quite rare and work-around-able. I’m even going to experiment with reducing this just to ‘j’ and ‘k’, which is itself a really rare sequence in English. (Maybe ‘s’, ‘d’, and ‘f’ or just ‘d’ and ‘f’ can be set to ESC as well.)

If you don’t like that solution, other good choices are:

" shift-space to get out of insert mode
imap <S-space> <esc>

…and:

" shift-enter to get out of insert mode
imap <S-enter> <esc>

I mention Vim because I recently played with the alpha of Archy, which is a project to implement the ideas of the late interface theorist Jeff Raskin, and its text editing mode feels much like Vi stripped down to the essentials for the common user. Here’s a quick summary of Archy’s interface:

  • The primary part of the interface is one big text-editing window; there’s only one sequence of text, but the text is separated into ‘documents’ by special divider lines (which are inserted by the ` key).
  • Navigation up and down the big glob of text is done primarily by holding down an alt key and typing a string of text to “leap” to. Using left-alt does a back search for that string while right-alt does a forward search. Releasing alt takes you back to typing mode at the location you just leapt to. To repeat your last search, hold capslock and tap alt (right-alt for forward searches, left-alt for back searches). Having to hold down alt while typing is pretty awkward and uncomfortable after a while.
  • This ‘leaping’ feature with the alt-key searching is also how you select text: the selected text portion is always the section between the place you last leapt from and leapt to.
  • The user issues commands by holding down the capslock key, then typing the command. E.g. [capslock]-C-O-P-Y to copy the highlighted text. Even with auto-completion, having to hold down capslock while typing is pretty awkward, especially if the command contains an ‘a’, ‘q’, or ‘z’.
  • Text can be formatted (coloring, fonts, style, size, alignment, etc.) by highlighting text and issuing a command.
  • And for some reason, to get a key to autorepeat, you must triple-tap it before holding it down. (This is really annoying, especially with the navigation and backspace keys.)

Text-editing is the only part of the alpha, but Archy’s other key component will be what they call the Zooming User Interface (ZUI), which I take it is where non-textual elements will reside. How these components will fit together, I’m really not sure.

What to make of this? First, I’ll say there are some things I admire about this approach—mainly, Archy bravely recognizes average users can cope with a few non-discoverable elements: as long as the set of basic concepts to learn is small and worth the payoff, it’s OK—contra Steve Jobs—to expect users to learn something. Archy strikes me as an attempt to bring the Unix philosophy to the masses: rather than relying upon packaged solutions to problems, users are encouraged to build upon a base of simple yet powerful mechanisms; Archy simply starts from a clean slate, dispensing with the accumulated detritus of 40 years of terminals and programmer conventions, keeping things tidy, sacrificing power for approachability.

This said, there are glaring faults with Archy as currently available. First, having users hold down alt and capslock while typing is clearly not going to fly, as it’s difficult for even expert users, let alone the majority of users that have trouble touch typing. Yes, the purpose is to eliminate modality, but this is clearly not a realistic solution. (Maybe we need special keys below our space bar which we can hold while typing more naturally, or maybe the spacebar itself could be used.)

Really, modality is not as bad as UI orthodoxy claims. Keep the number of modes few and try to eliminate unnecessary ones, surely, but modality is extremely powerful, and to do it right, you really just have to give the user unmistakable visual and auditory feedback cues conveying what state the interface is in. (A good test is to see what happens when users step away from the computer then come back later; if they mistake the mode they’re in, you have a problem.) Archy’s cues are just too damn subtle. For instance, I found myself confused when I started typing because sometimes Archy would move my cursor to some other document; it turns out I was trying to type in a ‘locked’ document, but it took me a number of tries to figure this out because the message Archy briefly flashes is black text superimposed over very similar black text at the top of the screen; it’s great they didn’t annoy me with an ‘OK’ pop-up, but there’s unobtrusive and then there’s covert. In general, Archy has far too few elements of on-screen guidance, likely because the developers are too enamored of the idea of an interface-less interface. At the very least, Archy should have a training mode in which at least some screen real-estate is devoted to guiding new users.

Also problematic is how Archy messes with the user’s expectations about character keys, giving the keys surprising behavior in certain contexts such that they don’t always produce their usual characters. Now, there are obvious cases where average users quickly adapt to character keys not producing their respective characters on screen—games, for instance, often make use of the alphanumeric keys for non-text purposes—but such programs usually clearly delineate between typing mode and non-typing modes. In contrast, Vi, Emacs, and now Archy violate this barrier, messing with character keys in the context of typing text. In Archy, you find yourself in a few of those annoying moments familiar from first using Vi(m), where the editor isn’t responding the way you expect and you just can’t understand why.

Fortunately for Archy, both of these faults—inadequate cues and poor key assignments—can be fixed, but I wonder if the project will be willing to do so, as it requires compromising on its ideals of a totally modality-less and interface-less interface.

Tipping as a replacement for micropayments

20 May

A while ago, I proposed a website for funneling donations from ‘content consumers’ to ‘content producers’, and it turns out others have a similar idea and are doing something about it but framing the idea as ‘tipping’ rather than ‘donating’. Nick Szabo, who wrote the best early assessment of why micropayments won’t work back in 1996, comments on this, suggesting that the social aspects of real-world tipping should be embraced in their online form:

I’d add generosity signal features that inform one’s friends or fellow tippers as well as the tipee about the tip. This could be in the form of aggregator “karma” points that name and ranks the most generous tippers. This would be like the “karma” points which people who add content to a social aggregator compete for, but it signals far more — it signals that one is a generous tipper as well as a generous contributor of recommendations. There are a variety of other ways (home or facebook pages, e-mail, etc.) that generosity signals might similarly be sent within a social circle.

My gut objections to this are that:

  1. It’s tacky to make a public show of one’s generosity.
  2. I’m already increasingly overwhelmed by noise drowning out my incoming and outgoing signals. Information on the tipping habits of others strikes me as just more minutia competing for my attention and distracting others from paying attention to me.
  3. We don’t need more markets for buying attention and prestige. Plenty of those already.

I’m not so concerned if we’re talking about keeping information about each user’s personal tipping behavior strictly within that user’s personal group of people they choose to associate with, not the random mass of people who happen to also use the service. That is obviously useful and just a natural extension of what happens offline.

I am concerned, though, about emphasizing the aggregate behavior of the whole user base, such as is done with mass-actor networks like Digg and Reddit; though I use those sites, I dislike how the form—a single feed, basically—turns the affair into a competition between factions for control. If designed poorly, the social mechanisms of a donation/tipping network could result in the same distortion and gaming effects.

Unlike Wikipedia, where the goal is to discourage factionalism, a donation/tipping network is best served by simply allowing different factions to go their separate ways. I don’t really see this as a loss: you just aren’t going to find much ground of agreement between crunk enthusiasts and Parrotheads.

Essential Javascript

4 May

From the creator of JSLint, Douglas Crockford, here’s a series of Javascript video lectures. It just so happens I’ve been spending the last four months working in Javascript for the first time, and I wish I’d seen these lectures before I started.

The first thing Crockford tells you is that all the Javascript books out there—with the partial exception of the O’Reilly rhino book—are crap, and I’ll corroborate that. Crockford will save you a great amount of time by cutting directly to all the features, quirks, virtues, flaws, and subtleties of Javascript that make it unique, while giving you invaluable advice on best practices. Also check out his videos on the DOM.

The only issue I take with Crockford is his estimation of the language. He’s certainly correct that Javascript is far under-appreciated by ‘real programmer’ snobs and that Javascript’s biggest weaknesses lie in the browser development environment, not really in the language itself. But I’m not convinced, for instance, that prototype inheritance is really the way to go; it’s perfectly possible to have modifiable-at-runtime inheritance with plain-old-classes, and in fact Ruby and Python offer just that. Nor do I think Javascript should get too much credit for many things it does right: for just about everything Javascript gets right—with the notable exception of function literals—Python gets more right, and did so before Javascript did. Moreover, Python fixes many things that are just plain wrong with Javascript, such as Javascript’s lack of a proper file-linking mechanism.
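
For instance, here is a minimal Python sketch of modifying a class at runtime (the class and method names are made up for illustration): existing instances immediately see the new method, no prototypes required.

class Greeter:
    def greet(self):
        return 'hello'

g = Greeter()

def shout(self):                 # defined after the class and after the instance
    return self.greet().upper() + '!'

Greeter.shout = shout            # bolt a new method onto the class at runtime
print(g.shout())                 # the existing instance picks it up: HELLO!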

On the other hand, Python’s built-in namespace has the same problem as Javascript’s global namespace: a mis-typed identifier can’t be flagged as an error by the parser because the parser can’t know if that name exists in the global namespace or not. Sure, getting rid of the global namespace wouldn’t catch mistyping ‘bar’ in ‘foo.bar’, but it would catch a significant number of such errors, and nothing really would be lost. (I don’t consider modifying the built-in namespace to be a legitimate Python practice, and I’ve always found Python’s meta-language features—the modifiable built-in namespace and operator overloading—to be invitations to bothersome errors and quixotically at cross-purposes with Python’s philosophy of imposing stylistic uniformity.)
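
A minimal Python sketch of the problem (the function and the typo are invented for illustration): the misspelling below is not a syntax error, so nothing complains until the line actually executes, because as far as the parser knows the name could exist in the global or built-in namespace.

def report(values):
    total = sum(values)
    print('total:', totl)    # typo for 'total'; the parser can't flag it, since
                             # 'totl' might legitimately be a global or built-in name

report([1, 2, 3])            # only now does it fail: NameError: name 'totl' is not defined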

dtm-ufth_dif2grok-pmdw3

4 May

Title translation: ‘Don’t tell me you find this difficult to understand purple monkey dishwasher (version 3)’.

Preston Gralla on O’Reilly Net complains that Linux package names are preventing wider Linux desktop adoption. While I find his claim that Linux will never get there extreme, I do agree this is a significant hindrance.

Such package names simply shouldn’t be presented to regular users at all, even in the context of browsing packages. Sure, people can just ignore them, but don’t underestimate the psychological toll of being confronted with a stream of information you can’t even categorize let alone understand.

More generally, I oppose the Unix/old-style-programming practice of privileging ease-of-typing and compactness in names over descriptiveness. Even for those attempting to verse themselves in the lingo, the level of contextual familiarity presumed by this preponderance of abbreviations makes the learning curve very steep.