Jul 12 2007

A learner’s guide to the terminology and concepts of software build processes.

What’s the difference between an assembler, a compiler, and an interpreter, and what’s a linker?

Tower of Babel

Assemblers

Let’s start with the clearest case. An assembler is a program which translates ‘assembly language’ code into processor instructions (a.k.a. ‘machine instructions’/‘machine code’, a.k.a. ‘native instructions’/‘native code’). What’s assembly language? ‘Assembly’, ‘assembler’, or ‘asm’ for short, is the generic name given to all low-level languages. Now what’s a low-level language? Well, whereas in high-level languages, each line of source code typically translates into more than one processor instruction, in an assembly language, each line directly corresponds to one single processor instruction. Assembly offers the programmer exact control: what you write is exactly what gets executed, instruction-by-instruction.

Because different processors understand different sets of instructions, the assembler language you use must be particular to the processor platform you intend to run your program on. For instance, if you are targeting a processor that uses the x86 instruction set (which includes Intel and AMD processors), then you would use an x86 assembler.

So why write assembly? On the downside, writing your code one processor instruction at a time is far more tedious than writing the functionally equivalent code in a high-level language. Moreover, assembly language can’t protect you from even the most basic errors and allows you to do dangerous things like trying to read memory that doesn’t belong to your program (something which the OS and the processor conspire to stop your program from doing by halting your program when it tries to do such things). So not only is programming assembly like using tweezers to move a hill of sand, the tweezers are slippery and sharp. Producing complex, reasonably bug-free programs entirely in assembly is very hard and generally just hasn’t been done since the late-80’s.

On the upside, the exact control provided by assembly allows for optimizations simply not possible in high-level languages. While compilers and interpreters have gotten quite smart, they very, very rarely, if ever, produce the fastest possible code, leaving room for a human to do better. Again, writing a program entirely in assembly is simply too impractical given the size of most modern programs; however, if a key portion of your code is a bottleneck, it might be beneficial to rewrite that piece of code in assembly and then invoke it from your high-level language code.

Assembly retains one other important role. Some important processor instructions will never be generated by a high-level language compiler, so it is left to assembly code to allow access to those instructions. For instance, on most processors, system calls can only be invoked using a particular instruction, but there’s nothing you can write in C code which will make the C compiler spit out that instruction; it’s simply something (consciously) missing from the semantics of the language. Therefore, to make a system call in C, a piece of assembly code that uses the system call instruction is written in such a way that the code, when assembled, can be invoked from your C code. For instance, when you open a file in C with the C standard library’s ‘fopen’ function, depending upon your implementation of C, that function either calls a function written in assembly or is itself written in assembly, and that assembly function contains the instruction to invoke the system call that opens a file. (A ‘system call’ is a function provided by the operating system that can’t be invoked like a normal function because it exists in the operating system’s protected memory space; the OS and processor conspire to protect this memory space from direct access by ordinary programs because otherwise it would be possible for ordinary programs to bring down the whole system out of incompetence or do malevolent things like read files they aren’t supposed to be able to access. So, processors typically provide a system-call-invoking instruction which allows ordinary programs to invoke code at OS-defined specific addresses in the OS’s protected memory space. By allowing the execution of ordinary programs to enter this memory area only at specific points, the OS can prevent any funny business.)

Assemblers used to be a much bigger deal back in the DOS days when most programmers worked in assembly, but those days are gone. Today, assembly work is rarely done except by developers of operating systems and device drivers, and whereas there used to be many assemblers for Intel-compatible processors, today there are only a few real options (on the upside, they are all now free downloads):

  • MASM (Microsoft Macro Assembler)
  • GAS (GNU Assembler)
  • FASM (Flat Assembler)
  • NASM (Netwide Assembler)

Aside from these options, some C compilers feature mechanisms to embed assembly code amongst the C code. For instance, the C compiler in the GCC (GNU Compiler Collection) allows you to embed GAS assembly code using a special directive. (Understand, this and similar mechanisms in other C and C++ compilers are not official parts of either the C or C++ languages.)

Now, whereas high-level languages, such as Java, C, or C++, are typically highly standardized, the assembler languages for a particular processor may diverge significantly in syntax, e.g. while most assemblers on the x86 platform tend to follow the syntax established by Intel in its processor manuals (with the notable exception of GAS), they still have many sizable differences.

A high-level assembler is an assembler with some high-level-language-like conveniences thrown in. MASM arguably fits into this category, but the best example is certainly HLA (High Level Assembly), an assembler language originally conceived as a teaching tool.

Compilers

A compiler is a program which translates high-level language code—called the source—into some other form (usually processor instructions)—called the target. Whereas assemblers do basically a verbatim, one-to-one translation—like a translation from English to Pig-Latin—compilers typically have a considerably more sophisticated task—more like a translation from English to Latin. So whereas the whole point of assembly generally is that the programmer controls the exact sequence of instructions, compilers only guarantee that the code they spit out is functionally equivalent to the semantics expressed in the source. Moreover, compilers generally attempt to optimize the code they produce, making the end result correspond even less directly to the source.

Just as assemblers are particular to the precise assembly syntax they can translate, compilers are specific to the high-level language(s) they can translate, i.e. a compiler for the C language can translate C code but not Pascal code. Also like assemblers, compilers are particular to the processor platform(s) which they can target (except some compilers don’t spit out processor instructions at all but rather some kind of ‘intermediate code’, as I’ll discuss later).

Consider the case of the C language. Like with assembly, there used to be a wide variety of C compilers used back in the 80’s and 90’s, but today the market has sorted out, and there are only a few notable C compilers. The two most important are:

  • GCC (GNU Compiler Collection): Originally called the GNU C Compiler, GCC now supports many languages other than C and C++. GCC can target dozens of processor platforms, including all the most popular ones.
  • Microsoft Visual C++: Despite the name, Visual C++ supports C as well as C++. Visual C++ only targets the Intel-compatible platforms: x86, x64, and Itanium. (Technically, ‘Visual C++’ is actually the name of Microsoft’s IDE (Integrated Development Environment), but there isn’t a more commonly used name for Microsoft’s C or C++ compilers.)

Linkers

The source code of all but the smallest programs is written spread across multiple files, and in most languages, these files are treated as separate ‘compilation units’, i.e. they are compiled independently of each other. When a compiler produces processor instructions, the resulting code is called ‘object code’, and the resulting files are called ‘object files’. While some operating systems, including Unix systems, will allow an object file to be run as a program (i.e. it will happily load the file and begin execution of its instructions), this is of limited use because, to make a complete program, the object files need to be ‘linked’ together:

In a program, the code in one source file makes a reference to code in other files and/or is referenced by code in other files: a program is a web of source files which make external references to each other, and so the source files depend upon each other. (If a source file does not reference other files and itself does not get referenced by other files, then it can’t have any effect on or be affected by the rest of the code, so it can’t be said to be a part of the same program.) Still, each source file is compiled separately, meaning that, when processing one source file, the compiler has no knowledge of the files referenced by the source code; consequently, when the compiler encounters an external reference in the source code, all it can do is leave a ‘stub’ in the object code allowing the connection to be patched later. Patching together the external reference stubs of one object file to another is precisely the job of a linker. It is the linker that takes many object files and produces from them an executable file (e.g. an .exe file on Windows).
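The patching step can be illustrated with a toy model. The sketch below is not how any real linker or object-file format works; the `link` function, the dictionary layout, and the symbol names are all hypothetical, chosen only to show unresolved references being matched against definitions from other compilation units.

```python
# Toy model of linking: each "object file" lists the symbols it defines
# and the external symbols it references but could not resolve on its own.
# The layout and names here are hypothetical, not a real object-file format.

def link(object_files):
    """Combine object files, checking that every external reference resolves."""
    # Pass 1: collect every symbol definition across all object files.
    symbol_table = {}
    for obj in object_files:
        for name, code in obj["defines"].items():
            if name in symbol_table:
                raise ValueError(f"duplicate symbol: {name}")
            symbol_table[name] = code

    # Pass 2: check that every stub left by the "compiler" can be patched.
    for obj in object_files:
        for name in obj["references"]:
            if name not in symbol_table:
                raise ValueError(f"unresolved external reference: {name}")

    # The "executable" is simply the complete table of resolved symbols.
    return symbol_table


main_obj = {"defines": {"main": "call helper"}, "references": ["helper"]}
util_obj = {"defines": {"helper": "return 42"}, "references": []}

executable = link([main_obj, util_obj])
print(sorted(executable))  # ['helper', 'main']
```

Linking `main_obj` on its own would raise an error, which mirrors the ‘unresolved external’ complaints real linkers give when an object file is missing.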

Interpreters

Whereas assemblers and compilers translate code into other forms of code, an interpreter is a program that translates code into action, i.e. an interpreter reads code and does what it says, right then and there. If you intend your program to be run via an interpreter, then every user must have both your program and the interpreter to run it, and your program is then started by starting the interpreter and telling it to run your program. (This may sound unfriendly to naive users, but the installation and starting of the interpreter can be disguised from users such that they install and run your program like any other.)

Because interpretation happens every time you run the program as you run it, interpretation introduces a significant performance overhead. This cost can be mitigated using what I call the ‘hybrid model’. First, the source code is compiled into some intermediate form (i.e. code which is more like processor instructions than high-level code but which is not executable by the processor), and then, to run the program, an interpreter executes this intermediate code. (In this model, the linking of the compilation units is typically done by the interpreter every time the program is run.)
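Python’s standard implementation, CPython, happens to follow this hybrid model, so it can serve as a small illustration. In the sketch below, the built-in `compile` turns source text into a code object holding intermediate code (bytecode), and `exec` hands that code object to the interpreter; the source string and the variable names are made up for the example.

```python
import dis

source = "result = 6 * 7"

# Step 1: compile the source text into intermediate code (bytecode).
code_object = compile(source, "<example>", "exec")

# The bytecode is a sequence of bytes, not processor instructions.
print(type(code_object.co_code))  # <class 'bytes'>

# Step 2: the interpreter executes the intermediate code.
namespace = {}
exec(code_object, namespace)
print(namespace["result"])  # 42

# dis prints a human-readable listing of the bytecode; the exact
# instructions shown vary between Python versions.
dis.dis(code_object)
```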

A further refinement of the hybrid model is to use a JIT (Just-in-time) compiler. You use a JIT compiler as you would an interpreter (you run your program by feeding the JIT compiler some form of code, usually intermediate code), but the JIT compiler compiles code into processor instructions and runs that instead of interpreting the code. Despite the time spent to perform this compilation (typically reflected in a longer program load time), JIT compiling is usually considerably faster than using interpretation: using a JIT compiler with the hybrid model is typically only 10%-20% less performant than were the code ‘natively compiled’ (compiled into an executable and run as such), compared to 70%-100% slower for interpreting intermediate code. [The term "performant" is used by programmers to mean 'fast performing' or 'acceptably performing', but you won't find it in any dictionary (yet).] Some claim that, in a few cases, a sufficiently smart JIT compiler can run code faster than the same program compiled into an executable because the JIT compiler can make optimizations only discoverable at runtime. (The comparative performance of JIT compiling versus native compiling is a hotly debated topic. While most concede native compilation almost always produces better performance, it’s debated how much of a performance hit JIT compiling introduces.)

Understand that, whether using the hybrid model or not, an interpreted program is limited by its interpreter. Just as programs executed by the OS can only do what the OS allows them to do, interpreted programs can only do what their interpreter allows them to do. This has potential security benefits: as the theory goes, users can download programs and run them in an interpreter without having to trust those programs because the interpreter can block its programs from accessing files on the system and/or using the network connection, etc. In such schemes, the interpreter is often called a VM (virtual machine) because, as far as the programs which it runs are concerned, it looks and acts much like a full computer system. In practice, truly secure virtual machines aren’t quite a reality, for real VM’s have bugs which malicious programs they run can exploit to breach the limitations imposed by the VM; consequently, users should still be careful of which programs they download and run, even if the program is run in a VM.

Another often-cited benefit of interpretation is that, as long as an appropriate interpreter for your language exists on all the platforms you wish to run your program on, you only need to write the program once. This is often called ‘write once, run anywhere’. This argument made a bit more sense when computers were slower and so compilation took considerably longer, making compiling your program for all target platforms a bit more bothersome, but aversion to this inconvenience doesn’t really explain why interpreted programs are considered so much more portable. The real reason writing your program for an interpreted environment makes it generally easier to get it working on multiple platforms is that the interpreter acts as a layer of indirection between your program and the OS, so the interpreter can handle the messy particulars of dealing with variances between OS’s, e.g. the process of opening a file often differs from one OS to the other, but your program only has to tell the interpreter to open a file, and the interpreter in turn deals with the particulars of the OS.

The portability advantage of interpretation holds out as long as your program uses functionality that is available and works consistently on all of your target platforms. A notorious problem area is GUI’s (Graphical User Interfaces): many GUI widgets (windows, menus, scrollbars, drop-down menus, etc.) simply don’t look and act the same on Windows, Macs, and Linux desktops. Attempts to provide a cross-platform means of writing GUI code have to date only been partially successful.

In principle, any language can be either interpreted or compiled, but in practice, languages are designed with a particular model in mind. For instance, were you to interpret C language code, you would defeat the purposes of using C in the first place (mainly performance and greater machine control), and so this just isn’t done (though I bet someone somewhere has done it—someone somewhere has done everything, no matter how strange or daft). Another language, Java, was conceived and implemented to use the hybrid model; ‘native compilers’ (compilers that spit out processor instructions) for Java exist, but aren’t used very often because the performance benefits generally aren’t significant enough to be worth the downsides.

Thus endeth the lesson.

Posted by Brian Will
Jul 04 2007

Alex Miller, Steve Yegge, and this poster explain.

Among the reasons given:

  • Singletons are most commonly used as excuses to have global variables and functions.
  • As Steve puts it, “using the Singleton is usually just a sign of premature optimization…” .
  • Singletons make it difficult when later you decide you actually need more than one of that type or subtypes.
Posted by Brian Will
Jun 27 2007

A learner’s guide to the very important concepts of ‘arrays‘ and ‘associative arrays‘ and the very confusing, overlapping terminology thereof.

In programming, the term ‘array’, in its most general sense, means ‘a sequence of units of data’, but confusingly, a preponderance of terms all fit that same definition, each with its own variation on the theme. This wouldn’t be so bad if programmers and programming languages could decide amongst themselves which connotations belong to which terms, but in truth, there is no definitive usage that keeps them all straight. At best, the various different things called ‘array’ can be classified by a few properties:

  • Is the number of elements (a.k.a. the units of data) in the array fixed at the array’s creation time, or can the number of elements grow and/or shrink after creation?
  • Are the elements homogeneous (all of the same kind) or heterogeneous (of different kinds)?
  • Are the elements contiguous in memory (i.e. do the elements all sit directly adjacent to each other)?
  • Do we care about the order? While the elements of an array are always indexed numerically (i.e. each element has a place in line relative to the others), we may simply want to use an array as a collection of things without regard to the order of its elements.

In any case, here are the strongest meanings of each term as best as I can piece together:

  • The dominant use of the term ‘array‘ itself comes from the feature called ‘array’ in the C language and languages strongly influenced by C (which includes C++, Java, and C#). In this usage, an array is an ordered, homogeneous, contiguous, fixed number of elements. While not being able to mix different types of elements together in one array and not being able to add additional elements to an array after creating it makes these arrays bothersome to work with, C’s arrays purposefully forsake these features for performance and memory-usage advantages: by being homogeneous, fixed in length, and contiguous, a C array takes up a minimal amount of memory and generally requires less processing work to access and/or modify its elements. (As it happens, it is from the C language that the convention began of indexing the elements of an array starting from 0 rather than 1, and most of today’s languages stick with that convention even though it initially feels unnatural to students of programming.)
  • A ‘list’ is an array which is not fixed in length: elements can be added to the end of the list or inserted or removed at any point in the list. Lists are not necessarily homogeneous nor necessarily heterogeneous: in most languages, you can create either kind of list. By allowing growth after creation, a list is generally more expensive performance- and memory-wise than C-style arrays; for instance, consider if you want to add elements to the list but the memory space at the end of the list is being occupied by some other data: to accommodate more elements, the whole list would have to be moved elsewhere where there’s more space, something quite expensive to do. (As I’ll describe in a follow-up post, there are two basic ways to implement lists, called ‘array lists’ and ‘linked lists’, both with their own performance trade-offs.) In languages where maximizing performance and memory conservation is not a primary design goal of the language (this includes Python, Ruby, Perl, and Javascript), lists are used in place of arrays for their flexibility; in C, C++, Java, or C# programming, however, lists are typically only used when really necessary.
  • There are a number of interchangeable synonyms for ‘list’, including ‘dynamic array’ and ‘growable array’.
  • A ‘set’ is a collection of things in which no element can be the same as any other of the collection’s elements. You could simply use an array or list as a set, but if you then want to make sure no elements are ever found more than once in the array or list, you would have to write logic that enforces that rule when an element of the array/list gets added or modified.
  • The term ‘sequence’ doesn’t have any predominant use, but it is sometimes used as a generic term for an ordered collection. Some languages co-opt the term for some particular context, e.g. Java has a notion of ‘character sequences’ in its libraries, and Python classifies some of its types as ‘sequences’.
  • The term ‘string’ is virtually always used to mean an array of characters, a.k.a. a piece of text data. However, ‘string’ is very, very occasionally used in a more generic sense to mean a homogeneous sequence of some type other than characters. (I’ve seen this usage in the context of assembly programming, but not to my recollection in the context of high-level languages.)
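As a rough illustration in Python (one of the languages named above), the standard library’s `array` module provides a homogeneous array, while the built-in `list` and `set` types behave as described for lists and sets. Note that Python’s `array.array` can grow, so it only demonstrates the homogeneity property, not fixed length; the variable names are invented for this sketch.

```python
from array import array

# A homogeneous array: type code 'i' means every element is a C int.
numbers = array("i", [10, 20, 30])
try:
    numbers.append("forty")  # a string is rejected
except TypeError:
    print("array rejects elements of the wrong type")

# A list may mix element types and can grow or shrink after creation.
mixed = [10, "twenty", 30.0]
mixed.append(None)
mixed.insert(1, "inserted")
mixed.remove(30.0)
print(mixed)  # [10, 'inserted', 'twenty', None]

# Indexing starts from 0, following the C convention.
print(mixed[0])  # 10

# A set silently ignores duplicates, so no enforcement logic is needed.
unique = set()
for item in [3, 1, 3, 2, 1]:
    unique.add(item)
print(sorted(unique))  # [1, 2, 3]
```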

An ‘associative array’, though also a kind of data collection, is actually a rather different thing than an ‘array’ or any ‘array’-like thing already discussed. [Hereafter, I will usually use the synonym 'dictionary' for 'associative array', as it avoids confusion with 'array'.] Each element of a dictionary is comprised of two pieces of data, one the ‘key’ and the other its associated ‘value’, together called a ‘key-value pair’. It isn’t necessary for either the keys or values to be homogeneous in type, and it’s perfectly fine for two or more values to be identical, but no two keys can be identical. The idea is that, while the elements of an array are located in the array by numerical index, the elements of a dictionary are located in the dictionary by key: we store a value in the dictionary by associating it with a key, and then we retrieve it from the dictionary by asking for the value associated with that key.

Probably the most commonly used type of dictionary is one with text strings for keys because it’s just very useful to be able to store and retrieve data by some meaningful bit of text, e.g. I could store people’s ages by their names:

  • key: “John Lennon” value: 67
  • key: “Paul McCartney” value: 65
  • key: “Ringo Starr” value: 67
  • key: “George Harrison” value: 64

Now to look up George Harrison’s age, I ask the dictionary for the value associated with the string “George Harrison” and get back the integer 64.
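In Python, where dictionaries are a built-in type, the age-by-name example might look like the sketch below; the variable name `age_by_name` is just an illustrative choice.

```python
# Ages keyed by name, using the values from the example above.
age_by_name = {
    "John Lennon": 67,
    "Paul McCartney": 65,
    "Ringo Starr": 67,
    "George Harrison": 64,
}

# Retrieve a value by asking for the key it was stored under.
print(age_by_name["George Harrison"])  # 64

# Two values may be identical (both 67 here), but keys stay unique:
# storing under an existing key replaces its value rather than adding
# a second entry.
age_by_name["Paul McCartney"] = 66
print(len(age_by_name))  # 4
```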

Again, any kind of object can be used for a key. While text strings are most commonly used, I could also use integers, e.g. I could store the names of people by their ages:

  • key: 67 value: ["John Lennon", "Ringo Starr"]
  • key: 65 value: ["Paul McCartney"]
  • key: 64 value: ["George Harrison"]

(We account for the possibility of multiple people having the same age, so we store our values as arrays of strings (as indicated by the [ ] syntax) not just individual strings.)

If you’re going to look people’s ages up by their names and look their names up by their ages, then it might actually make sense to have both of these dictionaries even though it means storing the data twice over. If I only had a dictionary of age-by-name, looking up names by age would require creating a new list and then checking every element of the dictionary, adding to the list each name associated with the age I’m looking for. If my age-by-name dictionary is very large, this would make looking up names by age much more expensive performance-wise than if I had a names-by-age dictionary to use (as I’ll explain in a later post, dictionaries are almost always implemented in a manner that makes finding values-by-key very fast).
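A minimal Python sketch of the two approaches follows. The helper `names_with_age` and the variable names are hypothetical; the point is only that the first approach visits every entry on each lookup, while the second pays that cost once when building the names-by-age dictionary.

```python
age_by_name = {
    "John Lennon": 67,
    "Paul McCartney": 65,
    "Ringo Starr": 67,
    "George Harrison": 64,
}

# Without a second dictionary: scan every entry for the wanted age.
def names_with_age(ages, wanted_age):
    found = []
    for name, age in ages.items():
        if age == wanted_age:
            found.append(name)
    return found

print(names_with_age(age_by_name, 67))  # ['John Lennon', 'Ringo Starr']

# With a second dictionary: build it once, then each lookup is direct.
names_by_age = {}
for name, age in age_by_name.items():
    names_by_age.setdefault(age, []).append(name)

print(names_by_age[67])  # ['John Lennon', 'Ringo Starr']
```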

Now, if you’re going to associate values with integers, why not just use an array? Well with an array, if I have an element at index 78, then I must also have places in memory for indexes 0 to 77 whether I use those indexes or not. In contrast, a dictionary typically takes up only a little more memory than is needed to store all its elements (again, as I’ll discuss in a later post).

Understand that, even if a dictionary has integers for keys, it is still considered to be ‘unordered’: there is no first element, no last element, no in-between elements, and each element is the same as any other as far as “position” in the dictionary is concerned. In practice, of course, the key-value pairs sit in memory in some order, but if you cared about that order, you would use an array instead. Most implementations of a dictionary provide some means of getting an array of all the dictionary’s keys, thereby allowing a way to iterate over every value in the dictionary, but the order of the keys in this array is arbitrary and should not be relied upon.

You might be wondering why keys must be unique. It’s true that allowing duplicate keys could be useful, e.g. if I ask for the value associated with key x when there are multiple keys x, I could get back an array of all values associated with x. Such dictionaries are rare because:

  1. If I want to change the value of key x, I would somehow have to specify which key x I meant.
  2. It’s conceptually simpler to pack together all the values you want to associate with key x into an array and then associate that array with unique key x.
  3. Unique keys make the implementation simple and efficient.

‘Dictionary’ is just one synonym for ‘associative array’; like with ‘array’, there is a preponderance of synonyms and near-synonyms for ‘associative array’, including:

  • dictionary: A straight synonym and the preferred term of Python programmers.
  • table: Basically a straight synonym for ‘associative array’, though be careful that ‘table’ is just as often used by programmers to mean a ‘database table’ or a table of information (like a row-by-column chart of figures in a document—not really a programming concept, but a lot of code deals with presenting such tables to users).
  • string table: Like ‘table’, but implies that all the keys are strings and possibly that all the values are strings too.
  • lookup table: A straight synonym in general use and probably the least ambiguous term you could use other than ‘associative array’ itself.
  • map: A straight synonym and the preferred term of users of some languages. In C++, ‘map’ implies an associative array in which the keys are kept sorted (the criteria of how to sort the keys must be supplied by you when you create the map, for the map doesn’t necessarily know how to sort the kinds of objects you supply for keys).
  • hash, hashtable, hashmap: Basically all synonyms for ‘associative array’ except the ‘hash’ part refers to a technique used in implementing associative arrays (again, something I’ll discuss in a later post); just be clear that the terms ‘hash’ and ‘hashing’ are not exclusively associated with associative arrays, as hashing is a fundamental technique used in many areas of programming.
Posted by Brian Will
Jun 13 2007

Yahoo’s Javascript guru, Douglas Crockford, has another excellent video talk (watchable in-browser or as a download), this time a survey of software engineering titled “Quality”. While general pontifications of this nature are common, Crockford’s strikes a nice balance between breadth and concision and between correctness and novelty (not too dull, not too narrow, etc.), and, in fact, the talk would be quite watchable and interesting for neophyte programmers and perhaps even non-programmers. On the downside, Crockford doesn’t really give prescriptions, but that follows from his main point: we still haven’t really solved or mitigated some hard problems (and perhaps we never will); for now, the only real consensus we have is that you’re better off aware of these issues than not.

Posted by Brian Will
Jun 04 2007

How bits represent information and form the basis of computing.

An installment in a series of posts on basic computing concepts for beginning programmers.

As the general public has come into daily contact with computers, people have been disabused of their former notions that computers ‘think’ and ‘know’ things. Sadly, for most of us, the mystifying metaphor of human thought has not been replaced by some better conception of how computers work. While many people have begun to correctly think of computers as merely electrical and mechanical devices, not only do most people remain ignorant of how all the ‘gears’ work, they still can’t fathom how a computer could be made up of anything like gears, whether electronic ones or otherwise. And while we have been told to think of computers as ‘machines that do math’, most of us can’t fathom how math transforms into pictures, audio, video, user interfaces, games, or even just text.

To demystify the greater part of how computers work, you don’t have to learn all too much about computer hardware or electronics, for while these subjects are fantastically and fascinatingly complex in their own right, the role of computer hardware essentially comes down to performing a handful of simple tasks when instructed to do so: copy, add, subtract, and compare data, etc. Most of the complications in hardware have to do with getting the hardware to do its simple job faster.

So it is the sequence of instructions—the software—fed to the hardware which explains the better part of the story, as this is what turns computers from calculating automatons into useful and seemingly intelligent devices. To explain software, the best place to start is in how bits represent information, for a piece of software is ultimately just a bunch of instructions and data expressed as bits, and manipulating bits is at heart all hardware does. So yes, sadly, every discussion of programming must begin with the subject of data, a topic as profoundly uninteresting as reading the phone book. Only severe autistic cases get into programming to manage data (no offense, serious autistic cases!) — the rest of us want to get our computers to do something, so do we really have to talk about something basically inert? Yes, quite simply because reading and writing data is exactly how computers do anything.

What is a bit?

As everyone these days knows, a bit is simply a thing that holds one of two states (represented with the symbols ‘0’ and ‘1’) and which can alternate between these two states, so any computer data is a series of bits, e.g. 00101111101011101111111100110100. The actual physical mechanism of ‘holding a bit state’ varies from one computer technology to another: memory chips use either capacitors or transistors; optical discs (CD’s and DVD’s) use microscopic grooves read and written by lasers; floppy disks and hard drives use charges on magnetically sensitive surfaces. But really, a bit is just an abstraction, not any particular tangible thing, so in fact, bits can even be found outside of computers, e.g. a flag on the side of a mailbox can be considered a bit because it holds one of two states, up or down.

The simplicity of bits is what makes them a good, universal, lowest-common-denominator representation of data. In fact, a bit is the smallest unit of information possible: you might think that something which held only one state would be the smallest unit of information, but you would be wrong because such a thing would not convey any distinctions, and without distinctions, you have no semantic content (see here).

Quantities of bits

Before discussing exactly how bits represent complex information, we should clear up some confusion around the terminology for expressing quantities of bits:

A single bit by itself can’t represent much, so we usually concern ourselves with series of multiple bits, and certain quantities of bits have names:

byte = 8 bits
nybble (or nibble) = 4 bits

The term ‘nybble’ is used quite rarely, but ‘byte’ is used perhaps even more frequently than ‘bit’.

(A byte is actually not always 8 bits: properly speaking, the size of a byte for a particular system refers to the size of ‘the smallest addressable unit of memory’ (i.e. the size of the cells into which memory is divided up; it is these cells which can be independently read and modified). The memory of some systems, especially some older ones, is divided into cells of some size other than 8 bits, but 8-bit bytes are found in almost all systems made in the last 30 years, including PC’s. There’s nothing intrinsically special about the quantity 8, except it has the virtue of being not too big and not too small while also being a power of two.)

We use Greek prefixes to indicate bits in certain quantities of powers of ten:

1 kilobit (Kb) = 10^3 bits = 1,000 bits
1 megabit (Mb) = 10^6 bits = 1,000,000 bits
1 gigabit (Gb) = 10^9 bits = 1,000,000,000 bits

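…well, not quite. These are the popular (read, lazy) definitions, and strictly speaking they are what the prefixes mean in the SI system. Among computer scientists and programmers, though, the traditional convention defines these quantities in powers of two (the IEC has standardized separate names for these binary quantities, ‘kibibit’, ‘mebibit’, and ‘gibibit’, but they have been slow to catch on):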

1 kilobit (Kb) = 2^10 bits = 1,024 bits
1 megabit (Mb) = 2^20 bits = 1,048,576 bits
1 gigabit (Gb) = 2^30 bits = 1,073,741,824 bits

(Actually, programmers very often use the lazy definitions in informal contexts—just don’t think a computer won’t notice the difference.)
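To get a feel for how far apart the two sets of definitions drift, here is a minimal Python sketch. The dictionary and function names are mine, picked only for illustration.

```python
# Decimal (power-of-ten) versus binary (power-of-two) readings of the prefixes.
# DECIMAL, BINARY and shortfall are illustrative names, not standard identifiers.
DECIMAL = {"kilo": 10**3, "mega": 10**6, "giga": 10**9}
BINARY = {"kilo": 2**10, "mega": 2**20, "giga": 2**30}

def shortfall(prefix):
    """Return how many units the decimal reading falls short of the binary one."""
    return BINARY[prefix] - DECIMAL[prefix]

for prefix in ("kilo", "mega", "giga"):
    print(prefix, DECIMAL[prefix], BINARY[prefix], shortfall(prefix))
```

The gap grows with the prefix: 24 at kilo, 48,576 at mega, and 73,741,824 (a bit over 7%) at giga.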

The Greek prefixes can be used to indicate quantities of bytes as well, such as saying ‘one kilobyte’ to mean 1,000 (or 1,024) bytes.

Pay particular attention to whether the ‘b’ in an abbreviation is capitalized: lowercase ‘b’ means ‘bit’, but uppercase ‘B’ means ‘byte’, e.g. ‘1 Kb’ is 1,024 bits, but ‘1 KB’ is 1,024 bytes. If you’re not paying attention, you could misinterpret a quantity by a factor of eight! (You’ll also see the ‘k’, ‘m’, and ‘g’ in lower case, but this doesn’t have any significance.)

For some obscure reason, when talking about quantities of stored data, the convention is to use bytes, kilobytes, megabytes, and gigabytes, but when talking about data throughput (such as in the context of data transfer rates over a network or between computer components), the convention is to use bits, kilobits, megabits, and gigabits.
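Because storage sizes come in bytes and transfer rates in bits, estimating a download time means converting between the two. A small sketch, assuming 8-bit bytes and the decimal prefixes (the function name is hypothetical):

```python
BITS_PER_BYTE = 8  # assuming the usual 8-bit byte

def seconds_to_transfer(size_bytes, rate_bits_per_second):
    """Time to move size_bytes over a link rated in bits per second."""
    return size_bytes * BITS_PER_BYTE / rate_bits_per_second

# A 100-megabyte file over a 100-megabit-per-second link (decimal prefixes).
file_size = 100 * 10**6   # bytes
link_rate = 100 * 10**6   # bits per second
print(seconds_to_transfer(file_size, link_rate))  # 8.0
```

Read carelessly, ‘100 MB over 100 Mb/s’ looks like one second; it is eight, even before any real-world overhead.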

Character sets

So how do bits represent information humans care about? Well, in the case of text, the relation between a particular string of bits and a text character is arbitrary, e.g. we could decide that the bit string 10111000 should designate the Roman character capital ‘J’, and as long as all of our hardware and software in the correct contexts treated that bit string as if it represented ‘J’, it wouldn’t matter that there’s no logical reason for doing so.

So to represent text as bits, we designate a unique string of bits for every character we wish to use, and this set of designations is called a character set. The most widely used character set in the Western world is called ASCII (American Standard Code for Information Interchange), which contains 128 characters, each mapped to its own 7-bit string, e.g. 1001101 in ASCII represents upper case ‘M’ while 1100010 represents lower case ‘b’.
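The two ASCII examples above can be checked with a short Python sketch. It leans on the fact that Python’s `ord` and `chr` agree with ASCII for the first 128 characters; the helper names `to_bits` and `from_bits` are made up for this illustration.

```python
def to_bits(char, width=7):
    """Return the character's code as a zero-padded bit string."""
    return format(ord(char), "0{}b".format(width))

def from_bits(bits):
    """Return the character designated by a bit string."""
    return chr(int(bits, 2))

print(to_bits("M"))          # 1001101
print(to_bits("b"))          # 1100010
print(from_bits("1001101"))  # M
```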

For decades, virtually all programs written for English-speakers have used ASCII, but because ASCII doesn’t contain characters needed in other cultures, other locales used alternatives, so for many years, a hodge-podge of character sets prevailed world-wide. This began to change in the 90’s with the introduction of a universal character set, called Unicode. Unicode reserves enough space for 1,114,112 different characters, which means each character is designated a 21-bit string. As old software is being replaced by new software, Unicode is gradually being adopted as the replacement for all other character sets, including ASCII.

1,114,112 is more characters than is needed to contain all the characters of every language in the world (including even Chinese, Japanese, and Korean), and in fact, only a small fraction of them (well under two hundred thousand) are currently designated in Unicode, leaving many 21-bit strings available for future addition of characters. Some of the characters in Unicode are not language characters at all but rather symbols used for other purposes, such as math or musical notation.
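The arithmetic behind those figures is easy to check. The sketch below uses Python’s standard `unicodedata` module; the three sample characters are my own picks (a Chinese character, a math symbol, and a musical symbol), not anything special.

```python
import unicodedata

# Unicode code points run from 0 through 0x10FFFF inclusive.
CODE_SPACE = 0x10FFFF + 1
print(CODE_SPACE)                      # 1114112
print((CODE_SPACE - 1).bit_length())   # 21 bits are enough for the largest one

# Look up the official names of a few sample characters by code point.
for ch in ("\u4e2d", "\u221e", "\U0001d11e"):
    print(hex(ord(ch)), unicodedata.name(ch))
```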

Numbers

Some kinds of information, such as text characters, get by using arbitrary assignment of pieces of information to their representations, but other kinds of data are suited for a logical system. Numbers are best represented using a logical system for a variety of reasons, most obvious among them the fact that there is an infinite range of numbers, so it’s just impossible to give each number an arbitrarily selected bit string; using a logical set of rules for representing numbers allows us to encode as bits any number of any size in a consistent and predictable way.

So what is this logical system for representing numbers? Quite simply, a string of bits—11001, for instance—is a number, but in binary form rather than the decimal form with which you’re familiar: binary is nothing but a numbering system (i.e. way of expressing quantity) that works just as well as decimal. Unfortunately, while people use decimal all their lives, it becomes so ingrained that they can’t see how it works and therefore have a hard time imagining any alternative. Though the details of how binary works are really very simple once you understand them, the concept of an alternative number system is famously hard to convey succinctly and successfully to those uninitiated, so it’s something we’ll gloss over here. For the duration, just take it on faith that there is, for instance, a logical reason why 35 is expressed as the bit string 100011.
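For readers who want a peek anyway, here is a minimal sketch of the rule: each bit counts for a power of two according to its position, just as each decimal digit counts for a power of ten. The function name is hypothetical; Python’s built-in `int(bits, 2)` does the same job.

```python
def binary_to_decimal(bits):
    """Sum each bit times the power of two for its position (rightmost is 2**0)."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2**position
    return total

print(binary_to_decimal("100011"))  # 32 + 2 + 1 = 35
print(binary_to_decimal("11001"))   # 16 + 8 + 1 = 25
print(bin(35))                      # 0b100011, the built-in reverse conversion
```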

Once we have a logical correlation between any bit string and a number, it then makes sense to think of arbitrary assignments in terms of numbers rather than bit strings, e.g. if, in a character set, ‘G’ is assigned to the bit string 1000111, then it can also be said to be assigned to the decimal number which corresponds to that bit string, 71, and this is in fact how we normally think of such arbitrary assignments.

The perfect ambiguity of bits

Whatever manner is used to encode our information as bits, whether logical or arbitrary, it’s important to understand that the meaning of any string of bits is not intrinsic to the bits themselves: the meaning of any string of bits ultimately relies upon agreement between the writer of the bits and the reader as to how to interpret the bits. This is really no different from human languages, where the words of a language only have meaning because of (mostly informal) established agreements between a community of speakers of that language. It bears illustration, though, because this is not how most people commonly think of meaning. Consider:

Imagine I write 7 decimal digits on a piece of paper. Is it a phone number, the population of Milwaukee, or the number of angels on the head of a pin? Now imagine a bank that uses 14-digit account numbers. If I write down a series of 28 digits with no spaces or separators, how many phone numbers and how many bank account numbers do I have? Well, I may have known what I meant at the time I wrote those numbers down, but nothing in the data tells anyone—including myself ten minutes from then—what the numbers mean at all.

This same problem exists in computers. Consider a sequence of bits: 001110100111111110100100. The first thing not discernible is how the bits should be grouped: is this meant to be interpreted as three bytes, or six nibbles, or 5 bits followed by 19 bits, or what? Just as bad, the bits themselves indicate nothing of what kind of data they’re supposed to represent, whether numbers or text or otherwise, nor how that kind of data is encoded.

The lesson here is that, for data to be interpreted correctly, the thing doing the interpreting has to assume the length, encoding, and location of the data, and the only way to make these assumptions correctly is to strictly keep track of where data is placed and what it was supposed to mean when you placed it there.
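To make the point concrete, here is a sketch that reads the same 24 bits under several different assumptions. Every variable name is mine, and each reading is just one of many possible conventions.

```python
BITS = "001110100111111110100100"
raw = int(BITS, 2).to_bytes(3, "big")   # the same 24 bits as three bytes

# Same bits, different grouping assumptions.
as_three_bytes = list(raw)
as_six_nibbles = [int(BITS[i:i + 4], 2) for i in range(0, len(BITS), 4)]
as_5_then_19 = (int(BITS[:5], 2), int(BITS[5:], 2))

# Same three bytes read as one number, in the two possible byte orders.
as_big_endian = int.from_bytes(raw, "big")
as_little_endian = int.from_bytes(raw, "little")

print(as_three_bytes)     # [58, 127, 164]
print(as_six_nibbles)     # [3, 10, 7, 15, 10, 4]
print(as_5_then_19)       # (7, 163748)
print(as_big_endian)      # 3833764
print(as_little_endian)   # 10780474
```

None of these readings is more ‘correct’ than the others; only the agreement between writer and reader picks one.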

What ensures then that, say, a file of ASCII text is interpreted as ASCII text and not treated as a bunch of numbers or perhaps as text of a different character set? Nothing! In fact, you can do the reverse: take any file and open it in a text editor to see it as ASCII text; if the file wasn’t intended to be human-readable ASCII text, you’ll almost certainly just see a sea of garbage like:

ïT¬?ì?Rïûê” ïBHïT$4ïz,ì?+ïH?;-t¦@¶ ïL$4ïQ,ï ël$?¤î? ï?ï£$+ + à +~@ï°ìV ?+Bn+K$+B?+K,¦ -+?+ LAâ-\;-|+ïåä? ¦? ;-ëL$?¤îH? ì« ? ël$$ïU ïD

…at least, it would be remarkably surprising if you opened a random file not intended to contain text but found that it happened to contain long sections of English, or any other human language, for that matter (well, actually, many files not intended to be read as text have text data embedded within them, so you will often see some strings of human language in such files). So be clear that, while nothing stops you from reading a piece of binary data as representing some kind of data which it wasn’t intended to be, barring remarkable coincidence, doing so just produces garbage.
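You can reproduce the effect without hunting for a file. The sketch below packs a few numbers into bytes with Python’s `struct` module, then reads the very same bytes back as text. I decode as Latin-1 rather than ASCII only because Latin-1 assigns a character to every byte value, so the decoding never fails; that choice is an assumption of this example.

```python
import struct

# Four numbers packed in a binary layout (little-endian 64-bit floats).
numbers = (3.14159, 2.71828, 1.41421, 1.73205)
data = struct.pack("<4d", *numbers)

# Reading the bytes back with the agreed layout recovers the numbers...
print(struct.unpack("<4d", data))

# ...but reading the same bytes as text yields one junk character per byte.
text = data.decode("latin-1")
print(ascii(text))   # ascii() escapes the unprintable characters
```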

Pretty pictures

A very large majority of all programming deals only with number and text data, but bits are also used to represent sexier kinds of data, namely images, video, and audio. At this point, how bits could possibly represent such information may still seem mysterious, so we should at least broach the matter by briefly illustrating how to represent images.

Quite simply, a computer image is just like the big scoreboard-grids of lights at sports arenas except that the individual emitters of light are much smaller than light bulbs, producing a much finer image. A computer-screen image is essentially a grid of discrete light-emitting points, called pixels, so to produce a certain image, we have to get each pixel to emit the right color.

(Of course, monitors don’t contain any light bulbs, but the physical process isn’t important to us, and besides, the actual physical process is very different between CRT monitors—Cathode-Ray Tubes, the bulky monitors that are a foot or more deep in dimension but which are now going out of use—and LCD’s—Liquid-Crystal Displays, the monitors less than an inch deep in dimension that are used in all laptops and have nearly taken over all new monitor sales for desktops.)

The solution, like with characters, is to use an arbitrary assignment: imagine that we establish a mapping of numbers to colors, and imagine that knowledge of this mapping is hardwired into our monitor; if I then feed a number to the monitor, it could set a pixel to the corresponding color. But which pixel does it set? Well, the ‘next’ pixel: a monitor is hard-wired to set the colors of its pixels in a certain order, drawing lines pixel-by-pixel, left-to-right starting from the top of the screen, moving down line-by-line, and cycling back up to the top to repeat the process. So data is fed into a monitor sequentially, thereby drawing the image sequentially, pixel-by-pixel; it’s just done so fast you don’t see the process.
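The ‘next pixel’ idea can be sketched in a few lines. Everything here is a toy assumption: a made-up four-color palette, a screen three pixels wide, and a hypothetical `scan_out` function standing in for the monitor’s hard-wired behavior.

```python
# Hypothetical 4-entry palette: number -> (red, green, blue).
PALETTE = {0: (0, 0, 0), 1: (255, 0, 0), 2: (0, 255, 0), 3: (255, 255, 255)}
WIDTH = 3  # pixels per line, an assumed toy resolution

def scan_out(stream, width=WIDTH):
    """Assign each incoming number to the 'next' pixel in raster order."""
    rows = []
    for index, value in enumerate(stream):
        if index % width == 0:
            rows.append([])            # left edge reached: start a new line
        rows[-1].append(PALETTE[value])
    return rows

# Six numbers fill two lines of three pixels, left-to-right, top-to-bottom.
image = scan_out([1, 1, 1, 0, 3, 0])
for row in image:
    print(row)
```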

The monitor image is updated from the computer at a fixed rate, usually sixty times a second or more, whether or not the image changes at all. Now, it would be horribly wasteful to have the CPU do this job itself of constantly feeding the monitor data just to show a still screen, so this responsibility is offloaded onto a specially purposed device, the video controller. An essential component of the video controller is its dedicated memory, called the framebuffer, wherein the current image to display is stored, and many times a second, the video controller transmits the contents of its framebuffer to the monitor; left to itself, the video controller can handle feeding the current image data in the framebuffer to the screen all by itself without attention from the CPU. In fact, from a programmer’s perspective, the data in the framebuffer is the image on the screen, and therefore, to change the image on the screen, the programmer simply instructs the CPU to modify the data in the framebuffer, causing a different image to be seen the next time the video controller sends the monitor the contents of the framebuffer.
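From the programmer’s side, that arrangement looks roughly like the sketch below. The `Framebuffer` class and its methods are invented for illustration; a real video controller is hardware, and real framebuffers are reached through the operating system or a graphics library.

```python
class Framebuffer:
    """A toy framebuffer: one number per pixel, stored line after line."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        self.cells = [0] * (width * height)

    def set_pixel(self, x, y, value):
        """What the program does: change one cell of the stored image."""
        self.cells[y * self.width + x] = value

    def refresh(self):
        """What the video controller does: send the whole image, in raster order."""
        return list(self.cells)

fb = Framebuffer(4, 2)
before = fb.refresh()     # the controller keeps sending this still image
fb.set_pixel(1, 1, 7)     # the program changes a single pixel
after = fb.refresh()      # the next refresh shows the change
print(before)
print(after)
```

The program never talks to the monitor; it only edits memory, and the change shows up on the next refresh.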

The details of binary number representation and the ASCII and Unicode character sets will be presented in later posts.

Posted by Brian Will
May 28, 2007

If you’re not already an initiate to programming, you may not be clear what you’re getting into when you set out to learn to program, so here’s an overview.

Most obviously, you must learn a programming language, which comes down to learning four things:

  1. a syntax: The syntax of a language is the set of rules for how code of that language is written.

  2. a standard library: Programming in all languages is in some way modular, i.e. a piece of code can use other pieces of code. Rather than write all of a program’s code from scratch, programmers use pre-existing code from code libraries. Virtually every modern language has its own standard library, a library which is included as a default part of the language. There typically exist many libraries for a language, but only the language’s standard library is universally known and used by all programmers of that language.

  3. code idioms: Given a syntax and a standard library, there then exists in a language some informal set of idioms (recurring patterns) of how to code in the language which programmers write over and over again. For various reasons, these idioms can’t be expressed as library code, but while they are not built into the language or officially specified in the language specification, every good programmer uses established idioms wherever possible. An idiom finds favor because it has been proven over time to be the best and/or most obvious way of doing something. (I use the term ‘idiom’ to mean a ‘small scale pattern’, something which is expressed in a single place in code. In contrast, the term ‘pattern’ is used to refer to ‘design patterns’, a specific set of large-scale design strategies used in programming that are generally not particular to any language. Design patterns are also a very good thing to learn, but they aren’t a high priority early in your education.) You won’t hear much talk about idioms among programmers because, in all, there just aren’t that many of them in most languages, and they’re fairly natural to pick up.

  4. a set of tools: To write their programs, programmers use tools (programs):

    1. programmers do most of their work typing code in text-editor programs
    2. code typed by the programmer needs to be translated into a form runnable by the computer; this translation is performed by compiler, interpreter, assembler, and linker programs (which of these are used depends upon the programming language)
    3. debugger programs allow programmers to see exactly what happens when their programs run so they can ferret out bugs
    4. profiler programs help programmers test the performance of their programs and see how long it takes for portions of code to execute so that they can see what portions of code perhaps need to be optimized
    5. in some areas of programming, code generators can sometimes help speed up development by doing some of the coding work for you
    6. version control systems help programmers keep track of the changes they make to their code as they develop it, making it easier to coordinate with other programmers and revert back to earlier versions of their code
    7. programs which integrate some or all of the functionality listed above are called IDE’s (Integrated Development Environments)

(By the way, don’t worry about tool costs: very high-quality tools are freely available for download for nearly all languages used today; in fact, with the notable exception of Microsoft languages—C# and Visual Basic—the preferred tools are the free and open source ones, while commercial tools are increasingly disfavored and are disappearing.)

So the question you probably have now is, ‘Which language should I learn?’ The correct answer is that you should learn more than one:

  • C: I start with C mainly because: 1) it is the lingua franca of programming; 2) learners can understand what’s really going on in the computer when they run their program written in C; and 3) most features of its syntax show up nearly verbatim in other major languages, such as Java. For purely pedagogical reasons, C is a language you really must know, even if you never do any serious programming in it: you’re a better programmer simply for knowing what programming in C is like. On the other hand, I strongly discourage you from attempting to write anything but trivial programs in C early in your learning phase, for it takes a good amount of programming experience and knowledge to write even mildly sophisticated end-user programs (such as programs with graphical user interfaces) in C. Once you attain a decent amount of comfort with basic C, you’re much better off focusing on some other language, such as Java or Python.
  • Java is steadily replacing C as the lingua franca of programming. It is an object-oriented language with a very large standard library, so writing programs is significantly easier to do in Java than in C. Java is the most common choice of a first language to teach students today because Java protects programmers from misusing memory. I think this choice is a mistake, however, because it’s important as a programmer to understand what exactly Java is doing for you, and you can’t really understand that unless you study C. Worse, Java has a rather complex set of advanced syntactical features, and the set of tools and libraries typically used by experienced Java programmers is large and complex. These facets make it hard for learners to get up to speed with real-world code examples. Despite these flaws, Java, unlike C, is a good language to attempt writing your first real programs in.
  • Python is a language with an exceptionally elegant syntax which avoids clutter and nasty ‘corner cases’ (meaning situations where the language has to have complex rules to cover all possible cases). Expressing your ideas in Python is arguably easier than in any other language out there, so writing useful programs in Python is as easy as it gets. On the downside, Python code runs very slowly compared to C and Java code, so if you want to be a ‘real programmer’ who can write processing-intensive programs, knowing Python is maybe not enough.

I consider these three languages the most essential to learn, not just because of their popularity but because of how well they suit their roles. The best order to tackle them in is either C then Java then Python or the other way around: if you’re impatient and want to write something neat fairly early in your education, start with Python; if you’re patient and insist on understanding how things really work, start with C.

Here are some other languages to be aware of:

  • C# (pronounced ‘C sharp’) is essentially Microsoft’s imitation of Java. Despite what the name implies, it is only tangentially related to C. While an improvement over Java in some ways, it’s not different enough from Java for it to matter much which of them you learn. (Many programmers will insist you stay away from Microsoft’s proprietary technologies, though there actually is an open source implementation of C# available called Mono.) I recommend Java because it’s the older, more established language with the broader reach and because better free tools are available for it. However, once you get comfortable with Java, taking a look at C# wouldn’t hurt because a lot of software companies out there are Microsoft shops, and most of them will be using C# by now.
  • C++ is essentially C with object-oriented programming features grafted on. The result is messy but not without its virtues. Because of its performance advantages, C++ is used for end-user applications with high performance requirements, notably games. However, if you’re learning programming to make games, you’re kidding yourself if you start with C++: it’ll be years before you’ll develop the skills to make games complex enough to stress a modern system, so you should focus your learning on an easier-to-use language. As you develop better program-design skills, they will translate to C++ when the time comes; in the meantime, you should broaden your general programming knowledge.
  • Visual Basic: The basic problem with Visual Basic is that it’s really not that basic at all any more. In its origins, VB was a language that was going to bring programming to the masses. Since then, the features of ‘real’ programming languages, like C++ and Java, have been grafted on to VB in ugly ways, resulting in a language which is just about as complex but not as elegant and therefore not used much except for writing ‘in-house’ software (software which is written for the internal use of a company/organization). The joke goes that a serious programmer doesn’t dare publish programs written in VB to the wider public lest he get laughed at for using VB. Despite its unfair reputation as a toy language, VB can be used to do real work; there’s just little reason to choose it over Java and C# on the merits. (This judgment is possibly just aesthetic bias: I and most other programmers much prefer the look of Java and C# code to that of VB.)
  • Ruby has been stealing glory from Python lately, largely because of the popularity of ‘Ruby on Rails’, a framework for web development (a framework is a glorified, overgrown library). In the end, the two languages are so similar that learning one will make learning the other quite simple. Ruby arguably has a few feature advantages, but Python takes the edge in elegance and carefully thought out syntax. If you want to do web programming, I recommend the Django framework for Python as a replacement for Rails.
  • Perl is like Python’s deformed cousin. It’s a convoluted mess dearly loved by its partisans but inhospitable to first-time programmers. Currently, Perl is in limbo as the long awaited version 6 has been years late in coming; meanwhile, Python and Ruby have been stealing Perl’s niche, and I predict Perl will never recover, eventually fading into perennial also-ran status.
  • PHP, due to the popularity of the web, is the first language of a surprising number of people these days. This is unfortunate because PHP has little reason for existing except to plug a hole that shouldn’t have existed in the first place. In almost all respects, it’s otherwise an undistinguished amalgam of standard language features taken from other languages and inelegantly smashed together. Still, PHP is a popular choice for writing open source web applications, such as the blogging software that runs this site (WordPress) and the MediaWiki software that runs Wikipedia. I find this unfortunate, but PHP had the right features at the right time, and there’s no arguing with good timing.
  • Javascript, despite the name, is a totally distinct language from Java. A Javascript interpreter is embedded in today’s web browsers, so a webpage can include code that makes the page dynamic, e.g. Google Maps is a webpage with Javascript code that updates the page’s content as the user clicks buttons and drags the map. Javascript is a better language than many give it credit for, but it is only really used in the context of the browser, and quite honestly, Javascript is only used at all because it’s the only language the browser runs; you don’t have a choice of other languages if you want to add programmatic features to a webpage. Javascript would actually make a great first language except that it’s tied to the browser, so as a practical matter, learning Javascript requires learning all sorts of messy, browser-specific stuff that distracts from the essence of programming.

Of course, learning to program involves quite a bit more than just learning a language. Additionally, there’s:

  • terminology / way of speaking

  • best coding practices (especially how to write readable code)

  • design (even if you know a language inside out and write meticulously crafted code, you may still be clueless about writing programs in the large; without learning good design practices, your programs will become unreadable and unmodifiable rats’ nests as they grow in size)

  • a bit of computer science (especially fundamental data structures and algorithms)

  • optimization strategies

  • hardware concepts

  • operating system concepts

A note about this last bullet: whatever the language, the real difficulty of learning standard libraries (aside from the fact that many of them are very large) lies in the fact that their most important parts are heavily intertwined with concepts not particular to the language, e.g. every standard library includes code for reading and writing files, but files are really an operating system concept, not a language concept. Unfortunately, standard library documentation has the tendency to assume the reader is already familiar with these underlying concepts because it assumes an audience of programmers experienced in other languages; when the documentation does make an attempt at elucidating these concepts, it tends to do a half-assed job; even when the documentation explains these underlying concepts well, this is still a suboptimal way of learning because it confuses the concepts with the particulars of that particular library and language. So you should try to learn OS concepts in their own right, independently of any language; there are a few good books about operating system design, in particular, Modern Operating Systems (2nd edition), by Andrew S. Tanenbaum and (for when you’re comfortable with C) Linux Kernel Development, by Robert Love.

This same advice (‘learn concepts, not libraries’) applies to a few other domains you’ll encounter, including networking, databases, and web standards, e.g. before devoting a lot of time to learning how to work with a database using Java’s standard libraries, learn about databases independently of any language.

I should end by mentioning ‘Pidgin’, my in-the-works language intended strictly for educational purposes. (Yes, the Gaim people are using the name ‘Pidgin’, but I came up with the name earlier, besides which, it’s a name most appropriate for a language.) My LearnProgramming.tv project has been on hiatus, partly because I’ve been reconsidering its approach ever since I had the epiphany that what’s really needed is a better language for beginners. There’s nothing to show yet, but implementation shouldn’t take long, as the syntax is extremely simple (it uses prefix notation for expressions) and I’m simply translating into Python. If you’re looking to learn to program, please leave a comment—I’m looking for some guinea pigs.

Posted by Brian Will
May 26, 2007

Why natural language in programming languages is a fool’s game.

  1. The main purpose of a formal language is that it is free of the ambiguities found in natural human languages.
  2. While naturalistic language may give off signals of familiarity and thereby boost a programming language’s approachability, the gain is more than offset by the increased complexity which naturalism introduces, complexity which learners will very quickly be confronted with. Naturalism acts as a mask for language complexity, not a substitute for it.
  3. I for one am skeptical of the ‘cult of smartness’ found in programming culture, but even I will say that a learner new to programming just isn’t going to cut it if they can’t adapt to the concept of a formal language: if you’re hanging on to those bits of typical programming languages that happen to be English-like (e.g. control words like ‘if-else’), then you just aren’t getting it.
  4. The usual bits of English-like syntax found in our languages are often more misleading than helpful.

This last point can be illustrated with the keyword ‘while’. The keyword ‘while’ obviously was introduced with a sentence in mind like, ‘Do this stuff while this condition remains true.’ Problem is, this sentence, as understood by the typical English speaker who is not already versed in a C-like language, would not convey what ‘while’ actually does; the sentence only sounds good enough as a description of ‘while’ to programmers already familiar with the ‘while’ construct.

A more accurate sentence for describing ‘while’ would be, ‘If this condition is true, do this stuff, otherwise move on; after every time you do the stuff, test the condition again, doing the stuff if true or moving on, ad infinitum.’ Notice the absence of the word ‘while’. Also notice that the sentence which the originators of ‘while’ did have in mind is not what a newbie to the language is going to see—they will just see the word ‘while’.

Not only is ‘while’ not really evocative of its function, it easily misleads: a very common misconception learners get is that a ‘while’ loop will end the moment the condition is made untrue inside the loop rather than when the block completes and the condition is tested again; many learners likely get this misconception because it is consistent with the sentence, ‘Do this stuff while this condition remains true.’ Learners would be better served were ‘while’ instead arbitrarily named ‘friedchicken’ (perhaps ‘kfc’ to save on typing).
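The misconception is easy to test. Python’s ‘while’ behaves like the C-style one in this respect, so here is a minimal sketch in Python (the variable names are mine):

```python
log = []
count = 0
while count < 3:
    count = 10                    # the condition is now false...
    log.append("still running")   # ...but the rest of the block runs anyway
log.append("done")
print(log)  # ['still running', 'done']
```

The loop does not stop at the moment `count` becomes 10. The block finishes first, and only then is the condition tested again.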

Now, of course, not all English-like elements are so poorly chosen (‘if’ is pretty sensibly named, though I would have gone with ‘otherwise’ in place of ‘else’), but the acceptable cases are few enough that one can hardly say there is much benefit in attempting to conform to an ideal of naturalism. You’ll likely just do more harm than good.

Posted by Brian Will
May 21, 2007

This last week I’ve been getting familiar with Vim. I’ve dabbled a few times in the past, but this time I’m finally feeling comfortable enough to stick with it. I was quite annoyed with having to hit ESC all the time, but a neat tip is to set this in your config file:

imap jkl <esc>
imap jlk <esc>
imap kjl <esc>
imap klj <esc>
imap ljk <esc>
imap lkj <esc>
set timeoutlen=500   " cuts down the pause time you'll see after typing j, k, and l

…so now you can get out of insert mode by smashing ‘j’, ‘k’, and ‘l’ simultaneously. This will, of course, conflict if you ever need to type one of these sequences, but those are surely quite rare and work-around-able. I’m even going to experiment with reducing this just to ‘j’ and ‘k’, which is itself a really rare sequence in English. (Maybe ‘s’, ‘d’, and ‘f’ or just ‘d’ and ‘f’ can be set to ESC as well.)

If you don’t like that solution, other good choices are:

" shift-space to get out of insert mode (kept on its own line: text after a mapping becomes part of the mapping)
imap <S-space> <esc>

…and:

" shift-enter to get out of insert mode (kept on its own line: text after a mapping becomes part of the mapping)
imap <S-enter> <esc>

I mention Vim because I recently played with the alpha of Archy, which is a project to implement the ideas of the late interface theorist Jef Raskin, and its text editing mode feels much like Vi stripped down to the essentials for the common user. Here’s a quick summary of Archy’s interface:

  • The primary part of the interface is one big text-editing window; there’s only one sequence of text, but the text is separated into ‘documents’ by special divider lines (which are inserted by the ` key).
  • Navigation up and down the big glob of text is done primarily by holding down an alt key and typing a string of text to “leap” to. Using left-alt does a back search for that string while right-alt does a forward search. Releasing alt takes you back to typing mode at the location you just leapt to. To repeat your last search, hold capslock and tap alt (right-alt for forward searches, left-alt for back searches). Having to hold down alt while typing is pretty awkward and uncomfortable after a while.
  • This ‘leaping’ feature with the alt-key searching is also how you select text: the selected text portion is always the section between the place you last leapt from and leapt to.
  • The user issues commands by holding down the capslock key, then typing the command. E.g. [capslock]-C-O-P-Y to copy the highlighted text. Even with auto-completion, having to hold down capslock while typing is pretty awkward, especially if the command contains an ‘a’, ‘q’, or ‘z’.
  • Text can be formatted (coloring, fonts, style, size, alignment, etc.) by highlighting text and issuing a command.
  • And for some reason, to get a key to autorepeat, you must triple-tap it before holding it down. (This is really annoying, especially with the navigation and backspace keys.)

Text-editing is the only part of the alpha, but Archy’s other key component will be what they call the Zooming User Interface (ZUI), which I take it is where non-textual elements will reside. How these components will fit together, I’m really not sure.

What to make of this? First, I’ll say there are some things I admire about this approach. Mainly, Archy bravely recognizes that average users can cope with a few non-discoverable elements: as long as the set of basic concepts to learn is small and worth the payoff, it’s OK (contra Steve Jobs) to expect users to learn something. Archy strikes me as an attempt to bring the Unix philosophy to the masses: rather than relying upon packaged solutions to problems, users are encouraged to build upon a base of simple yet powerful mechanisms; Archy simply starts from a clean slate, dispensing with the accumulated detritus of 40 years of terminals and programmer conventions, keeping things tidy, sacrificing power for approachability.

This said, there are glaring faults with Archy as currently available. First, having users hold down alt and capslock while typing is clearly not going to fly, as it’s difficult even for expert users, let alone the majority of users who have trouble touch-typing. Yes, the purpose is to eliminate modality, but this is clearly not a realistic solution. (Maybe we need special keys below our space bar which we can hold while typing more naturally, or maybe the spacebar itself could be used.)

Really, modality is not as bad as UI orthodoxy claims. Keep the number of modes few and try to eliminate unnecessary ones, surely, but modality is extremely powerful, and to do it right, you really just have to give the user unmistakable visual and auditory feedback cues conveying what state the interface is in. (A good test is to see what happens when users step away from the computer then come back later; if they mistake the mode they’re in, you have a problem.) Archy’s cues are just too damn subtle. For instance, I found myself confused when I started typing because sometimes Archy would move my cursor to some other document; turns out I was trying to type in a ‘locked’ document, but it took me a number of tries to figure this out as the message Archy briefly flashes is black text superimposed over very similar black text at the top of the screen; it’s great they didn’t annoy me with an ‘OK’ pop-up, but there’s unobtrusive and then there’s covert. In general, Archy has far too few elements of on-screen guidance, likely because the developers are too enamored of the idea of an interface-less interface. At the very least, Archy should have a training mode in which some screen real-estate is devoted to guiding new users.

Also problematic is how Archy messes with the user’s expectations about character keys, giving the keys surprising behavior in certain contexts such that they don’t always produce their usual characters. Now, there are obvious cases where average users quickly adapt to character keys not producing their respective characters on screen—games, for instance, often make use of the alphanumeric keys for non-text purposes—but such programs usually clearly delineate between typing mode and non-typing modes. In contrast, Vi, Emacs, and now Archy violate this barrier, messing with character keys in the context of typing text. In Archy, you find yourself in a few annoying moments like you do when first using Vi(m) where it’s not responding like you expect and you just can’t understand why.

Fortunately for Archy, both of these faults—inadequate cues and poor key assignments—can be fixed, but I wonder if the project will be willing to do so, as it requires compromising on its ideals of a totally modality-less and interface-less interface.

Posted by Brian Will
May 04 2007

From the creator of JSLint, Douglas Crockford, here’s a series of Javascript video lectures. It just so happens I’ve been spending the last four months working in Javascript for the first time, and I wish I’d seen these lectures before I started.

The first thing Crockford tells you is that all the Javascript books out there—with the partial exception of the O’Reilly rhino book—are crap, and I’ll corroborate that. Crockford will save you a great amount of time by cutting directly to all the features, quirks, virtues, flaws, and subtleties of Javascript that make it unique, while giving you invaluable advice on best practices. Also check out his videos on the DOM.

The only issue I take with Crockford is his estimation of the language. He’s certainly correct that Javascript is far under-appreciated by ‘real programmer’ snobs and that Javascript’s biggest weaknesses lie in the browser development environment, not really in the language itself. But I’m not convinced, for instance, that prototype inheritance is really the way to go; it’s perfectly possible to have modifiable-at-runtime inheritance with plain-old-classes, and in fact Ruby and Python offer just that. Nor do I think Javascript should get too much credit for many things it does right: for just about everything Javascript gets right—with the notable exception of function literals—Python gets more right, and did so before Javascript did. Moreover, Python fixes many things that are just plain wrong with Javascript, such as Javascript’s lack of a proper file-linking mechanism.
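To make the point about modifiable-at-runtime inheritance with plain old classes concrete, here is a minimal Python sketch. The names `Animal`, `Dog`, and `describe` are made up for illustration; the only thing being shown is that a class can be changed after instances of it already exist, and those instances pick up the change.

```python
class Animal:
    def speak(self):
        return "..."


class Dog(Animal):
    pass


# Create the instance before any of the modifications below.
rex = Dog()


def describe(self):
    # Looks up speak() dynamically, so later overrides are honored.
    return f"{type(self).__name__} says {self.speak()}"


# Add a method to the base class at runtime; existing instances see it
# because attribute lookup walks the class hierarchy at call time.
Animal.describe = describe

# Override inherited behavior on the subclass at runtime as well.
Dog.speak = lambda self: "Woof"

print(rex.describe())
```

This is roughly what you would do in Javascript by assigning to a constructor's prototype, only here the thing being modified is an ordinary class.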

On the other hand, Python’s built-in namespace has the same problem as Javascript’s global namespace: a mis-typed identifier can’t be flagged as an error by the parser because the parser can’t know if that name exists in the global namespace or not. Sure, getting rid of the global namespace wouldn’t catch mistyping ‘bar’ in ‘foo.bar’, but it would catch a significant number of such errors, and nothing really would be lost. (I don’t consider modifying the built-in namespace to be a legitimate Python practice, and I’ve always found Python’s meta-language features—the modifiable built-in namespace and operator overloading—to be invitations to bothersome errors and quixotically at cross-purposes with Python’s philosophy of imposing stylistic uniformity.)
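A small sketch of the mis-typed identifier problem in Python, under the assumption that the typo sits inside a function body. The function `total` and the misspelling `runing` are invented for the example. Compilation succeeds because an unassigned name in a function is treated as a global or built-in lookup deferred until run time; the error only shows up when the offending line executes.

```python
source = '''
def total(values):
    running = 0
    for v in values:
        running += v
    return runing  # typo: should be "running"
'''

# Compiling succeeds: the compiler cannot rule out that "runing" will
# exist as a global or built-in name by the time the function is called.
code = compile(source, "<example>", "exec")

namespace = {}
exec(code, namespace)  # defining the function also succeeds

try:
    namespace["total"]([1, 2, 3])
    outcome = "no error"
except NameError as err:
    # The mistake surfaces only when the faulty line actually runs.
    outcome = f"NameError at call time: {err}"

print(outcome)
```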

Posted by Brian Will
Mar 31 2007

From Wikipedia:

When applied to object-oriented programs, the Law of Demeter can be more precisely called the “Law of Demeter for Functions/Methods” (LoD-F). In this case, an object A can request a service (call a method) of an object instance B, but object A cannot “reach through” object B to access yet another object to request its services. Doing so would mean that object A implicitly requires greater knowledge of object B’s internal structure. Instead, B’s class should be modified if necessary so that object A can simply make the request directly of object B, and then let object B propagate the request to any relevant subcomponents. If the law is followed, only object B knows its internal structure.

More formally, the Law of Demeter for functions requires that a method M of an object O may only invoke the methods of the following kinds of objects:

  1. O itself
  2. M’s parameters
  3. any objects created/instantiated within M
  4. O’s direct component objects

In particular, an object should avoid invoking methods of a member object returned by another method.

I’m not sure how valuable this advice is. If you follow it strictly, you’ll be spending a lot of time adding (and modifying) wrapper methods, which is probably just as bad complexity-wise, because what you gain in direct decoupling is lost by indirect coupling (not to mention interface pollution). Perhaps the better advice is to be conscious of when you break the Law, i.e., when you write:

x.y().z();

…you should pause and think whether the class of ‘x’ would tolerate having a method of its own that accomplishes the same work, thus simplifying things for its users. Often, however, the type of object returned by a class’s methods serves too many purposes, making wrapping its methods quite heavy business and, in fact, more confusing than just leaving the class’s clients to deal with that type directly.

So a good rule is to ask ‘how broad is the scope of the returned type’s uses?’ If the functionality needed of the returned type is significantly narrower than that type’s public interface, consider wrapping it; otherwise, you’re likely to do more harm than good.
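Here is a rough Python sketch of that rule of thumb. The classes `Engine` and `Car` and all their methods are hypothetical, chosen only to contrast a narrow use of a returned type (worth wrapping) with a broad one (not worth wrapping).

```python
class Engine:
    def __init__(self):
        self.running = False

    def start(self):
        self.running = True
        return "engine started"


class Car:
    def __init__(self, name):
        self._name = name
        self._engine = Engine()

    def engine(self):
        # Exposes an internal component to clients.
        return self._engine

    def name(self):
        # Returns a plain str, a type with a very broad interface.
        return self._name

    def start(self):
        # Wrapper: clients only ever need to start the engine, so the
        # car offers that directly and keeps Engine an internal detail.
        return self._engine.start()


car = Car("beetle")

# The x.y().z() form: the client must know a Car contains an Engine.
reached_through = car.engine().start()

# The Demeter-friendly form: the request goes to the car itself.
wrapped = car.start()

# Here wrapping would do more harm than good: clients may want any of
# str's many methods, and mirroring them on Car would pollute its interface.
shouted = car.name().upper()

print(reached_through, wrapped, shouted)
```

Wrapping `start` costs one small method and hides the engine entirely, whereas wrapping everything a client might do with the name would mean duplicating most of the string interface on `Car`.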

Posted by Brian Will