Archive | January, 2007

Syntax does/doesn’t matter

31 Jan

Syntax doesn’t matter: any good programmer works with multiple languages over their lifetime, most of these languages expressing basically the same ideas in mostly arbitrarily different ways; any serious student of programming will come to the same conclusion once they learn their third or fourth language.

I’ve seen this in another context, trying to learn Arabic. This Slate article explains the difficulties of this undertaking very well, and it points out what I experienced myself: reading the script is the easy part, and actually not all that hard. Excluding the elaborate calligraphies, English-speaking learners of Arabic should only need a month or two to feel reasonably comfortable quickly identifying the characters and reading them chained together. Additionally, it is surprisingly easy to adjust to reading right-to-left: after a short time, the brain simply makes the switch. (Remembering to open books from what seems like the back takes a lot longer.) Programmers develop this same ‘switch’ when working with different syntaxes.

But syntax, of course, does matter on two ends: programmers have to write in a syntax, and learners have to learn a syntax. Really, when people say ‘syntax doesn’t matter’, they mean that programmers—and those learning to program—shouldn’t think in syntax. This is true, for whatever the syntax, the syntax is not going to change the semantics of what the programmer codes, nor is it going to change the core concepts of a language which the student must learn.

My main purpose in proposing a new educational programming language is to give learners a language that expresses the core features common to modern languages in the syntactically most direct and obvious way possible. This principle should be applied to the libraries of the language as well, so for instance, abbreviations that are non-obvious to a non-programmer must be excised from the language, including the libraries, e.g. the language can’t have anything like C’s “stdio” (“standard input/output”) or “sprintf” (“string print formatted”). (Such abbreviations would be more forgivable if full names were given in learning materials, but out of the dozens of tutorials and references of the C language I have read, only a couple do so.) Taken individually, shortcuts like these may seem to the initiated like small hurdles, but only because the initiated can no longer see how non-obvious such shortcuts are. The small hurdles quickly add up. Every quirk, every historical legacy, every shortcut is one more thing which at some point is going to cause the learner frustration, possibly halting their progress.

To transition learners from Pygeon (my educational language) into C and Java, it therefore makes a lot of sense to give learners intermediary languages, ones which are as free as possible of the quirks, historical legacies, and shortcuts found in C and Java. Call these intermediary languages PygeonC and PygeonJava (calling them CPygeon and JPygeon would imply we are talking about particular implementations of Pygeon, like with CPython and JPython). Syntactically starting from a base of Pygeon, these languages would be directly translatable into valid C and Java (in fact, this would be the simplest and best implementation method).

To give you an idea, I’ll discuss a few of the syntactical foibles of C and how the semantic content behind the foibles might be more obviously (though likely more verbosely) expressed in a Pygeon-derived syntax:

The first question is whether PygeonC would need to introduce the control flow features of C not found in Pygeon (this includes for, switch, do-while). You can program C just as well without these constructs as you can with them, so it seems OK to omit them. On the downside, not including these constructs delays introducing them to students until they encounter real C, but I feel this is the right choice, as PygeonC is meant to introduce learners to the concepts of C, and these constructs arguably are just syntactical conveniences.
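To make the "just conveniences" claim concrete, here is a minimal sketch of a counting loop and a do-while loop rewritten with only ‘while’ and ‘break’. It is written in Python rather than C purely for brevity, and the function names are invented for the example:

```python
def sum_with_for(limit):
    # Counting loop written with the convenience construct.
    total = 0
    for i in range(limit):
        total += i
    return total


def sum_with_while(limit):
    # The same loop spelled out: initialize, test, body, step.
    total = 0
    i = 0
    while i < limit:
        total += i
        i += 1
    return total


def halve_until_small(n):
    # do-while emulated with `while` and `break`: the body runs once
    # before the condition (n > 10) is tested for the first time.
    steps = 0
    while True:
        n //= 2
        steps += 1
        if not n > 10:
            break
    return steps
```

The ‘while’ versions are longer, but nothing about the flow of execution is lost, which is the sense in which the omitted constructs add no new concepts.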

Goto and labels, however, aren’t quite just conveniences, as they often open up significantly different ways of expressing some particular logic, so I feel their inclusion is warranted. More importantly, the fine-grained control offered by goto and labels is in the spirit of C, even though their use is best avoided in almost all cases even in real C. PygeonC’s goto statement will look just like C’s, but labels will be declared with the label keyword and not followed by a colon, e.g. label foo. Labels must go by themselves on the line preceding the statement they label.

I’ll discuss more C/PygeonC syntax in my next post, What’s the matter with C?.

Expressions, expressions, expressions

27 Jan

The most common oversight in beginning-programming education is that most instructors and most books fail to emphasize the concept of expressions. From my own learning experience and from talking to classmates, it’s very rare, for instance, for learners to think of the function name in a call as the operator and its arguments as its operands. Students also fail to grasp the basic idea that all operands are of a type and that an expression, no matter how complex, evaluates into a single typed value.

The benefit to giving learners a thorough understanding of expressions up front is that it then becomes easy in most languages to explain a good number of other rules. For example, when explaining a conditional expression, all you have to say is that it’s an expression whose evaluated value is then interpreted as true or false according to such-and-such rules. You shouldn’t then have to explicitly tell learners that they can include function calls in the expression or (in some languages) assignments. Most importantly, learners can begin to see a conceptual unity in the language.
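As a small illustration in Python (the names is_ready and values are made up for the example, and the assignment inside the condition assumes Python 3.8 or later), every expression below reduces to one value of one type, and the condition of an ‘if’ is just another such expression:

```python
import operator


def is_ready(n):
    # A function call is an expression too: it evaluates to a single bool.
    return n > 2


values = [1, 2, 3]

# The function name plays the role of the operator and the arguments
# are its operands; the result is the same value as
# len(values) + max(values).
total = operator.add(len(values), max(values))
assert isinstance(total, int)

# The condition is an ordinary expression whose value is interpreted as
# true or false; it may contain calls and an assignment.
if is_ready(count := len(values)) and total:
    outcome = "ready with " + str(count)
else:
    outcome = "not ready"
```

Nothing about the ‘if’ had to be explained separately: once the condition is understood as an expression, calls and assignments inside it follow for free.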

With this in mind, it occurs to me that Pygeon (my educational language [see previous post]) should use prefix notation rather than the usual infix notation. Pygeon’s prefix notation would differ in a few key ways from that found in Lisp:

  • Pygeon is still statement-based, so control flow (‘if’, ‘while’, etc.) is kept conceptually separate.
  • The outer expression of a statement need not be surrounded in parentheses.
  • Operators will all be designated by name rather than symbol, i.e. the addition operator will be add. Even operators like . and [] are expressed as what appear to be built-in functions, e.g. indexing the fourth member of an array would be written: (member arr 3).

It follows that, for consistency, function calls should adopt the Lisp prefix style, including the lack of commas between arguments.

The basic benefit to these decisions is that it frees students from having to think about syntax: using named operators and prefix notation may require more parentheses and verbosity than most experienced programmers would want to deal with on an extensive basis, but the advantages for learners are considerable. At the very least, prefix notation frees students from the distraction of an order of precedence. Perhaps more importantly, I think learners can too easily fail to see the conceptual parallel between an operator and its operands and a function and its arguments; the difference between the two really is just that the built-in operators of a language are built-in, use a special syntax for convenience/aesthetic reasons, and are typically implemented differently from function calls to avoid the same overhead.
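A minimal sketch of how such expressions might be evaluated, written in Python. Pygeon has no published implementation here, so everything in it is an assumption: only add and member are named in this post, while subtract, multiply, and the helper functions are hypothetical, and only integer literals are handled.

```python
import operator

# Hypothetical operator table: the post only names `add` and `member`;
# `subtract` and `multiply` are assumed here for illustration.
OPERATORS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "member": lambda array, index: array[index],
}


def tokenize(source):
    # Pad parentheses with spaces so split() separates them from names.
    return source.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    # Consume tokens from the front, building nested lists for
    # parenthesized sub-expressions.
    token = tokens.pop(0)
    if token == "(":
        expression = []
        while tokens[0] != ")":
            expression.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return expression
    return token


def evaluate(node, variables):
    if isinstance(node, list):
        # The first element names the operator; the rest are operands.
        function = OPERATORS[node[0]]
        operands = [evaluate(child, variables) for child in node[1:]]
        return function(*operands)
    if node in variables:
        return variables[node]
    return int(node)  # this sketch only handles integer literals


def run(source, variables):
    # The outer expression of a statement need not be parenthesized,
    # so wrap it before parsing unless it already is.
    text = source.strip()
    if not text.startswith("("):
        text = "(" + text + ")"
    return evaluate(parse(tokenize(text)), variables)
```

For instance, run("add 1 (multiply 2 3)", {}) evaluates the inner expression first and needs no precedence table, since the parentheses carry all of the grouping.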

Of course, learners eventually will move on to languages with the more typical infix notation, so you might object that students shouldn’t be deprived of infix. While I do believe prefix notation lays bare the concepts of expressions in a way that infix does not, I actually think the optimum approach is to present them side-by-side, no matter what the language being taught, for this is an excellent example of where learning more material makes learning easier rather than harder because the concepts are best understood presented in contrast with near-neighbors and alternatives. Prefix notation offers the purer syntax for expressions while infix is the more familiar.

Pygeon: a new educational programming language

14 Jan

One thing I’ve had on the drawing board for a while is a new educational programming language, which I’m calling ‘Pygeon’—pronounced ‘pigeon’, but spelled with a ‘y’ in honor of its Python heritage. The design of Pygeon reflects one key principle: conveniences are confusing for learners, as they cloud the ‘essential design’ of a language with distracting ‘incidental design’. Not only are conveniences unnecessary mental clutter the learner has to deal with, they are like the Kool-aid guy bursting through the walls of the language’s form: they make the form more complex, turning it into a confusing Swiss cheese from which the initiated may be able to still discern the essential form but from which newbies usually can’t.

Now, in a sense, programming languages are themselves ‘conveniences’, so how are inessential conveniences different from essential ones? Well, programming languages are always a form of information compression, e.g. in C, you needn’t write:

int x; int y; int z;

…as you can simply write:

int x, y, z;

Similarly, writing a function is a convenience in that it saves programmers from having to write the machine instructions themselves. The key difference here is that the multiple-declaration syntax convenience just saves you some typing and compacts your code; in contrast, having functions as a built-in abstraction to the language saves the programmer from having to think or even know about machine instructions, not just from having to type machine instructions.

Speaking of functions, Pygeon’s function declaration syntax makes a good demonstration of Pygeon’s adherence to the ‘no conveniences’ rule. A function declaration has this form:

function name [parameters parameter1 parameter2 ... ]

The body of statements comes indented on the following lines, a la Python.

The optional ‘parameters’ keyword is followed by the (non-comma-separated!) list of parameter variables. Commas are not used to separate the parameter names, for two reasons: first, while C-influenced languages choose to emphasize the symmetry between function-call argument lists and function-declaration parameter lists, I find it’s actually more important to emphasize the distinctions, as one is a list of expressions and the other a list of declarations; second, without type declarations, the commas become unnecessary, and unnecessary syntax confuses learners because they want to believe the syntax serves a purpose they just can’t discern yet.
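One consequence of dropping commas and the colon is that a header can be read by splitting on whitespace alone. The following Python sketch shows the idea; parse_function_header is a hypothetical helper, and rejecting a bare ‘parameters’ keyword with no names after it is an assumption of the sketch, not a stated rule of Pygeon.

```python
def parse_function_header(line):
    # Split on whitespace; there are no commas or colons to strip.
    words = line.split()
    if len(words) < 2 or words[0] != "function":
        raise ValueError("not a function header: " + line)
    name = words[1]
    rest = words[2:]
    if not rest:
        # The 'parameters' clause is optional.
        return name, []
    if rest[0] != "parameters" or len(rest) == 1:
        # Assumption: 'parameters' must be followed by at least one name.
        raise ValueError("expected 'parameters' followed by names: " + line)
    return name, rest[1:]
```

Under these assumptions, "function greet parameters name greeting" yields the name greet with the parameter list name, greeting, and "function main" yields main with no parameters.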

You may have noticed that the function declaration header does not end with a colon, as Pygeon disallows the body of a block to go on the same line as the header. Nor can multiple statements go on the same line with semi-colons. So whereas, in Python, you can write:

if x > 3: doSomething(); x = 5

…in Pygeon, the same must be expressed as:

| if x > 3
|   doSomething()
|   x = 5

I made these decisions because, in my own learning experience, formatting choices are distracting. Pygeon takes this very strictly and so lacks any mechanism to spread statements across multiple lines. Certainly this is very limiting and not acceptable in a real-use language, but occasional overly long lines are tolerable in an educational language and act as a good indicator to the learner that maybe they should split their statement(s) up or that maybe something is non-ideal about a function that takes so many parameters; this also ensures that all Pygeon code only ever uses indentation to introduce a new block, which greatly reduces confusion when reading code for eyes not used to doing so.

Here’s a quick run down of some other design choices in Pygeon:

  • If Python didn’t have ‘elif’, if-else ladders would be a large mess of indentation; reading and writing if-else ladders at one level of indentation is a basic coding skill, so Pygeon includes ‘elseif’, even though it is, strictly speaking, unnecessary and just a convenience. (The C solution, introducing compound statements, is overly clever for something of such specific use, and it brings along strange edge cases that have to be explained, e.g. if I have a single non-compound statement after ‘if’, does that constitute a unique scope?)
  • There is no ‘for’, ‘foreach’, or ‘do while’, only ‘while’. I personally hardly ever find a use for ‘do while’, and though ‘foreach’ is particularly useful, it is just a convenience. Not only do these constructs represent more syntax to explain, they represent distracting choices learners will have to deal with when writing code. Not including them in the language hammers home the idea that only ‘if’, ‘else’, ‘elseif’, ‘while’, ‘break’, and ‘continue’ are needed to construct any possible flow of execution (though ‘elseif’ and, I believe, ‘continue’ aren’t really needed either). For similar reasons, there are no ‘switch’, ‘goto’, or break-to-label constructs.
  • A module’s top-level of code includes, in this order, import statements, functions, classes, and then the setup section (a section of arbitrary code). This reflects the general order in which things happen when a module is loaded: first the imports are executed, then the function and class definitions are parsed and compiled (putting functions first is an arbitrary decision), then the setup code is run. Using this strict ordering is less powerful than with Python style modules in some circumstances, but it cuts off questions about odd edge cases.
  • An ‘import’ statement specifies the full name of a file as a relative path (in a slash-direction agnostic manner). The equivalent of Python’s plain ‘import’ is expressed by adding a ‘to’ clause after the file name, e.g. ‘import foo.pyg to bar’; the target specified in the ‘to’ clause is a module-level reference to an associative array object (like a Python dictionary), not a special module object. To achieve the same effect as Python’s ‘from foo import *’, use the keyword ‘here’, i.e. ‘import foo.pyg here’. (The ‘here’ clause feature will probably just be omitted, as it encourages namespace pollution, and it’s really just another typing saver.)
  • Pygeon has no concept of packages.
  • Pygeon has no print statement, as it provides nothing a function couldn’t do with the simple addition of parentheses.
  • Each file has a module-level scope, but there is no built-in scope. The most basic parts of the standard library are always implicitly imported into every module with ‘import here’, and warnings are issued by the code checker when you reassign these members of the module scope.
  • Functions and classes are always written at the top-level of code (not including constructors and methods which of course go inside their class definitions).
  • Classes in Pygeon have no concept of member access restrictions. Nor, in fact, is there a concept of inheritance or type hierarchy supported in the language—all objects are simply associative arrays that either have some member or they don’t. If you want a class that inherits from another, you’ll have to cut and paste all the parent class stuff into the child. (Obviously this is not acceptable in a real-use language, but it dodges all sorts of otherwise needed bothersome rules, and it emphasizes the true nature of polymorphism: an object can be used in the place of another type of object when it has the necessary members. Besides, it’s a classic beginner mistake to overuse inheritance.)
  • All methods of a class are declared in a ‘class’ block. First goes the optional class setup code (for making static members); second goes the (mandatory) constructor, which is written just like a function but declared with the keyword ‘constructor’ (and given no name); then come the methods, declared with the ‘method’ keyword.
    | class Cat
    |   foo = 3 // static member foo
    |   constructor // an empty constructor
    |   method meow
    |     // do meow stuff
  • Instance members are created via assignment of members to the ‘this’ variable inside the constructor and methods. The instance must always be explicitly specified with the keyword ‘this’, so there is no name collision between local variables of a method and the instance. A method can call another only via this. The class object itself can be accessed in a method by name, and this is how static members are accessed.
  • Pygeon includes exceptions, but there is no Exception type from which all exceptions must derive. The exception mechanism throws an object in which the first element (usually a string) is what the catch blocks look for, e.g. ‘throw ['hi', moreInfo]’; all other elements of the thrown object are optional and can be used for more information about the error. A catch block header specifies the object to match to (Pygeon first tries an identity match and then tries an equivalence test). The thrown exception object is accessed in a catch block as the keyword ‘ex’ just as if it were a variable local to the catch block (obviously ‘ex’ has no meaning outside a catch block). The exception can be repropagated simply with ‘throw ex’. Pygeon has no analog of ‘finally’.
    | try
    |   //...
    |   catch 'hi'  // catches exceptions thrown with 'hi'
    |     print(ex[1]) // print the second element of the exception object
  • When you use a variable in a class, function, constructor, or method, it is taken to be in module scope unless the local scope somewhere has an assignment to the same name. The only way to assign to the module scope from within a local scope is via the keyword ‘mod’, a special reference to the current module, e.g. ‘mod.bla = 5’. If you create a local foo, you can still access module-level foo as ‘mod.foo’.
  • After the function and class definitions in the module, you can write an arbitrary sequence of statements, which is executed when the module is first loaded. Statements in this section execute in the module scope, so they reference the functions, classes, and imported modules of the module directly, and a variable assigned here is a variable in the module scope.
  • Associative arrays (dictionaries) and indexed arrays (lists) are combined into one kind of object, using [] for the literal syntax, e.g.:
    ['hi', foo:42, 'yo'] // index 0 is 'hi', index 1 is 'yo', foo is 42
  • Unlike in PHP, there’s no weird concept of order given to the non-numeric members of an array, and the indexed members are always accessed via their originally assigned index. Members are accessed with the dot operator, e.g. ‘foo.bla’ (at bla of foo) or ‘foo.3’ (at index 3 of foo, i.e. its fourth member); where the index or name of the member is not known at write-time, use a string or integer expression in [], e.g. ‘foo[x]’ (where x evaluates into a string or integer).
  • Instances are simply arrays, so unlike in Python, Pygeon objects don’t have ‘internal dictionaries’ because they are their own dictionaries.
  • As there is no inheritance, there is no hierarchy of built-in types. The built-in types are: associative/indexed arrays; strings; booleans; and numbers (the language masks the integer/float distinction in the manner of JavaScript but with no size constraint). The built-in types do not have any methods because that introduces conceptual difficulties that are hard to explain. Standard library functions provide the missing functionality.
  • There is no casting operator, only conversion functions. Implicit conversions are kept very few.
  • Pygeon dispenses with Python’s operator overloading, as it’s an overly complicated feature. The principal built-in types still retain their most essential operators, but the operators are just built-in shortcuts to standard functions (not methods of the type). There’s no way to use the operators with your own types.
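The combined array described in the list above can be modeled roughly in Python. This is only a sketch of the described behavior, not Pygeon’s implementation: the class name PygeonArray, its has method, and the can_meow function are all invented for the example.

```python
class PygeonArray:
    """Rough Python model of a combined indexed/associative array."""

    def __init__(self, *positional, **named):
        # Positional values get consecutive integer indices from 0,
        # regardless of where named members appear in the literal.
        self._members = dict(enumerate(positional))
        self._members.update(named)

    def __getitem__(self, key):
        # Mirrors foo[x], where x is an integer or a string.
        return self._members[key]

    def __setitem__(self, key, value):
        self._members[key] = value

    def has(self, key):
        return key in self._members


def can_meow(obj):
    # Polymorphism without inheritance: any object that has the
    # needed member can stand in for another.
    return obj.has("meow")


# Models the literal ['hi', foo:42, 'yo'] from the list above.
literal = PygeonArray("hi", "yo", foo=42)
```

Here literal[0] is 'hi', literal[1] is 'yo', and literal['foo'] is 42, matching the comment on the literal above. Because instances are themselves such arrays, a check like can_meow needs no type hierarchy, only a membership test.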

That covers the major highlights. My next post will be about why programming languages are hard to learn, followed by a look at particular languages and what prevents them from being good educational languages.