Pygeon: a new educational programming language

14 Jan

One thing I’ve had on the drawing board for a while is a new educational programming language, which I’m calling ‘Pygeon’ (pronounced ‘pigeon’, but spelled with a ‘y’ in honor of its Python heritage). The design of Pygeon reflects one key principle: conveniences are confusing for learners, as they cloud the ‘essential design’ of a language with distracting ‘incidental design’. Not only are conveniences unnecessary mental clutter the learner has to deal with, they are like the Kool-Aid guy bursting through the walls of the language’s form: they make the form more complex, turning it into a confusing Swiss cheese in which the initiated may still discern the essential form but newbies usually can’t.

Now, in a sense, programming languages are themselves ‘conveniences’, so how are inessential conveniences different from essential ones? Well, programming languages are always a form of information compression, e.g. in C, you needn’t write:

int x; int y; int z;

…as you can simply write:

int x, y, z;

Similarly, writing a function is a convenience in that it saves programmers from having to write the machine instructions themselves. The key difference here is that the multiple-declaration syntax convenience just saves you some typing and compacts your code; in contrast, having functions as a built-in abstraction to the language saves the programmer from having to think or even know about machine instructions, not just from having to type machine instructions.

Speaking of functions, Pygeon’s function declaration syntax makes a good demonstration of Pygeon’s adherence to the ‘no conveniences’ rule. A function declaration has this form:

function name [parameters parameter1 parameter2 ... ]

The body of statements comes indented on the following lines, a la Python.

The optional ‘parameters’ keyword is followed by the (non-comma-separated!) list of parameter variables. Commas are not used to separate the parameter names, for two reasons: first, while C-influenced languages choose to emphasize the symmetry between function-call argument lists and function-declaration parameter lists, I find it’s actually more important to emphasize the distinctions, as one is a list of expressions and the other a list of declarations; second, without type declarations, the commas become unnecessary, and unnecessary syntax confuses learners because they want to believe the syntax serves a purpose they just can’t discern yet.
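
To make the first reason concrete, here is a small Python sketch (Python rather than Pygeon, since Pygeon is still only a design; the names `area`, `width`, `height`, and `side` are made up for illustration). The parameter list only declares names, whereas the argument list holds expressions that are evaluated before the call:

```python
def area(width, height):
    # Parameter list: 'width' and 'height' are declarations of new local names.
    return width * height


side = 3
# Argument list: 'side + 1' and '2 * side' are expressions, evaluated first.
result = area(side + 1, 2 * side)
print(result)  # prints 24
```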

You may have noticed that the function declaration header does not end with a colon; this is because Pygeon does not allow the body of a block to go on the same line as the header. Nor can multiple statements go on the same line separated by semicolons. So whereas, in Python, you can write:

if x > 3: doSomething(); x = 5

…in Pygeon, the same must be expressed as:

| if x > 3
|   doSomething()
|   x = 5

I made these decisions because, in my own learning experience, formatting choices are distracting. Pygeon takes this very strictly and so lacks any mechanism to spread a statement across multiple lines. Certainly this is very limiting and not acceptable in a real-use language, but occasional overly long lines are tolerable in an educational language, and they act as a good indicator to the learner that maybe they should split the statement up or that something is non-ideal about a function that takes so many parameters. This also ensures that Pygeon code only ever uses indentation to introduce a new block, which greatly reduces confusion for eyes not used to reading code.

Here’s a quick rundown of some other design choices in Pygeon:

  • If Python didn’t have ‘elif’, if-else ladders would be a large mess of indentation; reading and writing if-else ladders at one level of indentation is a basic coding skill, so Pygeon includes ‘elseif’, even though it is, strictly speaking, unnecessary and just a convenience. (The C solution, introducing compound statements, is overly clever for something of such specific use, and it brings along strange edge cases that have to be explained, e.g. if I have a single non-compound statement after ‘if’, does that constitute a unique scope?)
  • There is no ‘for’, ‘foreach’, or ‘do while’, only ‘while’. I personally hardly ever find a use for ‘do while’, and though ‘foreach’ is particularly useful, it is just a convenience. Not only do these constructs represent more syntax to explain, they represent distracting choices learners will have to deal with when writing code. Not including them in the language hammers home the idea that only ‘if’, ‘else’, ‘elseif’, ‘while’, ‘break’, and ‘continue’ are needed to construct any possible flow of execution (though ‘elseif’ and, I believe, ‘continue’ aren’t really needed either). For similar reasons, there are no ‘switch’, ‘goto’, or break-to-label constructs.
  • A module’s top level of code includes, in this order, import statements, functions, classes, and then the setup section (a section of arbitrary code). This reflects the general order in which things happen when a module is loaded: first the imports are executed, then the function and class definitions are parsed and compiled (putting functions first is an arbitrary decision), then the setup code is run. This strict ordering is less powerful than Python-style modules in some circumstances, but it cuts off questions about odd edge cases.
  • An ‘import’ statement specifies the full name of a file as a relative path (in a slash-direction agnostic manner). The equivalent of Python’s plain ‘import’ is expressed by adding a ‘to’ clause after the file name, e.g. ‘import foo.pyg to bar’; the target specified in the ‘to’ clause is a module-level reference to an associative array object (like a Python dictionary), not a special module object. To achieve the same effect as Python’s ‘from import *’, use the keyword ‘here’, i.e. ‘import foo.pyg here’. (The ‘here’ clause feature will probably just be omitted, as it encourages namespace pollution, and it’s really just another typing saver.)
  • Pygeon has no concept of packages.
  • Pygeon has no print statement, as it provides nothing a function couldn’t do with the simple addition of parentheses.
  • Each file has a module-level scope, but there is no built-in scope. The most basic parts of the standard library are always implicitly imported into every module with ‘import here’, and warnings are issued by the code checker when you reassign these members of the module scope.
  • Functions and classes are always written at the top-level of code (not including constructors and methods which of course go inside their class definitions).
  • Classes in Pygeon have no concept of member access restrictions. Nor, in fact, is there a concept of inheritance or type hierarchy supported in the language—all objects are simply associative arrays that either have some member or they don’t. If you want a class that inherits from another, you’ll have to cut and paste all the parent class stuff into the child. (Obviously this is not acceptable in a real-use language, but it dodges all sorts of otherwise needed bothersome rules, and it emphasizes the true nature of polymorphism: an object can be used in the place of another type of object when it has the necessary members. Besides, it’s a classic beginner mistake to overuse inheritance.)
  • All methods of a class are declared in a ‘class’ block. First goes the optional class setup code (for making static members); second goes the (mandatory) constructor, which is written just like a function but declared with the keyword ‘constructor’ (and given no name); then come the methods, declared with the ‘method’ keyword.
    | class Cat
    |   foo = 3 // static member foo
    |   constructor // an empty constructor
    |   method meow
    |     // do meow stuff
  • Instance members are created via assignment of members to the ‘this’ variable inside the constructor and methods. The instance must always be explicitly specified with the keyword ‘this’, so there is no name collision between local variables of a method and the instance. A method can call another only via ‘this’. The class object itself can be accessed in a method by name, and this is how static members are accessed.
  • Pygeon includes exceptions, but there is no Exception type from which all exceptions must derive. The exception mechanism throws an object in which the first element (usually a string) is what the catch blocks look for, e.g. ‘throw ['hi', moreInfo]’; all other elements of the thrown object are optional and can be used for more information about the error. A catch block header specifies the object to match against (Pygeon first tries an identity match and then tries an equivalence test). The thrown exception object is accessed in a catch block through the keyword ‘ex’, just as if it were a variable local to the catch block (obviously ‘ex’ has no meaning outside a catch block). The exception can be repropagated simply with ‘throw ex’. Pygeon has no analog of ‘finally’.
    | try
    |   //...
    |   catch 'hi'  // catches exceptions thrown with 'hi'
    |     print(ex[1]) // print the second element of the exception object
  • When you use a variable in a class, function, constructor, or method, it is taken to be in module scope unless that scope somewhere has an assignment to the same name. The only way to assign to the module scope from within a local scope is via the keyword ‘mod’, a special reference to the current module, e.g. ‘mod.bla = 5’. If you create a local foo, you can still access module-level foo as ‘mod.foo’.
  • After the function and class definitions in the module, you can write an arbitrary sequence of statements, which is executed when the module is first loaded. Statements in this section execute in the module scope, so they reference the functions, classes, and imported modules of the module directly, and a variable assigned here is a variable in the module scope.
  • Associative arrays (dictionaries) and indexed arrays (lists) are combined into one kind of object, using [] for the literal syntax, e.g.:
    ['hi', foo:42, 'yo'] // index 0 is 'hi', index 1 is 'yo', foo is 42
  • Unlike in PHP, there’s no weird concept of order given to the non-numeric members of an array, and the indexed members are always accessed via their originally assigned index. Members are accessed with the dot operator, e.g. ‘foo.bla’ (at bla of foo) or ‘foo.3’ (at index 3 of foo, i.e. the fourth element); where the index or name of the member is not known at write-time, use a string or integer expression in [], e.g. ‘foo[x]’ (where x evaluates to a string or integer).
  • Instances are simply arrays, so unlike in Python, Pygeon objects don’t have ‘internal dictionaries’ because they are their own dictionaries.
  • As there is no inheritance, there is no hierarchy of built-in types. The built-in types are: associative/indexed arrays; strings; booleans; and numbers (the language masks the integer/float distinction in the manner of JavaScript but with no size constraint). The built-in types do not have any methods because that introduces conceptual difficulties that are hard to explain. Standard library functions provide the missing functionality.
  • There is no casting operator, only conversion functions. Implicit conversions are kept very few.
  • Pygeon dispenses with Python’s operator overloading, as it’s an overly complicated feature. The principal built-in types still retain their most essential operators, but the operators are just built-in shortcuts to standard functions (not methods of the type). There’s no way to use the operators with your own types.
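
The exception-matching rule from the list above (compare the catch pattern with the first element of the thrown object, trying identity first and then equivalence) can be roughly sketched in Python. This is only one reading of the rule, not a Pygeon implementation: the names `PygeonThrow`, `matches`, `run_with_catch`, and `risky` are hypothetical, and a Python list stands in for the thrown array.

```python
class PygeonThrow(Exception):
    """Hypothetical carrier for a thrown Pygeon object (a plain list here)."""

    def __init__(self, thrown):
        super().__init__(thrown)
        self.thrown = thrown


def matches(pattern, thrown):
    """Compare the catch pattern with the first element of the thrown object."""
    first = thrown[0]
    # Identity is tried first, then equivalence, as the bullet describes.
    return first is pattern or first == pattern


def run_with_catch(body, pattern, handler):
    """Run body(); if it throws a matching object, pass it to handler as 'ex'."""
    try:
        return body()
    except PygeonThrow as err:
        if matches(pattern, err.thrown):
            return handler(err.thrown)
        raise  # no match: repropagate, in the spirit of 'throw ex'


def risky():
    raise PygeonThrow(['hi', 'more info'])


# Mirrors: catch 'hi' ... print(ex[1])
caught = run_with_catch(risky, 'hi', lambda ex: ex[1])
print(caught)  # prints: more info
```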

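As a rough illustration of the ‘only while’ point in the list above, here is a Python ‘for’ loop next to a rewrite of the same logic that uses nothing but ‘while’ and ‘if’. It is written in Python because Pygeon itself is only a design; the function names and sample data are invented for the example.

```python
def first_negative_for(numbers):
    # Version using 'for', which Pygeon would not offer.
    for n in numbers:
        if n < 0:
            return n
    return None


def first_negative_while(numbers):
    # Same logic using only 'while' and 'if', with an explicit index.
    i = 0
    while i < len(numbers):
        if numbers[i] < 0:
            return numbers[i]
        i = i + 1
    return None


data = [4, 7, -2, 9, -5]
print(first_negative_for(data), first_negative_while(data))  # prints: -2 -2
```
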
That covers the major highlights. My next post will be about why programming languages are hard to learn, followed by a look at particular languages and what prevents them from being good educational languages.
