PygeonJava: is it worth bothering?

February 21, 2007 – 4:39 am

The need for an educational alternative syntax for Java is less compelling than for C, assuming, at least, that Java is not the first language learners are being exposed to. Not to say that Java is particularly easy (it has rather baroque rules, in particular surrounding inheritance, access, and generics), but it’s not the syntax of these features that is really the problem.

In fact, most of the real danger areas of Java syntax are where the similarities with C’s syntax are misleading. In my own experience of learning C before learning Java, this was the case with Java’s array syntax (and it certainly didn’t help that I wasn’t really straight about C’s array and pointer syntax). Conceptually, Java’s arrays are quite different from C’s arrays: they are objects reached through references, and the fact that they aren’t allocated on the stack has a number of important consequences, such as allowing array size to be specified dynamically.
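To make the consequence concrete, here is a minimal sketch in standard Java. The class and method names (ArraySizeDemo, firstSquares) are made up for illustration. The array length is an ordinary run-time value, and the array can safely be returned from the method that created it.

```java
import java.util.Arrays;

class ArraySizeDemo {
    // The length is computed at run time; it need not be a compile-time constant.
    static int[] firstSquares(int count) {
        int[] squares = new int[count];   // the array object lives on the heap
        for (int i = 0; i < count; i++) {
            squares[i] = i * i;
        }
        return squares;                   // safe: the array outlives this call
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        System.out.println(Arrays.toString(firstSquares(n)));
    }
}
```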

Here’s my current idea for array reference syntax:

foo:a-char // declare reference to array of chars

foo:a-char (3-char) // same as above, but initialize to an actual array (note the size has to be supplied)

foo:a-char (bar-char) // size of array is given by expression (the variable bar)

foo:a-char ((add bar 3)-char) // size of array is given by expression (sum of bar and 3)

foo:a-char (a-char ‘a’ ‘b’ ‘c’) // actual values are supplied and so size of array is inferred from number of elements

foo:aa-int // two-dimensional array of ints

foo:aa-int (4-a-int) // create two-dimensional array of ints, specifying the first dimension

foo:aa-int (aa-int (a-int 3 5) null (4-int)) // leave first dimension size inferred from number of elements

foo:aaa-Dog // declare reference to 3-dimensional array of Dogs
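For comparison, these are the standard Java declarations the forms above would presumably correspond to. The mapping is an assumption read off the comments, not a defined translation, and the names ArrayDeclarations, jagged, and Dog exist only for this sketch.

```java
class ArrayDeclarations {
    static class Dog {}

    // foo:aa-int (aa-int (a-int 3 5) null (4-int))
    static int[][] jagged() {
        return new int[][] {{3, 5}, null, new int[4]};
    }

    public static void main(String[] args) {
        int bar = 2;

        char[] a;                        // foo:a-char
        char[] b = new char[3];          // foo:a-char (3-char)
        char[] c = new char[bar];        // foo:a-char (bar-char)
        char[] d = new char[bar + 3];    // foo:a-char ((add bar 3)-char)
        char[] e = {'a', 'b', 'c'};      // foo:a-char (a-char 'a' 'b' 'c')

        int[][] f;                       // foo:aa-int
        int[][] g = new int[4][];        // foo:aa-int (4-a-int): four rows, all null so far
        int[][] h = jagged();            // rows may differ in length or be null

        Dog[][][] dogs;                  // foo:aaa-Dog

        System.out.println(d.length + " " + g[0] + " " + h[2].length);   // prints "5 null 4"
    }
}
```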

In the end, Java arrays are still a rather complicated matter and so real understanding is probably only gained by a thorough discussion of what’s really going on in memory.

A more general conceptual problem in Java is its bifurcated type system. Learners coming from Pygeon and C will already have been introduced to reference variables and value variables, but not in the same language. A possible pedantic step is to introduce an arbitrarily different syntax for declarations of value and reference variables; I’m thinking of having reference variables use semi-colons in place of colons. (Again, it’s important to make explicit to the learner that such syntax is strictly unnecessary but for their own good.)

foo;Dog // Dog is a reference type
foo;a-int // ‘array of ints’ is a reference type
foo:int // int is a primitive type

I’m not sure whether this rule is useful or merely distracting. Come to think of it, smart syntax highlighting could make the distinction clear just as well, so I’ll probably stick with colons for both.
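Whatever the declaration syntax ends up looking like, the behavior learners have to absorb is the one in this small standard-Java sketch (the names ValueVsReference and demo are invented for illustration): assigning a primitive copies the value, while assigning an array or object variable copies only the reference.

```java
class ValueVsReference {
    static int[] demo() {
        int x = 1;
        int y = x;        // copies the value
        y = 99;           // x is unaffected

        int[] p = {1};
        int[] q = p;      // copies the reference; p and q name the same array
        q[0] = 99;        // the change is visible through p as well

        return new int[] {x, p[0]};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1]);   // prints "1 99"
    }
}
```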

The use of the dot operator is very prevalent in Java programming, and something like this is quite common:

a().b().c.d()

My concern is that this stuff gets quite ugly in the pure prefix notation I’ve been using:

(m (m ((m (a) b)) c) d) // remember a statement has an implicit set of () around it

Adding the infix dot operator makes this considerably better:

((a).b).c.d

In fact, I’m reconsidering leaving the dot operator out of Pygeon and PygeonC.

A serious point of confusion I experienced myself was in distinguishing between the use of dots in specifying packages and the use of dots in accessing members; it takes a while to pick up on how to parse something like:

bird.dog.Cat.lizard

When versed in Java, you know by convention that Cat is a class or interface because it’s capitalized, so you know lizard is a static member of the class (or interface) Cat that resides in the package bird.dog. This syntax and convention is clever in its reuse of symbols, but it is far too newbie-hostile. For clarity, I’m thinking a different symbol should be used to mean ‘thing in package’, and I’m leaning towards /, e.g.:

bird/dog/Cat.lizard

Still, this isn’t entirely satisfactory because it still implies that the package dog is ‘inside’ the package bird, which is true from the directory structure standpoint, but it doesn’t mean what I think it often implies to learners—that dog has some language-enforced relation to bird that’s something like the relation between child and parent classes. Perhaps dissuading students of this idea just requires repetition. It’s also possible that this is just another case where smart syntax highlighting (such as highlighting package names and Class names differently) could make this stuff clearer.
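The same parsing puzzle shows up with real library names. In the sketch below (DottedNames and largestInt are illustrative names only), each dotted expression mixes package, type, and member access, and only convention tells the reader where one stops and the next begins.

```java
class DottedNames {
    static int largestInt() {
        // package java.lang . class Integer . static field MAX_VALUE
        return java.lang.Integer.MAX_VALUE;
    }

    static String entryKey() {
        // package java.util . class AbstractMap . nested class SimpleEntry
        java.util.Map.Entry<String, Integer> entry =
                new java.util.AbstractMap.SimpleEntry<>("lizard", 1);
        return entry.getKey();
    }

    public static void main(String[] args) {
        // package java.lang . class System . static field out . instance method println
        java.lang.System.out.println(largestInt() + " " + entryKey());
    }
}
```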

Beyond concerns about just the look of package names, the import statement simply presents too many choices for learners to bother thinking about, e.g. what happens if you import everything from two packages which have name collisions? Importing also raises questions about name collision with the current file. The radical solution is to forgo importing altogether and simply require all class references be fully qualified. Less radical would be to disallow importing a whole package at a time with *. My only concern, then, is confusion from the inconsistency with Pygeon, where modules are imported as objects and the contents accessed via that reference as needed. (BTW, note that if standard Java were to use something other than . with packages, there wouldn’t be problems of ambiguity when there’s a name collision between a local name and a foreign package name.)
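As a concrete case of the collision question, java.util and java.sql both contain a class named Date. In this sketch (ImportCollision and describe are invented names), importing both packages wholesale is legal, but the simple name Date then cannot be used without qualification.

```java
import java.util.*;
import java.sql.*;

class ImportCollision {
    static String describe() {
        // Date d = new Date(0L);   // would not compile: the reference to Date is ambiguous
        java.util.Date utilDate = new java.util.Date(0L);
        java.sql.Date sqlDate = new java.sql.Date(0L);
        return utilDate.getClass().getName() + " / " + sqlDate.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe());   // prints "java.util.Date / java.sql.Date"
    }
}
```

Requiring fully qualified names everywhere, as suggested above, amounts to always writing the two uncommented lines.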

My hope is that Pygeon’s simple exception mechanism will make Java exceptions easier to learn. In Pygeon, there’s never more than one catch block to each try (testing the exception type, if desired, is just done programmatically inside the one catch block), but Java’s try can have multiple catch blocks and also a finally clause.
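The two styles can be put side by side in standard Java. This is only a sketch with made-up names (CatchDemo, parse, parseSingleCatch); the second method imitates the single-catch approach by testing the exception type inside one block.

```java
class CatchDemo {
    // Java style: one catch block per exception type, plus finally.
    static String parse(String text) {
        StringBuilder log = new StringBuilder();
        try {
            log.append("ok:").append(Integer.parseInt(text.trim()));
        } catch (NumberFormatException e) {
            log.append("not a number");
        } catch (NullPointerException e) {
            log.append("no text");
        } finally {
            log.append(";done");   // runs whether or not an exception occurred
        }
        return log.toString();
    }

    // Single-catch style: the type test happens inside the one block.
    static String parseSingleCatch(String text) {
        try {
            return "ok:" + Integer.parseInt(text.trim());
        } catch (RuntimeException e) {
            if (e instanceof NumberFormatException) {
                return "not a number";
            } else if (e instanceof NullPointerException) {
                return "no text";
            }
            throw e;   // anything else is passed along unchanged
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(" 42 "));
        System.out.println(parseSingleCatch("forty-two"));
    }
}
```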

Java’s checked/unchecked distinction really confuses things, as it really seems like a superfluous feature that just incurs extra typing; it’s a good example of where failing to convey the intent behind the language feature makes the feature actually harder to learn and remember (though many programmers will agree the feature actually doesn’t have any good justification).

Instead of ‘throws’ I prefer ‘maythrow’: ‘throws’ is confusingly close to ‘throw’ and also falsely (and strangely) implies an exception will be thrown when it only indicates that one may be thrown (though I suppose you could write a method that always throws an exception, but that’s a very odd thing to do).
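For readers who haven’t met the distinction yet, here is a small standard-Java sketch of it (CheckedDemo and its methods are illustrative names). The ‘throws’ clause on readConfig reads most naturally as ‘may throw’: the exception occurs only on one path.

```java
import java.io.IOException;

class CheckedDemo {
    // Checked: IOException must be declared because this method may throw it.
    static String readConfig(boolean available) throws IOException {
        if (!available) {
            throw new IOException("config missing");
        }
        return "config";
    }

    // Unchecked: IllegalArgumentException is a RuntimeException, so no throws clause is required.
    static int half(int n) {
        if (n % 2 != 0) {
            throw new IllegalArgumentException("odd: " + n);
        }
        return n / 2;
    }

    // A caller of readConfig must either catch IOException or declare it in turn.
    static String readOrDefault(boolean available) {
        try {
            return readConfig(available);
        } catch (IOException e) {
            return "default";
        }
    }

    public static void main(String[] args) {
        System.out.println(readOrDefault(false) + " " + half(4));   // prints "default 2"
    }
}
```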

Another small touch would be requiring constructors to be explicitly labeled as constructors; nor do I think including a constructor should be optional. These are more small bits of pedantry, but I think appropriate ones. There are a few possibilities for constructor headers:

constructor public Main x:int
constructor public Main x:int rt Main // verbose but entirely explicit
constructor public x:int // minimal

I’m leaning towards minimal. I either want to explain why there is both a superfluously specified name and a superfluously specified return type or I want to explain neither. (The philosophy here is to eliminate superfluous syntax except where it helps label unfamiliar things.)

Initiated Java programmers have no trouble distinguishing class fields from local variable declarations, but I think learners do, so fields will be declared with the keyword ‘field’:

field private foo:int

Speaking of fields, another good place to simplify the language would be to disallow direct initialization of fields in their declarations (though in truth, there are subtle differences between initializing a field in its declaration and initializing it in a constructor; there’s also the matter of static initialization blocks, which introduce a host of bothersome rules about the execution order of static blocks, instance blocks, constructors, and super-constructors).
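The ordering rules can be observed directly. The following standard-Java sketch (InitOrder, Base, Derived, and note are names invented here) records each initialization step in a list as it happens.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrder {
    static final List<String> LOG = new ArrayList<>();

    static class Base {
        static { LOG.add("Base static block"); }
        { LOG.add("Base instance block"); }
        Base() { LOG.add("Base constructor"); }
    }

    static class Derived extends Base {
        static { LOG.add("Derived static block"); }
        int field = note("Derived field initializer");
        { LOG.add("Derived instance block"); }
        Derived() { LOG.add("Derived constructor"); }

        static int note(String message) {
            LOG.add(message);
            return 0;
        }
    }

    // Creates one Derived object and returns the steps that were logged.
    static List<String> run() {
        LOG.clear();
        new Derived();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        for (String step : run()) {
            System.out.println(step);
        }
    }
}
```

On the first call the static blocks run once (Base, then Derived). After that come the Base instance block and Base constructor, then the Derived field initializer and instance block in textual order, and finally the Derived constructor body. Later calls skip the static blocks, since a class is initialized only once.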

Inner classes are a rather confusing subject, partly because of poorly chosen terminology: an “anonymous inner class” is really an entirely different thing from an “inner class”; they would have been better called something like “anonymous-type objects”. (Which raises a problem: how will anonymous-typed objects be written without free-form syntax?)
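Seen side by side in standard Java, the difference in feel is plain: the first is a declaration of a named type, the second is an expression that yields one object of an unnamed type. This is a sketch only, and InnerDemo, Counter, resetter, and demo are hypothetical names.

```java
class InnerDemo {
    private int count = 0;

    // A named inner class: a type declaration; each instance is tied to an InnerDemo instance.
    class Counter {
        int next() {
            return ++count;
        }
    }

    // An anonymous class: an expression creating a single object of an unnamed type.
    Runnable resetter() {
        return new Runnable() {
            @Override
            public void run() {
                count = 0;
            }
        };
    }

    static int demo() {
        InnerDemo outer = new InnerDemo();
        InnerDemo.Counter counter = outer.new Counter();
        counter.next();
        counter.next();            // count is now 2
        outer.resetter().run();    // count is back to 0
        return counter.next();     // returns 1
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```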

It’s very tempting to leave out the whole mess of generics: the subjects of automatic type promotion, autoboxing, and finding the best matching overloaded method given a set of arguments—these are all already complicated and annoying enough. On the other hand, I find it quite annoying when introductory texts leave out mention of whole language features. Maybe generics can be left for when learners study the standard language syntax.

Lastly, I recall finding the assortment of modifier keywords (‘static’, ‘final’, ‘abstract’, etc.) to be rather confusing. Even before getting straight what each one does, I found it difficult to get straight which go with which language constructs and which are mutually exclusive. I don’t really have a remedy for this, but there may be cases where the reuse of these keywords in different contexts makes them confusing, e.g. perhaps ‘abstract’ should only be used for classes while abstract methods should use ‘unimplemented’, ‘unspecified’, or some such. Another example is the use of ‘final’: the meaning of ‘final’ used on a class is quite different from ‘final’ used on a field or local variable.
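A compact standard-Java sketch of the overloaded meanings follows. All class names here (Constants, Shape, Square, ModifierDemo) are invented for illustration; the comments state what each modifier means in that position.

```java
final class Constants {               // final class: cannot be subclassed
    static final int LIMIT = 10;      // final field: cannot be reassigned
}

abstract class Shape {                // abstract class: cannot be instantiated directly
    abstract double area();           // abstract method: no body; subclasses must supply one

    final String describe() {         // final method: cannot be overridden
        return "area=" + area();
    }
}

class Square extends Shape {
    private final double side;        // final field: assigned exactly once, in the constructor

    Square(double side) {
        this.side = side;
    }

    @Override
    double area() {
        return side * side;
    }
}

class ModifierDemo {
    static String demo() {
        final int[] data = {1, 2};    // final local: the reference is fixed...
        data[0] = 5;                  // ...but the array it refers to is still mutable
        return new Square(2).describe() + " data0=" + data[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "area=4.0 data0=5"
    }
}
```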

UPDATE: It occurs to me that perhaps default access should not be allowed as I can’t think of any legitimate use for it. The only place I’ve ever seen it used is in quick-and-dirty code examples. Similarly, package statements should be required so as to disallow use of the default package.
