What’s the matter with C?
6 Feb
First off, I’ve made a couple more decisions about Pygeon syntax:
- The assignment operator is the keyword a (standing for ‘assign’), not the symbol = . So an assignment looks like (a foo 3) // assign 3 to variable ‘foo’ . The = symbol in math is not actually an operator, which makes it one part of mathematical notation that muddies the whole issue of expressions. It’s better to sidestep the distinction by never associating assignment, the making of equality, with what students are used to thinking of as the declaration of equality. Besides, assignment expressions already bring with them a confusing concept: assignment is unusual among operators in that the target operand has to be a so-called ‘lvalue’ (‘left value’, a term that predates C but that C popularized), and in this sense the target operand doesn’t get evaluated as operands normally do.
- The membership operator (the Pygeon equivalent of both . and [ ]) is the keyword m (standing for ‘member’), e.g. (m foo 3) // return the 4th element of ‘foo’ (the one at index 3) .
Continuing with the discussion of PygeonC syntax (see previous post):
The biggest pieces of C missing from Pygeon are syntaxes for variable declarations and structure definitions, and these happen to be areas where C could greatly stand to improve, for the sake of both learners and proficient users of C.
The worst decision made in the whole of C syntax is that the [ ] and * operators mean roughly (but not precisely) the opposite in a declaration of what they mean in an expression. E.g. in a declaration, [ ] makes something which is not an array into an array, while in an expression, [ ] makes something which is an array into a constituent value of that array. Worse, * is overloaded as both the binary multiplication operator and the unary dereference operator, which is just confusing. To begin with, the unary/binary operator distinction is hard for most students to get used to because the only unary operators familiar to them are the – and + signs, which seem like special cases because their meaning as unary operators is logically related to their meaning as binary operators. Students generally think of such signs as part of the number, not as operators. This is not the case with unary * in C, which does something entirely different from the binary * operator.
Quickly determining whether a * is a dereference or a multiplication is hard enough, but then * might also be indicating a pointer in a declaration, which raises another problem: learning to distinguish C’s declaration statements from expression statements is hard at first. The simple declaration statements (the ones beginning with one of the type keywords) are generally easy enough to identify quickly, but things get ugly in more realistic examples of C code, where typedefs, *, and [ ] are used frequently. Beyond just recognizing declarations, writing and parsing them is tricky: the base type can be modified with a panoply of keywords before or after it, and the *, (), and [ ] operators apply with a totally unintuitive precedence. It’s in these complicated declarations that the decision to mix a prefix operator (*) with two postfix operators ( [ ] () ) is shown to be just egregious. In fact, I think it demonstrates that the whole idea of using the idiom of expressions in declarations makes absolutely no sense. I’ve read Dennis Ritchie justify this choice as an attempt at conceptual unity (complex types should be declared the way they are used), but this doesn’t really hold water: in a declaration, nothing is being evaluated. Instructors of C have long noted the trouble students have with C’s complex types, but I suspect the primary fault here lies with the strange syntax, which obscures the underlying simplicity. (As much as I dislike this syntax, it severely annoys me how many C tutorials and books fail to emphasize, or in some cases to point out at all, that these are in fact operators with the base type as their operand. K&R is the only book I know that emphasizes this at all, which I think is one key reason it’s still so highly valued.)
So what does PygeonC do about declarations and the reference/dereference syntax?
- & is replaced with ref (or maybe just r), and dereference * is replaced with dref (or maybe just d). E.g. (dref foo) is equivalent to C’s *foo. Verbose, surely, but clear and consistent with the established syntax.
- There is no [ ] indexing operator as it is really just a convenience: x[3] is equivalent to *(x + 3). E.g. PygeonC would express x[3] as (dref (add x 3)). Again, this is verbose, but better to lay bare what is really going on. (On second thought, perhaps dref can optionally take a second operand to specify an offset before dereferencing, e.g. (dref x 3) .)
- Rather than having signed and unsigned modifiers, there should simply be keywords for the unsigned variants, e.g. ‘ulong’, ‘ushort’, ‘uint’, etc.
- My current thinking for the declaration syntax looks like name:type . (I’m not sure about doing the reverse from C, putting the name before the type, but it just appeals to me at the moment.) Allocation modifiers go before the type separated by another colon, e.g. name:static:type .
- An array modifier is simply the number of elements in the array. A pointer modifier is simply the letter p. The array and pointer modifiers go before the type, separated from it by hyphens, e.g. foo:static:3-p-p-int //foo is a 3-element array of pointers to pointers to ints . (Hyphens must separate all p’s and numbers; it would be perfectly understandable to omit hyphens between adjacent p’s, but for simplicity and consistency, it is not allowed.)
- Notice that the declaration syntax makes a declaration all one token without whitespace. This means there is no separator necessary between the listed parameters in a function definition and it makes the casting syntax simpler. A cast is simply done by using the type as an operator, e.g. (p-int foo) // cast value of foo to a pointer to ints
(I haven’t even really discussed the problems with function pointer syntax; that will have to wait, as a better solution is not something I’ve cracked yet).
Here are some additional thoughts:
- The membership operator is used just with structs, not dynamic associative arrays, so its argument is always known either to exist or not exist at compile-time. Because these names are always resolved at compile-time, the member is specified not as a string but as the name directly, e.g. (m foo bar), not (m foo “bar”). Using (m foo “bar”) would imply there is some string “bar” somewhere in the program text, but this is just a name used by the compiler.
- It occurs to me that students should be made aware of an important conceptual difference between operators and functions in PygeonC that doesn’t exist in Pygeon. In C, some operators take a range of types for their operand(s) rather than one and only one type. This is aside from the implicit conversion among the core types, e.g. there is no implicit conversion between integers and pointers, yet the + operator works with both integer and pointer operands. In other words, operators can be made to work with a range of types in a way that functions just can’t. Just as important, the type returned by an operator may vary whereas the type returned from a function never can.
