Typing: which is the one true faith?

26 Dec

Dynamic typing can be attributed three main virtues over static typing:

  • flexibility: A single function can vary the type of its returned value, and collections can hold heterogeneous values. Consequently, dynamic typing often lets us get away without having to think too far ahead.
  • concision: Functions, variables, and collections needn’t declare their types, and interfaces can be kept informal. (Inferred typing arguably closes this concision gap significantly.)
  • simplicity: Heterogeneous collections mean we don’t need to introduce generics.

Conversely, static typing can be attributed two main virtues over dynamic typing:

  • efficiency: When the compiler knows the variable types for certain, it can make numerous optimizations that otherwise aren’t possible.
  • correctness: The compiler can perform type checks, effectively eliminating a whole class of bugs. (However, any remaining errant null references and incorrect ‘down casts’ arguably constitute type errors, so not all static languages eliminate type errors entirely.)

The interesting question to me is, ‘when and why do programmers actually make type errors?’ In my limited experience, I’ve worked on long-term projects in dynamic languages and hardly ever made any type errors. In a 10k-line JavaScript project, for example, I bet I made fewer type errors than I can count on one hand.[1] This realization puzzled me for a while because it got me wondering why the hell so many people obsessed over type errors. After all, a type error is only one kind of bug among many and, in my experience, not a terribly common kind. All the programmer has to do to avoid type errors is consult the documentation of the functions and objects they use. Right?

Well, a little more experience explained this mystery: type errors become easy to make when dealing with ‘alternate-form types’. Such types enter the picture in a few ways:

  • numeric types: int, short, double, float, decimal, complex, etc. These are all numbers and so easy to confuse.
  • strings of different encodings: Again, these are all representations of essentially the same thing and so easy to confuse.
  • strings representing non-textual data: Do we need a number or a string of numeric digits? A boolean or a string reading ‘true’ or ‘false’? A code object or a string of code? A Foo object or its string representation?
  • numbers used for enumerations: Do we need a 0 or false? Does this function expect us to indicate the color blue with ‘blue’, 3, or COLOR.BLUE?
  • wrappers and collections: Do we pass in a Foo object or a FooWrapper object? An int or an Integer? A Bar object or a collection of Bar objects? The row object returned by the ORM or the business object representing the same data?

Without alternate-form types, type errors would almost never occur, for why would anyone ever accidentally mistake an Elephant for a Motorcycle? Mistaking a Scooter for a Motorcycle, on the other hand, is not so hard to imagine.
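
To make the idea of an alternate-form type concrete, here is a minimal Python sketch. The functions `total_price` and `total_price_checked` are hypothetical, invented for illustration: a quantity arrives as a string of digits rather than a number, and the dynamic version carries on without complaint.

```python
# Hypothetical example: a quantity that should be a number arrives as a
# string of digits, e.g. straight from a web form.

def total_price(unit_price, quantity):
    """Dynamic version: nothing stops quantity being '3' instead of 3."""
    return unit_price * quantity


correct = total_price(5, 3)      # int * int: ordinary multiplication
mistaken = total_price(5, "3")   # int * str: Python repeats the string instead


def total_price_checked(unit_price: int, quantity: int) -> int:
    """Same logic, but the alternate form is rejected at the boundary."""
    if not isinstance(quantity, int):
        raise TypeError(f"quantity must be int, got {type(quantity).__name__}")
    return unit_price * quantity
```

The mistaken call raises no exception at all; the wrong value simply travels onward, which is part of what makes this kind of error easy to commit and slow to notice.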

The important takeaway of my experience is this: alternate-form types arise less commonly in higher-level domains (such as JavaScript for a webpage), which partly explains why dynamic typing is generally more favored in front-end coding than in back-end and ‘engine’ coding. What seemed like a non-issue to me in one domain became a constant concern when working in another.

Typing peace in our time?

Now, if front-end and back-end code always lived in neat, separate boxes, choosing between static and dynamic typing wouldn’t be hard: we could use dynamic languages for front-ends and static languages for back-ends. Most projects, though, straddle both worlds, for a front-end, of course, must ultimately call into a back-end. Often this is done over the network such that the dynamic/static divide doesn’t really matter, but in other cases, we want to invoke a back-end as a library or framework, requiring some bridge. For example, Python can use modules written in C, but only with some significant adaptation.

Could we solve this problem? Could a single language accommodate both static and dynamic code that interoperate without hassle? I believe such a language is possible, and all it would require is for the programmer to declare each function/method as either static or dynamic:

  • In a dynamic function, type declarations would be optional, such that you might leave some or all types undeclared.
  • In a static function, all types must be declared (except those which can be inferred).
  • A dynamic function can invoke static functions with no hassle, though the runtime of course would have to perform a type check on the inputs. In some cases, type errors could be caught by the compiler in the context of a dynamic function, e.g. the return type of an invoked static function is known, so we might detect the improper use of the return value as argument to another invocation of a static function. On the whole, though, no guarantee is made about the type correctness of code in a dynamic function.
  • A static function can invoke dynamic functions but must declare the expected return type (where it cannot be inferred from context) so that the type can be checked at runtime. Of course, while the runtime check preserves the type correctness of the rest of the code, invoking a dynamic function introduces the potential for a type error within the static function, in a sense nullifying the type assurance we’re aiming for with static typing (though null references and down-casts, if present in the language, already undermine the type safety of our static code).
  • Homogeneous collections and generic types would retain their typing in dynamic code, e.g. an ArrayList<Foo> would throw an error if you attempt to append a non-Foo object.
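
The rules above can be roughly approximated in Python, purely as an illustration of the boundary checks rather than as the proposed language. Everything here is hypothetical: the `static` decorator, the `call_dynamic` helper, the `TypedList` class, and the example functions are invented for this sketch, and the checks handle only plain classes such as `int` and `str`.

```python
import functools
import inspect
from typing import get_type_hints


def static(func):
    """Hypothetical marker for a 'static' function.

    Every parameter and the return value must be annotated, and arguments
    are checked on each call, standing in for the check the runtime would
    make when dynamic code calls in.
    """
    hints = get_type_hints(func)
    signature = inspect.signature(func)
    undeclared = [p for p in signature.parameters if p not in hints]
    if undeclared or "return" not in hints:
        raise TypeError(f"{func.__name__}: a static function must declare all types")

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            if not isinstance(value, hints[name]):
                raise TypeError(f"{func.__name__}: {name} must be {hints[name].__name__}")
        return func(*args, **kwargs)

    return wrapper


def call_dynamic(expected_type, func, *args):
    """A static caller states the return type it expects; checked at runtime."""
    result = func(*args)
    if not isinstance(result, expected_type):
        raise TypeError(f"{func.__name__} returned {type(result).__name__}, "
                        f"expected {expected_type.__name__}")
    return result


class TypedList(list):
    """Stand-in for ArrayList<Foo>: keeps its element type in dynamic code.
    Only append() is guarded in this sketch."""

    def __init__(self, element_type):
        super().__init__()
        self.element_type = element_type

    def append(self, item):
        if not isinstance(item, self.element_type):
            raise TypeError(f"expected {self.element_type.__name__}")
        super().append(item)


@static
def area(width: int, height: int) -> int:
    return width * height


def parse_size(text):
    """Dynamic function: no declarations, and the return type varies."""
    return int(text) if text.isdigit() else text


@static
def square_area(text: str) -> int:
    side = call_dynamic(int, parse_size, text)  # declared expectation: int
    return area(side, side)
```

Calling `area` from undeclared code needs no ceremony, while `square_area` has to spell out what it expects back from `parse_size`. A real implementation would do much of this at compile time; the sketch only shows where the runtime checks would sit.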

The other necessary measure for mixing static and dynamic code is introducing a distinction between static and dynamic classes, for it wouldn’t do for static code to access properties that might get deleted or change their type. Dynamic code would interoperate with static types freely, but static code would have to assert types to use dynamic types.
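
As a rough Python sketch of that class distinction, under stated assumptions: the `Static` base class, `Point`, `Bag`, `require_type`, and `shift_right` are all hypothetical names, and an ordinary Python class plays the part of a dynamic class.

```python
from typing import get_type_hints


class Static:
    """Hypothetical base for 'static' classes: declared fields keep their
    declared type, and fields can be neither added nor deleted."""

    def __setattr__(self, name, value):
        hints = get_type_hints(type(self))
        if name not in hints:
            raise AttributeError(f"{type(self).__name__} has no declared field {name!r}")
        if not isinstance(value, hints[name]):
            raise TypeError(f"{name} must be {hints[name].__name__}")
        super().__setattr__(name, value)

    def __delattr__(self, name):
        raise AttributeError("fields of a static class cannot be deleted")


class Point(Static):
    x: int
    y: int

    def __init__(self, x, y):
        self.x = x
        self.y = y


class Bag:
    """Dynamic class: attributes may appear, vanish, or change type."""


def require_type(value, expected_type):
    """The 'bit of bother': static code asserts a type before using a value."""
    if not isinstance(value, expected_type):
        raise TypeError(f"expected {expected_type.__name__}, got {type(value).__name__}")
    return value


def shift_right(point: Point, bag) -> Point:
    # point.x is trusted as an int; bag.dx must be asserted before use.
    dx = require_type(getattr(bag, "dx", None), int)
    return Point(point.x + dx, point.y)
```

Code reading from a `Point` can rely on `x` being an `int`, whereas anything read from a `Bag` has to be asserted first.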

So the general pattern would be that using static functions and types in a dynamic context would be pain-free, but using dynamic functions and types in a static context would require a bit of bother. This is likely not a painful cost at all, as invoking static code from dynamic code is the much more useful case, for generally front-ends call into back-ends rather than the other way around. Programmers would start out writing a project in dynamic code but then gradually evolve their codebase, in whole or in part, into static code.

It can’t be that easy

Surely if the solution were so simple, someone would have done this by now, right? Well, not necessarily. One explanation is that static vs. dynamic is one of those religious debates in programming, and language designers are certainly opinionated, for why else would they create a language? Static typing purists create static languages because they believe efficiency and type safety shouldn’t be compromised, and dynamic typing purists create dynamic languages because they believe static type systems are overly troublesome and complicated. On the static side, especially, most energy seems to go into fixing type system problems by doing static typing ‘the right way’ (see Scala and Haskell).

But the main reason no one has integrated dynamic and static typing into one language is that, until fairly recently, no one could see the point. Until the rise of JavaScript, Python, et al., dynamic languages were used almost exclusively for small codebases of high-level code, meaning the alternate-form types problem never became pronounced. Now, however, we’re pushing dynamic languages into domains of larger codebases and infrastructure code, which suggests the need for a dynamic/static hybrid.

I’m certainly not the only person to get this idea. A future version of ECMAScript, for example, may feature optional static typing (though this is very much up in the air). Also, the academic project StaDyn approaches this problem from the opposite direction to my solution, treating static code as the default with optional dynamism. I haven’t looked closely enough at this to form an opinion, but it looks interesting.

  1. This excludes errors from mistyped property names, an error I made multiple times each day. I don’t consider these to be type errors, though, but rather ‘name errors’, and as I’ll discuss in another post, static typing isn’t needed to eliminate name errors.
