Poorly explained aspects of Java explained not so poorly (part 1)

October 20, 2007 – 4:11 pm

Most Java instruction materials fail to make certain basic things as clear as they could be, so here’s a FAQ-like rundown.

What are the types of values in Java?

Java divides its types into what it calls ‘primitive’ and ‘reference’ types (this terminology is unique to Java):

  • The primitive value types consist of five integer types of different sizes (int, long, char, short, byte), two floating-point types of different sizes (float and double), and the boolean type.
  • A reference value refers to an object, i.e. an instance of a class (arrays count as objects too).

(A reference value might also be an instance of an enum—a class-like enumeration. I won’t discuss enums as they were added late to the language and most programmers get by without using them.)

Confusingly, the terms value type and value variable are sometimes used as synonyms for primitive type and primitive-type variable, respectively. Less surprisingly, reference variable is used to mean a reference-type variable.

What’s a literal?

Values of some types can be expressed as ‘literals’, i.e. literal representations of particular values:

  • 35 : a literal int value (all integer literals are of type int by default)
  • -12.51 : a literal double value (all floating-point literals are of type double by default)
  • 'b' : a literal of type char (the ASCII value of 'b' is 98, so if used in an arithmetic expression, it will act as that integer value, e.g. ('b' + 2) is 100)
  • true : the reserved words true and false are literals of the two boolean values
  • "Aye carumba!" : a String literal

Notice that the only object type with its own literals is String (a string is an object, an instance of the class String in the package java.lang). The one other non-primitive literal is null, which stands for a reference to no object at all.
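To make the list above concrete, here is a small sketch of each kind of literal in use. The class and method names (LiteralsDemo, charPlusTwo) are made up for illustration, and the L and f suffixes are an addition of mine to show how to get a long or float literal instead of the defaults.

```java
class LiteralsDemo {
    // In arithmetic, the char 'b' is promoted to the int 98 before the addition.
    static int charPlusTwo() {
        return 'b' + 2;
    }

    public static void main(String[] args) {
        int i = 35;                 // integer literals are int by default
        long big = 35L;             // an L suffix makes the literal a long
        double d = -12.51;          // floating-point literals are double by default
        float f = -12.51f;          // an f suffix makes the literal a float
        char c = 'b';
        boolean flag = true;
        String s = "Aye carumba!";  // a String literal (an object, not a primitive)
        String nothing = null;      // null: a reference to no object

        System.out.println(charPlusTwo()); // prints 100
        System.out.println(i + " " + big + " " + d + " " + f + " " + c + " " + flag);
        System.out.println(s + " " + nothing);
    }
}
```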

What’s an expression?

An expression is one of two things:

  • a value
  • an operation

A value is either a literal or a variable. A literal obviously evaluates into the value which it represents, while a variable expression evaluates into the value it holds at the time it is evaluated.

An operation consists of an operator and operands and evaluates into a value. For instance:

3 + 2      // the operation + has operands 3 and 2 and evaluates into the value 5

Note that the operands are themselves expressions. In this case, the operands are values, but they could be any kind of expression as long as those expressions evaluate into the right type of value, e.g.:

3 + (9 - 2)     // the first operand of + is 3 and the second operand is the expression (9 - 2)

Also note that, in all cases, an operation evaluates down into a value (we can say that the operation ‘returns’ a value), and that value has some particular type. In Java, it’s an important feature of the language that the type of value returned by an operator expression is always known from the operator and the types of its operands, e.g.:

('b' + 3)      // the + operator with a char operand and an int operand will return an int value

The language is designed such that the compiler always knows the type of each expression, i.e. the type of each value and the type returned by each operation. For the language to know this, you must always declare the type of each variable, the type of each parameter, and the return type of each function. This is the essence of what it means for Java to be a statically-typed language.

Each operator has its own rules about how many operands it takes, their types, and the type of the value it returns. Some operators change what type of value they return depending upon the number and type of their operands. For instance, the + operator will return a String rather than an int when used with a String operand:

"Johnny" + 5    // a + operation with a String and int operand will convert the int into a string and return a concatenation of the two strings as a string: "Johnny5"
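Here is a short sketch of how the operand types decide the type of the result. The names (ExpressionTypesDemo and its methods) are hypothetical; the last method is an extra case of my own, showing that + is evaluated left to right, so the int addition happens before the concatenation.

```java
class ExpressionTypesDemo {
    static int charPlusInt() {
        int sum = 'b' + 3;       // char + int yields an int: 98 + 3
        return sum;
    }

    static String stringPlusInt() {
        return "Johnny" + 5;     // String + int yields a String
    }

    static String leftToRight() {
        // + associates left to right: (1 + 2) is int addition, then 3 + "x" is concatenation
        return 1 + 2 + "x";
    }

    public static void main(String[] args) {
        System.out.println(charPlusInt());   // prints 101
        System.out.println(stringPlusInt()); // prints Johnny5
        System.out.println(leftToRight());   // prints 3x
    }
}
```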

There are a few dozen operators in Java:

  • a few are unary (taking one operand, e.g. the ! operator)
  • most are binary (taking two operands)
  • one rarely used operator is ternary (taking three operands), the ? : operator

Parentheses and the rules of precedence are used to determine which operations are the operands to which other operations:

3 + 7 * 9    // * has higher precedence than +, so (7 * 9) is evaluated first and is an operand to the + operation
(3 + 7) * 9     // the parentheses override the usual precedence, so (3 + 7) is evaluated first and is an operand to the * operation

(Note that the whole point of operator precedence is so lazy mathematicians don’t have to put each individual operation in its own set of parentheses.)

The operators and their precedences are listed here.

Not all operations are denoted by operators, however, for a method call can be thought of as an operation: as determined in its definition, a method takes some number of operands of particular types and returns a value of some particular type. (Some think of the parentheses of a method call as an operator such that the name of the method and the arguments are the operands, but I prefer thinking of the method name itself as the operator and just the arguments as the operands.)

What’s an expression statement?

An expression statement simply has this form:

expression;

In Java, an expression statement must be an assignment, an increment or decrement (such as x++), a method call, or an object creation with new, e.g.:

x = 3 + foo;     // an assignment statement
cow.moo();    // a call statement

In some other languages with similar syntax, such as C, many compilers won’t complain if you have an expression statement like foo; (which says, ‘evaluate the value of variable foo and do nothing with it’) even though such statements are pointless. Java will complain if you write such do-nothing expression statements.

(Of course, it’s quite possible a method call expression might not do anything useful, but the Java compiler can’t know that; it only objects to expression statements it knows couldn’t possibly be useful, such as 3 + 5;)
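As a rough sketch of what the compiler accepts, the method below uses each kind of expression statement Java allows, with the rejected forms left in as comments. ExpressionStatementDemo and run are names invented for this example.

```java
class ExpressionStatementDemo {
    static int run() {
        int x = 0;
        x = 3 + 4;              // assignment
        x++;                    // increment (decrement works the same way)
        Math.max(x, 2);         // method call; the returned value is simply discarded
        new StringBuilder();    // object creation; the new object is discarded
        // x + 1;               // would not compile: javac reports "not a statement"
        // 3 + 5;               // would not compile either
        return x;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 8
    }
}
```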

What’s a declaration statement?

A declaration statement has the form:

type name;

For instance:

int x;     // declare a variable named ‘x’ of type int

Cat c;     // declare a variable named ‘c’ of type Cat

For convenience, you can declare multiple variables of one type in one declaration statement using commas to separate the names, e.g.:

int x, y, z;  // declare three ints: x, y, and z

Also for convenience, you can assign a value to a variable as you declare it using this form:

type name = expression;

…where the expression evaluates into the value assigned to the variable.

In a multiple declaration, assigning values to the variables looks like this:

int x = 3, y, z = 2;

…which is no different from writing the same thing as three successive statements:

int x = 3;
int y;
int z = 2;

What’s a control statement?

A control statement is one of the statements having to do with flow of execution: if, while, for, break, continue, try-catch, return, and a few others. The rules and meaning of these statements are particular to each kind.

What’s the difference between a value variable and a reference variable?

Variables of primitive types are value variables, meaning they directly hold a primitive value:

int x = 3;
int y = x;
x = 4;  // though y got its value from x, modifying x afterwards has no effect on y
System.out.print(y); // prints 3

Here, x and y represent two locations in memory where an integer value is directly held. After the second assignment, the memory locations of x and y each hold separate copies of the value 3.

Variables of reference types are reference variables, meaning they hold a reference (address) to an object, not the object itself:

Cat x = new Cat(); // a new Cat is created, and memory location x now holds the address of that object
x.name = "Fluffy";
Cat y = x;   // the expression (x) evaluates into the reference held by x, and so the new Cat reference named y is assigned the address of the very same object referenced by x
x.name = "Mittens";    // modify the property of the cat object referenced by x
System.out.print(y.name); // prints "Mittens" because y referenced the very same object as x when we modified the object’s ‘name’ property via the reference x

Here, x and y represent two locations in memory where addresses are held. The actual Cat object is elsewhere in memory. Both x and y are assigned the same Cat object address, so modifying the object referenced by x is the same thing as modifying the object referenced by y: they’re the same Cat.
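The two snippets above can be put into one runnable sketch. The Cat class here is a minimal stand-in with a single name field, assumed for the example, and the third method is an extra case of mine: reassigning x to a new Cat leaves y pointing at the old one.

```java
class ReferenceDemo {
    // A minimal stand-in for the Cat class assumed by the example.
    static class Cat {
        String name;
    }

    static int copyPrimitive() {
        int x = 3;
        int y = x;   // y gets its own copy of the value 3
        x = 4;       // changing x afterwards does not touch y
        return y;
    }

    static String shareObject() {
        Cat x = new Cat();
        x.name = "Fluffy";
        Cat y = x;            // y now holds the same address as x
        x.name = "Mittens";   // modifies the one shared Cat
        return y.name;
    }

    static String reassign() {
        Cat x = new Cat();
        x.name = "Fluffy";
        Cat y = x;
        x = new Cat();        // x now refers to a second, separate Cat
        x.name = "Mittens";
        return y.name;        // y still refers to the first Cat
    }

    public static void main(String[] args) {
        System.out.println(copyPrimitive()); // prints 3
        System.out.println(shareObject());   // prints Mittens
        System.out.println(reassign());      // prints Fluffy
    }
}
```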

What are the primitive types?

The integer primitive types are:

  • byte
  • char
  • short
  • int
  • long

The floating-point primitive types are:

  • float
  • double

For all these types, see here for their exact sizes. If your code requires high-precision floating-point calculations, you should read up on the strictfp modifier.

Finally, there’s the boolean primitive type, which consists of two special values, true or false. In other languages, such as C, numbers are used to mean true or false in special contexts—usually zero represents false while all other values represent true—so why have a unique type for true and false? Well the thinking is that this protects you against accidentally using a number when you meant to use a true/false value and vice versa and helps clarify the intent of code when it is read.

Is = really an operator?

In Java, yes. However, the = operator may seem like something of a special case, and that’s because it is: the = operator’s left operand is not a value (and hence not an expression) but rather a target, i.e. a variable to assign to. Consider:

foo = 2;

It wouldn’t make sense for foo to evaluate into its value here because you can’t assign a new value to a value—that just doesn’t make any sense. Rather, we are assigning a value to the variable itself.

Despite this unique difference, = is still an operation and does return a value, which is the value assigned to the target:

3 + (x = 4)       // first 4 is assigned to x, then 3 is added to 4

Because = operations are right-to-left associative, we can chain assignments:

x = y = z = 5;

This is equivalent to:

x = (y = (z = 5));

A common typo is to type = when you meant to type ==. In C, this creates a problem in such cases as:

if (x = 3)  { … }

…because, in C, an integer can be used as a true/false value, so the value 3 returned by (x = 3) is accepted as the condition value even though the programmer certainly meant to use == instead. In Java, in contrast, a condition must be a boolean value, so the compiler will complain here about an invalid if condition.

Some other languages think it’s a bad idea to allow assignments to occur in unexpected places, so they make = useable only as the outer operation of an expression statement. In general, you should follow this convention, as code is hard to read when assignment operations occur in unexpected places. E.g., instead of:

 foo(x = 3);

…do this:

x = 3;
foo(x);
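For the curious, here is a small sketch showing that an assignment really does return the assigned value, both in a chain and buried inside a larger expression. AssignmentDemo and its method names are made up for illustration.

```java
class AssignmentDemo {
    static int chained() {
        int x, y, z;
        x = y = z = 5;           // parsed as x = (y = (z = 5))
        return x + y + z;
    }

    static int embedded() {
        int x;
        int sum = 3 + (x = 4);   // (x = 4) assigns 4 to x and evaluates to 4
        return sum + x;          // 7 + 4
    }

    public static void main(String[] args) {
        System.out.println(chained());  // prints 15
        System.out.println(embedded()); // prints 11
        // int n = 0;
        // if (n = 3) { }        // would not compile: the condition is an int, not a boolean
    }
}
```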

What’s the order of the modifiers?

Java contains some keywords which modify reference-type declarations (classes, interfaces, and enums), local variables, fields, and methods. Some of these modifiers can’t be used with other modifiers, but Java doesn’t care about the order in which you write the modifiers as long as they all go before the thing they modify, like so:

  • class: modifiers class Name {}
  • variable (local or field): modifiers type name;
  • method: modifiers return_type name(parameters) { }

So, for instance, you could write a field as :

static final int x = 3;

…or:

final static int x = 3;

…but not:

final int static x = 3;  // modifier ‘static’ must go before the type (‘int’)
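A compilable sketch of the same point, using made-up names (ModifierOrderDemo, A, B, and so on): each pair of declarations below differs only in the order of its modifiers, and the compiler treats them identically.

```java
class ModifierOrderDemo {
    static final int A = 3;
    final static int B = 3;                      // same meaning as A's declaration

    public static final String GREETING = "hi";
    final public static String SAME_GREETING = "hi";

    static public int twice(int n) {             // modifiers on a method may be reordered too
        return 2 * n;
    }

    // final int static C = 3;                   // would not compile: 'static' after the type

    public static void main(String[] args) {
        System.out.println(A == B);                         // prints true
        System.out.println(GREETING.equals(SAME_GREETING)); // prints true
        System.out.println(twice(4));                       // prints 8
    }
}
```

By convention most Java code writes access modifiers first, then static, then final, but that is a matter of style rather than a rule of the language.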

Why is the syntax for creating objects so verbose? E.g. Foo foo = new Foo();

The reason it’s so verbose is because really two independent things are going on in such a statement. If we have a class named Foo, then this:

Foo foo;

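…declares a reference of type Foo. No object has yet been assigned to foo: if foo is a field, it holds the default value of null, the special value representing a reference to nothing; if foo is a local variable, it holds nothing at all yet, and the compiler won’t let you read it until you assign to it. To actually create a Foo object, we use the new operator followed by a call to the constructor: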

new Foo()

Understand that, like all other calls to functions, this is an expression! It just looks funny because ‘new’ (which is a unary operator just like ‘!’) is always separated by whitespace from its operand (the call to the Foo constructor). The new operator must be used when calling a constructor to create a new object. In the most common use of new, the newly created object is immediately assigned to a reference, like so:

Foo foo = new Foo();

…but a new operator expression is really an expression like any other, so you can create a new object of a type anywhere an object of that type can be used without having to assign it to a reference:

funky(new Foo());  // the method funky taking a new Foo object as its argument
 new Foo().bla();  // create a new Foo object and then call its bla() method

In this last example, the new Foo object is not assigned to a reference and so is lost after the statement. This is not the most common thing to do, but it is not all that rare in real code.

An initially confusing thing about the syntax of new is that it has a higher precedence than the dot operator, so the last statement of the last example could equivalently be written:

(new Foo()).bla();

For clarity, it might help to always imagine parentheses around every use of new and its constructor call operand like above.
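Here is a runnable sketch of new used as an ordinary expression. Foo, bla and funky follow the names in the text, but their bodies are invented for the example; the StringBuilder line is an extra illustration using a real library class.

```java
class NewExpressionDemo {
    // Hypothetical Foo with a trivial bla() so the calls below have something to return.
    static class Foo {
        String bla() {
            return "bla";
        }
    }

    static String funky(Foo foo) {
        return "funky " + foo.bla();
    }

    static String builtInline() {
        // A new expression can start a chain of calls without ever being stored.
        return new StringBuilder("ab").append("c").toString();
    }

    public static void main(String[] args) {
        Foo foo = new Foo();                       // the common case: store the reference
        System.out.println(foo.bla());             // prints bla
        System.out.println(funky(new Foo()));      // prints funky bla
        System.out.println(new Foo().bla());       // prints bla
        System.out.println((new Foo()).bla());     // same thing with explicit parentheses
        System.out.println(builtInline());         // prints abc
    }
}
```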

What’s the deal with calling one constructor from another?

Within a constructor, we might call another constructor of the same class to continue the job of constructing the object. To do this, you don’t use new or the class name: you write this(arguments), and the call has traditionally had to be the first statement of the constructor. (A constructor can’t invoke itself this way, whether directly or through a cycle of other constructors; the compiler rejects recursive constructor invocation.) Conceivably, if rarely, you might wish to create a new separate instance of the same class inside a constructor (e.g. inside a Cat constructor, you might wish to instantiate a new Cat that is independent of the Cat being constructed), in which case you would use new.
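A sketch of both cases, with an invented Cat class: in Java the call to a sibling constructor is spelled this(...), while new Cat(...) inside a constructor makes an entirely separate Cat. The fields and the helper methods are assumptions made for the example.

```java
class ConstructorChainDemo {
    // Hypothetical Cat with three constructors.
    static class Cat {
        final String name;
        final int age;
        Cat friend;   // a separate Cat, if any

        Cat() {
            this("Unnamed", 0);          // continue construction in Cat(String, int)
        }

        Cat(String name, int age) {
            this.name = name;
            this.age = age;
        }

        Cat(String name, String friendName) {
            this(name, 0);                         // construct *this* Cat
            this.friend = new Cat(friendName, 0);  // create a *separate* Cat
        }
    }

    static String defaultName() {
        return new Cat().name;
    }

    static String friendName() {
        Cat tom = new Cat("Tom", "Felix");
        return tom.name + " and " + tom.friend.name;
    }

    public static void main(String[] args) {
        System.out.println(defaultName()); // prints Unnamed
        System.out.println(friendName());  // prints Tom and Felix
    }
}
```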

Is new necessary?

Arguably, if you got rid of new, the compiler could simply infer the creation of a new instance from the fact that you are calling a constructor, e.g.:

Cat c = Cat();    // not proper Java (assuming Cat() is a constructor call)

However, the new operator keeps the creation of a separate object visibly distinct from the other things a bare Cat() could mean. For instance, inside a Cat constructor, new Cat() creates a new separate Cat, while continuing the construction of the current object through another Cat constructor is written this(), not Cat(). A bare Cat() is in fact read by the compiler as a call to a method named Cat (which Java permits, confusingly, alongside a class of the same name).

Arguably a better solution could have been used to make this distinction, allowing us to avoid typing new so much, but in any case, I find having new is nice because it makes object instantiations stand out in code.

What is “the stack”?

When loaded, your program is allotted a contiguous piece of memory called ‘the stack’, where all of its local variables are stored. It works like this:

  • The stack starts out empty.
  • As a function executes, its local variables are pushed (stored) on the top of the stack.
  • When the currently executing function call returns, the local variables it created are popped (removed) from the stack before execution returns to the function which called it.

The set of locals created by a function call is called a stack frame.

Notice that, AT ALL TIMES, the frame at the top of the stack belongs to the currently executing function, and the frame below it belongs to the function which called the currently executing function, and the second frame down belongs to the function which called the function which called the currently executing function…and so on, until you get all the way to the bottom frame, which belongs to the main() function of your program. When main() returns, the program ends and the stack is empty.

Such stack-based execution is the dominant model of execution in programming.

If your program’s stack outgrows its space in memory, the Java runtime may try to get more memory for it; if the stack still can’t grow (typically because it has hit the maximum stack size the runtime allows), the runtime throws a StackOverflowError in your program, crashing your program if you don’t catch it. (Strictly, this is an Error rather than an Exception, and it’s a good example of something you generally shouldn’t catch, as there’s generally nothing you can do to recover from running out of stack space; at most, you should catch it, try to do some clean-up work and preserve data you don’t want to lose, then terminate the program.) This kind of error is called a stack overflow because your stack needed to ‘overflow’ its space to continue execution. On a modern desktop system, the amount of memory available makes it hard to imagine a well-designed program that needs more stack space than it can have, so an overflow should generally be seen as a bug or design flaw in your code, not a vexing system limitation. By far, the most common cause of stack overflows is accidentally allowing a function to make too many recursive calls.
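To see an overflow happen, here is a sketch that recurses without end on purpose and counts how deep it got. The error is caught only so the demonstration can report a number, which is not something to imitate in real code. StackDepthDemo and its members are invented names, and the depth reached will vary with the JVM and its settings.

```java
class StackDepthDemo {
    static int depth;

    // Each call pushes another frame; with no base case the stack must eventually run out.
    static void recurse() {
        depth++;
        recurse();
    }

    static int measure() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // By the time we get here the runaway frames have been popped again.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Overflowed after " + measure() + " nested calls");
    }
}
```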

(Programs may actually use more than one stack: as I’ll discuss later, each ‘thread’ of a program has its own stack. However, all programs start with one thread, and many programs never need more than one, so we can ignore multi-threading for now.)

What is “the heap”?

Aside from the stack, the ‘heap’ is the other part of memory used by your program. Remember that local variables in Java consist of only primitives and references to objects, not any objects themselves, so you won’t find any objects on the stack; rather, all objects are stored in the heap.

No matter how many threads you have, there is always only one heap for your program.

While the stack is a strictly organized piece of memory that preserves the local variables of each function call, in contrast, the heap, as its name suggests, is a disorderly place. Much intelligence goes into keeping track of which areas of the heap are free, deciding which spots to place new objects in, and avoiding wasteful gaps between objects, but you don’t worry about all that, for it is the job of the Java runtime to manage all of it.

As with the stack, the Java runtime will request more memory from the operating system when the heap needs to grow; if the OS refuses the request (most likely because there isn’t enough memory to be had) or the heap has reached the maximum size the runtime was configured with, an OutOfMemoryError is thrown.

Classes are objects too!

When you start a Java program, each class that is used in the course of the program is loaded as an object of the special class java.lang.Class. Among other things, this object is where the static fields of a class are stored (in case you were wondering.)

Yes, it is confusing to have a class named Class. For one thing, if all classes are represented by a Class object, what about Class itself? Is there an instance of Class representing the Class class? Which came first: the Class class or the Class instance? There can’t simply be code that says new Class() because the Class class would need to be loaded as a Class instance first before the Class constructor could be used. (In fact, Class has no constructors.) The answer is that Class requires special treatment by the Java runtime at load time.

You can get the Class object of a class using the static forName method of Class:

Class stringClass = Class.forName("java.lang.String"); // get a Class object representing the String class

There’s not much you’d normally want to do with Class objects, but they make some meta-programming techniques possible that otherwise wouldn’t be.
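A small sketch of the ways to get hold of a Class object, using invented names (ClassObjectDemo and its methods). Besides Class.forName, the language offers the class literal String.class and the getClass() method on any object; all three hand back the very same Class instance. Note that forName declares the checked ClassNotFoundException, so it has to be handled.

```java
class ClassObjectDemo {
    static String nameOfStringClass() {
        try {
            Class<?> byName = Class.forName("java.lang.String");
            return byName.getName();
        } catch (ClassNotFoundException e) {
            // Not expected for a core class; rethrown unchecked to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    static boolean allTheSameObject() {
        try {
            Class<?> byName = Class.forName("java.lang.String");
            Class<?> byLiteral = String.class;          // class literal
            Class<?> byInstance = "hello".getClass();   // asked of an instance
            return byName == byLiteral && byLiteral == byInstance;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // The Class object for String is itself an instance of Class.
    static String classOfClass() {
        return String.class.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(nameOfStringClass()); // prints java.lang.String
        System.out.println(allTheSameObject());  // prints true
        System.out.println(classOfClass());      // prints java.lang.Class
    }
}
```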

What’s a local variable?

Any variable you declare inside a function (including each parameter variable) is a local variable. A local variable is created when its declaration statement is reached in the execution of a function. The local variables of a function call are discarded when the call returns: they don’t actually get erased on the spot, but the memory they occupy on the stack becomes free game to be overwritten by the creation of local variables in later function calls, so they’re as good as dead.

Actually, if a local variable is declared within a control block, such as of an if, for, try, catch, etc., then it is said to be local to that block rather than local to the whole function, and it will actually be discarded from the stack when the block is finished, not when the whole function call is finished. Therefore, you can use a variable only in the block in which it is declared or in sub-blocks thereof.

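If you declare a local with the same name as a field of the enclosing class, then within the local’s scope the name refers to the local. (Unlike C and C++, Java does not let a local reuse the name of another local or parameter that is still in scope, so a block cannot redeclare a variable of an enclosing block; that is a compile error.)

class Scopes {
    static int x = 3;    // a field

    static void demo() {
        if (true) {    // an if with the condition true is, of course, a silly thing to have, but it serves our demonstration
            int x;     // a local that shadows the field x inside this block
            x = 5;
            System.out.println(x); // prints the value 5
            // a nested block here could not declare yet another 'int x'
        }
        System.out.println(x); // prints the value 3 (the field)
    }
}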

What is overloading good for?

Method overloading (not to be confused with method overriding) occurs when a class is given multiple methods of the same name; this is allowed as long as no two of the methods have the same number and types of parameters in the same order. When you call the method of that name, the compiler can tell which method you mean to call based upon the number and type of the arguments.

So just like some operators vary their effect and type of returned value based upon the type and number of their operands, an overloaded method can vary its effect and type of returned value based upon the type and number of its operands (the parameters).

Understand that overloaded methods are really entirely independent methods that just happen to share a name. The reason Java allows overloading is because it is often simply desirable to have a method which is callable in different ways without having to come up with distinct names for each variation. In general, a set of methods in a class that share the same name should all perform something like the same purpose as each other; if not, you are probably just confusing the users of your class. (Just like having the + operator used for both addition and String concatenation is confusing for learners of Java.)

What’s the difference between “overriding” and “overloading”?

If your class inherits a method foo but you write a method of the same name and same number and types of parameters, then this is considered overriding. If you add your own variants of foo with the same name but different numbers and/or types of parameters, that is overloading because the new variants don’t replace the inherited foo.
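A sketch contrasting the two, with invented classes (Animal, Cat) and an invented speak method: the no-argument speak() in Cat overrides the inherited one, while the versions taking an int or a String are overloads that sit alongside it.

```java
class OverloadOverrideDemo {
    static class Animal {
        String speak() {
            return "...";
        }
    }

    static class Cat extends Animal {
        @Override
        String speak() {                 // same name and parameters: overrides Animal.speak()
            return "Meow";
        }

        String speak(int times) {        // different parameters: an overload
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < times; i++) {
                sb.append(speak());
            }
            return sb.toString();
        }

        String speak(String listener) {  // another overload
            return speak() + ", " + listener;
        }
    }

    static String viaBaseReference() {
        Animal a = new Cat();
        return a.speak();   // the override is chosen at run time from the object's actual class
    }

    static String twice() {
        return new Cat().speak(2);       // the overload is chosen at compile time from the argument type
    }

    public static void main(String[] args) {
        System.out.println(viaBaseReference());       // prints Meow
        System.out.println(twice());                  // prints MeowMeow
        System.out.println(new Cat().speak("human")); // prints Meow, human
    }
}
```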

Improved improved Firefox tabbing

October 9, 2007 – 9:01 am

I’ve rethought my ideas for better tabbing in Firefox (first mentioned here) and have revised my mockup in Javascript accordingly. The performance of the tab switching and previewing is much better, previews are now seen as the full size of the page, and the ‘all tabs’ pulldown now causes the page to scrunch over, better preserving the look of pages in the previews and thereby making them more instantly recognizable. CAVEATS: Only tested in Firefox 2.

The Gettysburg 5-paragraph Essay

August 3, 2007 – 8:59 pm

There were only five known drafts of the Gettysburg Address until the recently discovered power-point version. Here now is the long lost 5-paragraph essay version:

The United States is an equality nation formed eighty-seven years ago. WOOSH! BANG! KAPOW! Now we’re fighting over it on this battlefield. I think we should make some part of this field a memorial graveyard.

First, it is a good idea to do this.

Second, we can’t do this. Because we suck compared to the guys who died here.

Lastly, no one cares what we say, but we should do this anyway.

In conclusion, I think we should make some part of this field a memorial graveyard, because otherwise the guys who died here died for nothing. If we just keep at it, we’ll get to keep Iraq Pennsylvania.

Warts on a snake: Ugly bits in Python syntax

July 27, 2007 – 9:50 pm

I point out this post not to comment on its subject but just as an example of Python code and to remark that the otherwise pretty and compact Python syntax is blighted by a few things:

  1. The ‘self’ as the first parameter of every class method is cluttery and is only necessary as the artifact of Python’s conceptual distinction between bound and unbound methods. I think Javascript got this right: when a function bar is invoked as foo.bar, foo gets passed to the function as the keyword ‘this’ (though I would choose the keyword ‘me’ instead for brevity).
  2. Another thing Javascript got right is not requiring quote marks around identifier-like names used as keys in object literals. Where Python requires {'foo':bar}, Javascript allows {foo:bar}. Of course, in Javascript, keys can only be strings while Python allows any kind of (immutable) object, so some syntax would be needed to distinguish between foo to mean the string ‘foo’ and foo to mean the object referenced by foo. I suggest something like @foo to mean ‘the object held by foo’.
  3. The pseudo-special names beginning and ending in underscores are just ugly and annoying to read and type because it can be difficult to tell the difference between one underscore and two adjacent underscores.
  4. The colons after if, elif, else, etc. look fine, but they’re annoying to type and easy to forget to include. A delimiter is needed for one-liners, e.g. if foo: print bar , but should be omitted for multi-liners (and this omission should be compulsory).

Portals: window management for those who hate window management (mockups in Javascript)

July 17, 2007 – 9:54 pm


Jeff Atwood discusses the way Mac OS X windows don’t really have ‘maximize’ buttons, and he comes to the right conclusion: better to have overly large windows than to make users futz with the dimensions of their windows. He says:

Apple’s method of forcing users to deal with more windows by preventing maximization is not good user interface design. It is fundamentally and deeply flawed. Users don’t want to deal with the mental overhead of juggling multiple windows, and I can’t blame them: neither do I. Designers should be coming up with alternative user interfaces that minimize windowing, instead of forcing enforcing arbitrary window size limits on the user for their own good.

As it happens, minimizing the hassle of windows—both main application menus and pop-up dialogues—is the major design goal of my desktop UI design, which I’m calling ‘Portals’. Back in this post in March, I promised to present the Portals design, but I never quite finished the mockup demos in Javascript. Still, there’s enough there to convey the biggest ideas. Eventually I’ll fill in the notes and the rest of these demos and perhaps also finish the screencast about Portals which I started.

The mockups come with lots of (rambling) notes, but one thing they oddly fail to make clear is that Portals has no desktop, i.e. no flat empty surface on which to dump icons and files.

Better tabbing in Firefox (mockup in Javascript)

July 16, 2007 – 4:34 am

In response to a challenge by Aza Raskin to come up with a better way of tabbing in Firefox—in particular a solution that scales better the more tabs you have—I produced this mockup in Javascript. Be clear that, because of the way the tab previews are done, the performance is creaky and not representative of what a proper implementation would be like. Please, use your imagination and pretend the previews pop-up instantly. Also understand, it ONLY WORKS IN FIREFOX. (While it won’t work at all in IE, it should mostly work in other non-Firefox browsers, though I haven’t tested any).

The rationalization is given on the page, so I’ll simply discuss here why I rejected some ideas proposed by others and also consider some variants on my design which might be even better.

In the comments of Aza’s post, a number of people expressed a desire to introduce alternative ways of conceptually ordering the tabs other than the default order in which you opened them, e.g. some wished to be able to group their tabs (which you can kind of do already by reordering), some wished to see their tabs listed by chronology of the pages (as opposed to of the tabs), and some wished to see their tabs in a web displaying heritage (which page was opened from which other page). While there might be something to these ideas, I avoided them as there seemed to be easier gains to be made that didn’t involve conceptual changes for the user. I wanted to improve the tactile experience of dealing with many tabs.

Others have proposed some kind of zooming UI. Again, there may be something to this idea, but until hardware support can be implemented consistently across platforms, I don’t see this happening. Besides, it’s a tricky thing to get right, as many users are easily disoriented.

Others mention multiple tab rows. This idea is problematic for the same reasons displaying any one-dimensional information in rows is problematic: things move around in unexpected ways when the bounds get resized and tabs are added and removed, messing with the user’s spatial memory of where their tabs are. (Of course, text has the same problem, but text has paragraph breaks that help a large section of text mostly retain some recognizable shape as it is edited and its bounds resized.)

As for variants on my design:

A major flaw of the current Firefox tabbing which my design doesn’t really conquer is that I find myself often confronted with having to do multiple searches to find a tab: by reflex, I first search through those tabs I can see, then I’ll mousewheel back and forth, then occasionally I’ll go into the full list on the right if I still haven’t found the tab. The problem here is that the worst-case scenario searches are very expensive and distracting, but they wouldn’t be if I simply went to the full list to begin with. Nearly as good would be if I couldn’t scroll the tab bar at all, forcing me to go into the full list of tabs sooner; in this scenario, we’re actually better off keeping the number of tabs visible in the main bar rather low, say 7-9 at most.

If we embrace the idea that the full tab list should be used more often, it then makes sense to better purpose the main bar. Rather than show tabs which occur consecutively in the full tab list, the main tab bar could display the last viewed tabs in the order they were viewed. In this design, you actually wouldn’t ever see the current tab in the main tab bar as you don’t need to click on it, but you’d still find it in its proper place in the full tab list.

If this is too confusing, perhaps get rid of the main tab bar altogether and have the full tab list button sit by itself to the right of the search box.

I’ve also considered simply having a single vertical sidebar for tabs. This would be like having the full tab list always open. While you might object to the loss of screen space, I’m not sure it would be so bad, especially for wide screen users, who often have to artificially make their Firefox windows narrower for reading, anyway. The Vertigo extension already offers Firefox users vertical tabs, but it could be improved:

  • The tabs bar should not be as wide as the history and bookmarks sidebars are by default, not really for functional reasons (at least on big widescreen monitors) but rather aesthetic ones.
  • Each tab should be two lines high for easier clicking and so that titles can wrap onto two lines if needed.
  • Hovering over a tab should display a preview (as in the demo) and any part of the title cut off should appear extending out of the tab bar into the page. (In fact, I’m thinking that hovering over the vertical tab bar should make all cut off titles appear in this manner; even with the titles extending out of the sidebar, you would still be able to see where the sidebar ends, so when you mouse out of the sidebar, the titles would all go back to normal.)

Player “freedom” in multiplayer games is a design cop-out

July 15, 2007 – 5:29 pm

For the sake of cleaning random stuff off my hard drive, here’s something I saved which I posted to a forum a couple years back. It mainly concerns the Battlefield series of games, developed by DICE. In Battlefield, two teams fight each other for control of key points on the map; players spawn into the world on foot, but they can enter and exit vehicles found in the world, such as tanks and planes, and this has always been the key appeal of the series.

I’m waiting for Battlefield 2 with much anxiety because the developers keep talking about how the original was great because of its “freedom” and “rock-paper-scissors” balance.

BF1942 (Battlefield 1942, the original game in the series) was a great game for about 6 months, but then on public servers, the gameplay devolved into mindless deathmatch: initially, players were excited by the new objective-based gameplay, so interesting public matches were common, but then, when the novelty wore off, individuals just got frustrated in not being able to coordinate with the strangers on their team, so they started taking the ‘freedom’ of the game too far—meaning they just started goofing around, playing only for personal kill counts, and sabotaging their own teams. (Which is unexpected: you’d think there would be more focused play as the game aged compared to the first months, but this only applied to BF1942 clan matches, not public games.)

Furthermore, certain roles got too powerful as players got really skilled, most notably the pilots: remember how everyone fought over the planes all the time? Well, some of those guys won planes and kept winning them. Then they got really, really good at flying them, to the point that every server has that one pilot who climbs to a billion feet then dive bombs everything with pin-point precision. Because no one else ever got practice in the planes (and because they never even hear the planes coming from that altitude), everyone else is mostly helpless.

Perhaps the basic question that needs to be answered is how to design a more strategic action war-game by upping the average stay-alive time while making it much more deadly for players to play in a Quakish manner (jumping around like a chicken with your head cut off) so that players are forced to use cover. In other words, designers need to coerce players into preserving their lives more without going for the Counter-Strike (CS) nuclear option of permanent death. The CS scheme has the fatal flaw of punishing less skilled players, not only with less play time and therefore less fun, but also with less practice time. Being dead most of the time hampers the novices’ ability to learn. (CS suffers from this dynamic more than other games because of its system whereby the best players have the most money and thereby get more practice time with the best weapons, which are very difficult to learn how to shoot accurately.)

My point is that, first, DICE over-rates freedom: ‘freedom’ in gameplay of this kind is neat at first, but then as the balance of the game comes into focus as enough players gain skill, the freedom destroys the coherence. Second, the rock-paper-scissors balance of deadly encounters must be put in a strong teamplay context or else the encounters with the enemy devolve into just random noise: player on side A has the overwhelming advantage one third of the time while player on side B has it another third.

Another reason to coerce players into playing as a team is the ‘90/10 rule’: 10% of players far out-class the other 90%. Such disparities make a game fun for no one except the highly skilled players (assuming they don’t care about being challenged), and worse, from a game maker’s perspective, not giving less skilled players a useful role to play discourages new entrants to the game. Forcing teams to really work together would dampen the distorting effect of the outliers upon the game.

To encourage teamplay, the most basic step is to keep teammates near each other. In the Battlefield series (and in fact in most other FPS games), teammates wandering off on their own is a constant problem because the temptation for each player to take their own course of action without consulting or informing their teammates is simply natural: even if players could reach agreements on what to do, the pace of most games is too fast and the communication mechanisms too cumbersome to execute coordinated actions, so few players try. Voice communication is not enough because 1) only a quarter of players tend to have mics, 2) there’s a limit to how many people you can effectively talk to at once, and 3) action games tend to move too fast for players to debate a course of action, and no one can decide who should give orders. Instead, what’s really needed is a way to effectively coordinate with players near you without inundating players with useless info spam; this alone will greatly encourage players to actually move in convoy and thereby really play as a team.

To enforce sticking together, the mechanism I concocted is to damage and eventually kill players for straying away from other players. The details of this are a bit tricky, as you must account for the fact that players might respawn far away from their team, a player’s teammates might die around him, and griefers might abuse the system. The system would work something like this:

  • Every few seconds or so, for each player, get the set of teammates within a certain radius. Those with n teammates within their radius are ‘in compliance’.
  • Depending upon the value of n and the number of players on a team, a team might have separate clusters of ‘in compliance’ players, so it’s not necessarily the case that the whole team has to travel as one. The center point of each cluster is found by averaging together the positions of its in-compliance players, and HUD arrows direct out-of-compliance players to these clusters.
  • Each player has a compliance rating bar: out-of-compliance players start losing compliance points, and when they get to zero compliance, they start losing health.
  • To account for cases where a player spawns far away from other players, set their compliance bleed rate to something slow enough for them to reach any of the clusters.
  • A player might wish to move to another cluster, so the bleed rate of a player leaving compliance is slow enough for them to reach the other clusters.
  • Players don’t instantly regain their compliance points when going from out of compliance to in compliance; otherwise, players would abuse the system by hopping in and out of compliance, making the compliance radius less meaningful.
  • If a player’s cluster is dissolved because of his nearby teammates’ deaths or desertions, the player gets full compliance and a new bleed rate slow enough to reach another cluster.
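As a rough sketch of the core of the scheme above (the periodic check, the cluster centers, and the compliance bar), something like the following would do for a first prototype. Every constant and name here is a placeholder I made up, positions are flattened to 2D, and the sketch uses one fixed bleed rate; the distance-scaled bleed rates and the reset for dissolved clusters are left out.

```python
import math
from dataclasses import dataclass

# All values below are made-up placeholders that would need playtesting.
RADIUS = 30.0         # compliance radius, in map units
MIN_NEIGHBORS = 2     # the "n" from the list above
MAX_POINTS = 100.0    # a full compliance bar
BLEED = 5.0           # points lost per tick while out of compliance
REGEN = 2.0           # points regained per tick (gradual, not instant)
HEALTH_LOSS = 10.0    # health lost per tick once the bar is empty


@dataclass(eq=False)  # eq=False: players are compared by identity
class Player:
    name: str
    x: float
    y: float
    compliance: float = MAX_POINTS
    health: float = 100.0


def near(a, b):
    return math.hypot(a.x - b.x, a.y - b.y) <= RADIUS


def in_compliance(team):
    """Players with at least MIN_NEIGHBORS teammates inside their radius."""
    return [p for p in team
            if sum(near(p, q) for q in team if q is not p) >= MIN_NEIGHBORS]


def cluster_centers(compliant):
    """Group compliant players linked by proximity; average each group."""
    centers, unvisited = [], list(compliant)
    while unvisited:
        group, frontier = [], [unvisited.pop()]
        while frontier:
            p = frontier.pop()
            group.append(p)
            linked = [q for q in unvisited if near(p, q)]
            for q in linked:
                unvisited.remove(q)
            frontier.extend(linked)
        centers.append((sum(p.x for p in group) / len(group),
                        sum(p.y for p in group) / len(group)))
    return centers


def tick(team):
    """Run one periodic check; return the centers the HUD would point at."""
    compliant = in_compliance(team)
    for p in team:
        if p in compliant:
            p.compliance = min(MAX_POINTS, p.compliance + REGEN)
        elif p.compliance > 0:
            p.compliance = max(0.0, p.compliance - BLEED)
        else:
            p.health = max(0.0, p.health - HEALTH_LOSS)
    return cluster_centers(compliant)
```

Calling tick(team) every few seconds for each team is the whole loop; the hard part, as noted, is choosing numbers that don't feel arbitrary to players.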

Obviously this is all subject to real-world testing, but I think something like this would go a long way to making gameplay more coherent. Getting the bleed rates right would be tricky, so perhaps it would be more effective to instead give in-compliance players significant artificial advantages, such as more health points and/or more powerful weapons; making out-of-compliance players simply not competitive better avoids annoying death and griefing scenarios.

Stealing the web’s precious, bodily fluids

July 14, 2007 – 3:42 am

Raganwald argues that link-voting sites (Digg, reddit, et al.) hurt the web by locking comment threads into proprietary databases, depriving the web of some of its vital webness. I would agree, except:

  1. I’ve always found Digg and reddit comment threads function as contests to see who makes the best joke, and this they do well enough. For serious discussion, they are almost always useless, so I turn to the source’s comments.
  2. Slashdot is a pretty strong counterexample. Whereas Digg and reddit comments are largely dominated by immature people, the Slashdot community is dominated by knowledgeable and insightful immature people. The only thing I really dislike about Slashdot threads is all the people complaining about Slashdot dupes, Slashdot story quality, the Slashdot moderation system, and Slashdot comment threads.
  3. Many links don’t point to blogs but rather to ‘heavier’ sites, like newspapers. I never read nor contribute to the comment threads of such sites: if it isn’t a WordPress blog or something near like, I ain’t going to bother with your crappy commenting registration process and interface. So in those cases, Slashdot, Digg, and reddit provide a forum that in my mind wouldn’t otherwise really exist.

Now as it happens, I’ve had an idea of late of how to create a decentralized Digg-like system using RSS feeds. I leave it as an exercise to you to imagine how this would work (hint: there are no centralized feeds; everyone sees their own personalized collation of received items; while not exactly the same thing as Digg, I believe this actually has some important advantages for the user). Oh, and I call the system Panoptikon (with a ‘k’ because the closest domain I could get was panoptikon.org)*.

* If you’re that one crazy person who actually follows my blog, you remember ‘Panopticon’ is the name I gave to another proposed idea. Yes, I’m re-purposing the name, as I believe it fits this idea better.

Lost in translation

July 12, 2007 – 6:20 pm

A learner’s guide to the terminology and concepts of software build processes.

What’s the difference between an assembler, a compiler, and an interpreter, and what’s a linker?

Tower of Babel

Assemblers

Let’s start with the clearest case. An assembler is a program which translates ‘assembly language’ code into processor instructions (a.k.a. ‘machine instructions’/‘machine code’, a.k.a. ‘native instructions’/‘native code’). What’s assembly language? ‘Assembly’, ‘assembler’, or ‘asm’ for short, is the generic name given to all low-level languages. Now what’s a low-level language? Well, whereas in high-level languages, each line of source code typically translates into more than one processor instruction, in an assembly language, each line directly corresponds to one single processor instruction. Assembly offers the programmer exact control: what you write is exactly what gets executed, instruction-by-instruction.

Because different processors understand different sets of instructions, the assembler language you use must be particular to the processor platform you intend to run your program on. For instance, if you are targeting a processor that uses the x86 instruction set (which includes Intel and AMD processors), then you would use an x86 assembler.
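To show how mechanical the one-line-to-one-instruction translation is, here is a toy assembler sketched in Python. The opcode table belongs to an imaginary processor I invented for the example (it is not x86 or any real instruction set), and the two-field instruction format is likewise an assumption.

```python
# Made-up opcode table for an imaginary processor; not a real instruction set.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}


def assemble(source):
    """Translate each non-blank line into exactly one (opcode, operand) pair."""
    program = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()   # drop comments and whitespace
        if not line:
            continue
        mnemonic, *operand = line.split()
        # An unknown mnemonic raises KeyError: the assembler knows only
        # the instructions of the one processor it was written for.
        program.append((OPCODES[mnemonic.upper()],
                        int(operand[0]) if operand else 0))
    return program
```

Targeting a different processor would mean swapping in a different table, which is roughly why an assembler is tied to one platform.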

So why write assembly? On the downside, writing your code one processor instruction at a time is far more tedious than writing the functionally equivalent code in a high-level language. Moreover, assembly language can’t protect you from even the most basic errors and allows you to do dangerous things like trying to read memory that doesn’t belong to your program (something which the OS and the processor conspire to stop your program from doing by halting your program when it tries to do such things). So not only is programming assembly like using tweezers to move a hill of sand, the tweezers are slippery and sharp. Producing complex, reasonably bug-free programs entirely in assembly is very hard and generally just hasn’t been done since the late-80’s.

On the upside, the exact control provided by assembly allows for optimizations simply not possible in high-level languages. While compilers and interpreters have gotten quite smart, they very, very rarely, if ever, produce the fastest possible code, leaving room for a human to do better. Again, writing a program entirely in assembly is simply too impractical given the size of most modern programs; however, if a key portion of your code is a bottleneck, it might be beneficial to rewrite that piece of code in assembly and then invoke it from your high-level language code.

Assembly retains one other important role. Some important processor instructions will never appear in the output of a high-level language compiler, so it is left to assembly code to allow access to those instructions. For instance, on most processors, system calls can only be invoked using a particular instruction, but there’s nothing you can write in C code which will make the C compiler spit out that instruction; it’s simply something (consciously) missing from the semantics of the language. Therefore, to make a system call in C, a piece of assembly code that uses the system call instruction is written in a way that the code, when assembled, can be invoked from your C code. For instance, when you open a file in C with the C standard library’s ‘fopen’ function, depending upon your implementation of C, that function either calls a function written in assembly or is itself written in assembly, and that assembly function contains the instruction to invoke the system call that opens a file. (A ‘system call’ is a function provided by the operating system that can’t be invoked like a normal function because it exists in the operating system’s protected memory space; the OS and processor conspire to protect this memory space from direct access by ordinary programs because otherwise it would be possible for ordinary programs to bring down the whole system out of incompetence or do malevolent things like read files they aren’t supposed to be able to access. So, processors typically provide a system-call-invoking instruction which allows ordinary programs to invoke code at OS-defined specific addresses in the OS’s protected memory space. By allowing the execution of ordinary programs to enter this memory area only at specific points, the OS can prevent any funny business.)

Assemblers used to be a much bigger deal back in the DOS days when most programmers worked in assembly, but those days are gone. Today, assembly work is rarely done except by developers of operating systems and device drivers, and whereas there used to be many assemblers for Intel-compatible processors, today there are only a few real options (on the upside, they are all now free downloads):

  • MASM (Microsoft Macro Assembler)
  • GAS (GNU Assembler)
  • FASM (Flat Assembler)
  • NASM (Netwide Assembler)

Aside from these options, some C compilers feature mechanisms to embed assembly code amongst the C code. For instance, the C compiler in the GCC (GNU Compiler Collection) allows you to embed GAS assembly code using a special directive. (Understand, this and similar mechanisms in other C and C++ compilers are not official parts of either the C or C++ languages.)

Now, whereas high-level languages, such as Java, C, or C++, are typically highly standardized, the assembler languages for a particular processor may diverge significantly in syntax, e.g. while most assemblers on the x86 platform tend to follow the syntax established by Intel in its processor manuals (with the notable exception of GAS), they still have many sizable differences.

A high-level assembler is an assembler with some high-level-language-like conveniences thrown in. MASM arguably fits into this category, but the best example is certainly HLA (High Level Assembly), an assembler language originally conceived as a teaching tool.

Compilers

A compiler is a program which translates high-level language code—called the source—into some other form (usually processor instructions)—called the target. Whereas assemblers do basically a verbatim, one-to-one translation—like a translation from English to Pig-Latin—compilers typically have a considerably more sophisticated task—more like a translation from English to Latin. So whereas the whole point of assembly generally is that the programmer controls the exact sequence of instructions, compilers only guarantee that the code they spit out is functionally equivalent to the semantics expressed in the source. Moreover, compilers generally attempt to optimize the code they produce, making the end result correspond even less directly to the source.
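One small illustration of why optimized output stops mirroring the source is constant folding. In this sketch, code is a list of (operation, argument) pairs for an imaginary stack machine; that format and the operation names are my own assumptions, chosen only to keep the example short.

```python
import operator

# Operations of a made-up stack machine that are safe to pre-compute.
FOLDABLE = {"ADD": operator.add, "MUL": operator.mul}


def fold_constants(code):
    """Replace 'PUSH a, PUSH b, OP' with a single 'PUSH result'."""
    out = []
    for op, arg in code:
        if (op in FOLDABLE and len(out) >= 2
                and out[-1][0] == "PUSH" and out[-2][0] == "PUSH"):
            right = out.pop()[1]
            left = out.pop()[1]
            out.append(("PUSH", FOLDABLE[op](left, right)))
        else:
            out.append((op, arg))
    return out
```

If the source said 2 * 3 + x, the folded code pushes 6 directly: functionally equivalent, but the multiplication the programmer wrote no longer exists anywhere in the output.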

Just as assemblers are particular to the precise assembly syntax they can translate, compilers are specific to the high-level language(s) they can translate, i.e. a compiler for the C language can translate C code but not Pascal code. Also like assemblers, compilers are particular to the processor platform(s) which they can target (except some compilers don’t spit out processor instructions at all but rather some kind of ‘intermediate code’, as I’ll discuss later).

Consider the case of the C language. Like with assembly, there used to be a wide variety of C compilers used back in the 80’s and 90’s, but today the market has sorted out, and there are only a few notable C compilers. The two most important are:

  • GCC (GNU Compiler Collection): Originally called the GNU C Compiler, GCC now supports many languages other than C and C++. GCC can target dozens of processor platforms, including all the most popular ones.
  • Microsoft Visual C++: Despite the name, Visual C++ supports C as well as C++. Visual C++ only targets the Intel-compatible platforms: x86, x64, and Itanium. (Technically, ‘Visual C++’ is actually the name of Microsoft’s IDE (Integrated Development Environment), but there isn’t a more commonly used name for Microsoft’s C or C++ compilers.)

Linkers

The source code of all but the smallest programs is written spread across multiple files, and in most languages, these files are treated as separate ‘compilation units’, i.e. they are compiled independently of each other. When a compiler produces processor instructions, the resulting code is called ‘object code’, and the resulting files are called ‘object files’. While some operating systems, including Unix systems, will allow an object file to be run as a program (i.e. it will happily load the file and begin execution of its instructions), this is of limited use because, to make a complete program, the object files need to be ‘linked’ together:

In a program, the code in one source file makes a reference to code in other files and/or is referenced by code in other files: a program is a web of source files which make external references to each other, and so the source files depend upon each other. (If a source file does not reference other files and itself does not get referenced by other files, then it can’t have any effect on or be affected by the rest of the code, so it can’t be said to be a part of the same program.) Still, each source file is compiled separately, meaning that, when processing one source file, the compiler has no knowledge of the files referenced by the source code; consequently, when the compiler encounters an external reference in the source code, all it can do is leave a ‘stub’ in the object code allowing the connection to be patched later. Patching together the external reference stubs of one object file to another is precisely the job of a linker. It is the linker that takes many object files and produces from them an executable file (e.g. an .exe file on Windows).
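The patching step can be sketched with a toy model. Here an ‘object file’ is just a Python dict holding a symbol table and a list of instructions, and an external reference is a CALL whose argument is still a name rather than an address. This layout is invented for illustration and bears no resemblance to real object file formats.

```python
def link(object_files):
    """Concatenate toy object files and patch stubs with real addresses."""
    addresses, base = {}, 0
    # Pass 1: decide where each file's code lands and record its symbols.
    for obj in object_files:
        for name, offset in obj["symbols"].items():
            addresses[name] = base + offset
        base += len(obj["code"])
    # Pass 2: copy the code, replacing each symbolic stub with an address.
    executable = []
    for obj in object_files:
        for op, arg in obj["code"]:
            if op == "CALL":
                if arg not in addresses:
                    raise NameError(f"unresolved external symbol: {arg}")
                arg = addresses[arg]
            executable.append((op, arg))
    return executable
```

The error branch is the toy version of the ‘unresolved external’ complaint a linker gives when it is handed a reference that no object file defines.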

Interpreters

Whereas assemblers and compilers translate code into other forms of code, an interpreter is a program that translates code into action, i.e. an interpreter reads code and does what it says, right then and there. If you intend your program to be run via an interpreter, then every user must have both your program and the interpreter to run it, and your program is then started by starting the interpreter and telling it to run your program. (This may sound unfriendly to naive users, but the installation and starting of the interpreter can be disguised from users such that they install and run your program like any other.)

Because interpretation happens every time you run the program as you run it, interpretation introduces a significant performance overhead. This cost can be mitigated using what I call the ‘hybrid model’. First, the source code is compiled into some intermediate form (i.e. code which is more like processor instructions than high-level code but which is not executable by the processor), and then, to run the program, an interpreter executes this intermediate code. (In this model, the linking of the compilation units is typically done by the interpreter every time the program is run.)
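A miniature version of the hybrid model fits in a few lines. This sketch borrows Python's standard ast module to parse an arithmetic expression, compiles it once into an intermediate code of my own invention (a list of stack-machine instructions), and then interprets that code. It handles only numbers, +, -, and *; it is a toy, not how any real runtime works.

```python
import ast
import operator

# Made-up intermediate instruction names and the actions they stand for.
BINARY_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
ACTIONS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def compile_expr(source):
    """Compile step: done once, produces intermediate stack code."""
    code = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            code.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            emit(node.left)
            emit(node.right)
            code.append((BINARY_OPS[type(node.op)], None))
        else:
            raise SyntaxError("unsupported expression")

    emit(ast.parse(source, mode="eval").body)
    return code


def run(code):
    """Interpret step: done on every run, executes the intermediate code."""
    stack = []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(ACTIONS[op](left, right))
    return stack.pop()
```

The parsing and operator-precedence work is paid for once in compile_expr; run never looks at source text at all, which is where the savings over plain interpretation come from.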

A further refinement of the hybrid model is to use a JIT (Just-in-time) compiler. You use a JIT compiler as you would an interpreter (you run your program by feeding the JIT compiler some form of code, usually intermediate code), but the JIT compiler compiles the code into processor instructions and runs those instead of interpreting the code. Despite the time spent to perform this compilation (typically reflected in a longer program load time), JIT compiling is usually considerably faster than using interpretation: using a JIT compiler with the hybrid model is typically only 10-20% less performant than were the code ‘natively compiled’ (compiled into an executable and run as such), compared to 70-100% slower for interpreting intermediate code. [The term “performant” is used by programmers to mean ‘fast performing’ or ‘acceptably performing’, but you won’t find it in any dictionary yet.] Some claim that, in a few cases, a sufficiently smart JIT compiler can run code faster than the same program compiled into an executable because the JIT compiler can make optimizations only discoverable at runtime. (The comparative performance of JIT compiling versus native compiling is a hotly debated topic. While most concede native compilation almost always produces better performance, it’s debated how much of a performance hit JIT compiling introduces.)

Understand that, whether using the hybrid model or not, an interpreted program is limited by its interpreter. Just as programs executed by the OS can only do what the OS allows them to do, interpreted programs can only do what their interpreter allows them to do. This has potential security benefits: as the theory goes, users can download programs and run them in an interpreter without having to trust those programs because the interpreter can block its programs from accessing files on the system and/or using the network connection, etc. In such schemes, the interpreter is often called a VM (virtual machine) because, as far as the programs which it runs are concerned, it looks and acts much like a full computer system. In practice, truly secure virtual machines aren’t quite a reality, for real VM’s have bugs which malicious programs they run can exploit to breach the limitations imposed by the VM; consequently, users should still be careful of which programs they download and run, even if the program is run in a VM.

Another often-cited benefit of interpretation is that, as long as an appropriate interpreter for your language exists on all the platforms you wish to run your program on, you only need to write the program once. This is often called ‘write once, run anywhere’. This argument made a bit more sense when computers were slower and so compilation took considerably longer, making compiling your program for all target platforms a bit more bothersome, but aversion to this inconvenience doesn’t really explain why interpreted programs are considered so much more portable. The real reason writing your program for an interpreted environment makes it generally easier to get it working on multiple platforms is that the interpreter acts as a layer of indirection between your program and the OS, so the interpreter can handle the messy particulars of dealing with variances between OS’s, e.g. the process of opening a file often differs from one OS to the other, but your program only has to tell the interpreter to open a file, and the interpreter in turn deals with the particulars of the OS.
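Python makes a handy example of this layer of indirection, since its runtime plays the interpreter's role here. In the sketch below, the save_note function and the file name are made up; pathlib and tempfile are standard library modules. The program never mentions a path separator or an OS-specific file-opening call, so the same code should behave the same way on Windows, Macs, and Linux.

```python
import tempfile
from pathlib import Path


def save_note(directory, name, text):
    """Write a text file without knowing anything about the host OS."""
    path = Path(directory) / name   # the runtime picks the right separator
    path.write_text(text, encoding="utf-8")
    return path


# Usage: the temporary folder lives wherever this OS keeps such things.
with tempfile.TemporaryDirectory() as folder:
    note = save_note(folder, "note.txt", "hello")
    print(note.read_text(encoding="utf-8"))   # prints "hello"
```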

The portability advantage of interpretation holds out as long as your program uses functionality that is available and works consistently on all of your target platforms. A notorious problem area is GUI’s (Graphical User Interfaces): many GUI widgets (windows, menus, scrollbars, drop-down menus, etc.) simply don’t look and act the same on Windows, Macs, and Linux desktops. Attempts to provide a cross-platform means of writing GUI code have to date only been partially successful.

In principle, any language can be either interpreted or compiled, but in practice, languages are designed with a particular model in mind. For instance, were you to interpret C language code, you would defeat the purposes of using C in the first place (mainly performance and greater machine control), and so this just isn’t done (though I bet someone somewhere has done it—someone somewhere has done everything, no matter how strange or daft). Another language, Java, was conceived and implemented to use the hybrid model; ‘native compilers’ (compilers that spit out processor instructions) for Java exist, but aren’t used very often because the performance benefits generally aren’t significant enough to be worth the downsides.

Thus endeth the lesson.

Singletons considered harmful

July 4, 2007 – 2:31 pm

Alex Miller, Steve Yegge, and this poster explain.

Among the reasons given:

  • Singletons are most commonly used as excuses to have global variables and functions.
  • As Steve puts it, “using the Singleton is usually just a sign of premature optimization…”
  • Singletons make it difficult when later you decide you actually need more than one of that type or subtypes.