11 ways to ruin team play, a.k.a. Why Battlefield 3 will (probably) still be fundamentally broken

3 Sep

1. Where the hell is my teammate?

There isn’t much to say about this one. If you have no idea what your teammates are doing half the time, you can’t even begin to coordinate.

2. Hey, there’s my teammate! No wait, he just exploded.

Like characters in a horror movie, players in most team action games reflexively go their own way. Even in ostensibly team-based games, such as Battlefield, players have learned that it’s simply not worth the bother in most cases trying to follow teammates or trying to get teammates to follow them.

Perhaps the biggest reason no one ever sticks together is that quick deaths make it wasted effort. Nine times out of ten, if I spawn and dutifully hop in the helicopter you’re piloting, you’re just going to get me killed faster. On foot, two players moving together will spend most of their time trudging across the map only to end up with one player killed immediately in an ambush; so effectively, when the combat finally comes, teammates die too quickly most of the time to really fight together.

Mainly because of this endless quick-death/respawn cycle, most team combat games devolve into teammates barely playing together in any real sense at all. My teammates and I aren’t working together: we just happen to be trying to kill the same enemies.

3. Action, action, action! Who needs time to think or coordinate?

However long it takes one person to think, a group of even just two or three people seems to take ten times as long to do the same thinking. This is true even under ideal circumstances, let alone in a fast-paced action game with iffy text and voice communication.

Now this doesn’t mean we can’t have fast-action team play, but it does mean games should consciously allow for breaks in the action. Instead of ‘action, action, action!’, game design should always privilege ‘pacing, pacing, pacing!’

4. If one ball is fun, then ten balls must be ten times as fun.

Looking at sports, attention is almost always very clearly focused on a single ball, and for good reason. Too many points of focus mean the team play quickly devolves into incoherence: everyone just ends up doing their own thing, defeating the whole idea of team play.

5. If having ten players is fun, then having one hundred players must be ten times as fun.

Similarly, past a certain number of players, team play becomes incoherent. Too many players means too many teammates to coordinate with and too many opponents to worry about. Again, looking at sports, we see that eleven or twelve players per team seems to be the upper limit. In fact, on the soccer field, half the players spend most of the time standing outside the zone of action waiting for the ball to get closer. So the sweet spot seems to be about 3-6 players per side actively engaging each other around a local objective.

Notice I said “local” objective. Teams beyond the 11-12 player threshold can conceivably work coherently as long as strong mechanisms are in place to divide the team into smaller units, each engaging separate objectives. For example, a Battlefield-style game could work with 32-player teams or beyond if stronger mechanisms were in place to enforce squad-based play and if those squads were ensured to have separate objectives. The danger, though, is that too many squads off doing their own thing eventually becomes as incoherent as a game of too many individual teammates off doing their own thing.*

*This is where the idea of a game played at multiple levels of coordination might make sense, e.g. a Battlefield-style game where squads take orders from a small number of commanders responsible for the overall strategy.

6. You can’t make me play medic! I’ll do what I want!

For role-based gameplay (such as class-based and/or vehicle-based combat), games shouldn’t leave role selection up to the whims of each player.

DICE has not only gotten this consistently wrong, they’ve made getting it wrong into a principle: on the one hand, their mantra ‘rock, paper, scissors’ means that the game balance relies upon teammates competently covering the various roles, but their other mantra ‘player freedom’ means DICE is not willing to take any necessary measures to make ‘rock, paper, scissors’ actually work.

Now, how exactly role-selection should be handled is up for discussion because the simplest solution (first come, first pick) isn’t satisfactory. ‘Who should get to fly the jet?’ is a difficult question, but ‘the guy who happened to spawn next to it at the right time’ is not an acceptable answer. Likewise, ‘Who must play medic?’ is a difficult question, but ‘no one’ is not an acceptable answer.

7. Nobody tells me what to do! I’ll do what I want!

So I’ve already asserted that players must be forced sometimes to play roles they don’t want to, but they also must sometimes be forced to go places they don’t want and attack/defend targets they don’t want. Again, game balance simply requires it to keep each game competitive, so mechanisms must exist that strongly incentivize the player to actually follow orders.

So where would players get their orders? Three possibilities:

  1. issued automatically by AI
  2. issued by a teammate in a command role
  3. issued by vote of the team/squad

Having the AI issue orders is least likely to trigger social unrest. On the other hand, AI decisions could diminish the human element that makes multiplayer (potentially) interesting in the first place.

If elevating certain players to privileged positions is too fraught with drama, voting could work as an effective substitute. For example, in Battlefield, your squad could pick its target by a vote for which flag to capture/defend (with perhaps seniority used to break ties); members of the squad would only be capable of capturing/defending the elected flag, making the vote meaningful rather than something players can ignore.
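
The squad vote described above can be sketched in a few lines. This is only an illustration under my own assumptions: the names `elect_flag` and `can_capture`, the numeric seniority score, and the rule that a tie goes to the choice of the most senior player who voted for one of the tied flags are all hypothetical, not anything Battlefield actually does.

```python
from collections import Counter


def elect_flag(votes, seniority):
    """Pick the squad's target flag.

    votes:     dict mapping player name -> flag name they voted for
    seniority: dict mapping player name -> number (higher = more senior);
               a hypothetical stand-in for rank or time in squad
    """
    if not votes:
        return None
    tally = Counter(votes.values())
    top = max(tally.values())
    tied = [flag for flag, count in tally.items() if count == top]
    if len(tied) == 1:
        return tied[0]
    # Tie: the most senior player who voted for one of the tied flags decides.
    tied_voters = [player for player in votes if votes[player] in tied]
    decider = max(tied_voters, key=lambda player: seniority.get(player, 0))
    return votes[decider]


def can_capture(flag, elected_flag):
    """Squad members may only capture/defend the flag the squad elected."""
    return flag == elected_flag


votes = {"ann": "A", "bob": "B", "cy": "B", "dee": "A"}
seniority = {"ann": 5, "bob": 2, "cy": 1, "dee": 3}
target = elect_flag(votes, seniority)
print(target, can_capture("B", target))
```

The `can_capture` check is the part that gives the vote teeth: a squad member standing on any other flag simply doesn't count toward it.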

8. Voice chat: the highest form of communication known to man.

If you’re like me, the speech center in your brain nearly shuts down when occupied with action. Even for people for whom this isn’t the case, voice communication is still not the ideal medium for coordinating team play: even with solid connections, good headsets, and proper audio level balance for all players, it can be difficult to parse everything said or correctly identify who said what (especially in a team of many players).

To make voice chat more effective, larger teams must be broken into squads of four to six players, and those squads must stick together.

To better supplement voice chat, more games should include coordination mechanisms like the placement indicators in Portal 2 co-op. Designers should study the common messages actual players use and integrate them into the game.

Games also must be careful to make sure communication channels of all kinds don’t devolve into spam. In several games, the ‘enemy spotted’ message quickly becomes meaningless because players constantly spam it with a hotkey. In other games, medics tune out calls for healing because the teammates are too often nowhere nearby to be healed and die before the medic can reach them.

9. Hats for sale! Someone stand on my head so I can get this achievement.

Every time new items get released for TF2, half the players sit around in spawn for the next week playing the trading meta-game rather than, you know, actually playing the game.

Players also get distracted by achievements and character advancement. If I’m trying to get my 1000th headshot to get an achievement or gun upgrade, I’m not really playing for my team to win.

10. Hey, there’s this neat trick I read about online where if you jump on these boxes in this one exact spot and melee at the same time…

Some people like to join multiplayer games to deliberately Not Actually Play the Game. My favorite kind are the 10-year-olds who want to try the neat exploit they heard about from their friend’s cousin who knows a guy who did it once before. ‘Don’t shoot those barrels! If I stack them in the right way, I can reach that ledge and fall through the floor to…’ etc. etc.

Others join games to grief, and others just sit in spawn while they organize their hat collection.

Whatever their reason for not actually playing the game, these players must be gotten rid of quickly and without hassle lest they ruin the team balance. Amazingly, many games still don’t appreciate this. In fact, few games have figured out how to do vote kicks properly: rather than calling votes that expire, a vote to kick a player should persist indefinitely until a majority has voted to kick the same player.
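
A persistent vote kick of this kind is simple to model. The sketch below is a guess at one reasonable reading, not a description of any shipping game: the class name `KickVotes` is made up, and I'm assuming ‘majority’ means more than half of the current team, with the target counted in the team size.

```python
class KickVotes:
    """Standing kick votes: a vote never expires, it just accumulates."""

    def __init__(self, team):
        self.team = set(team)
        self.votes = {}  # target -> set of teammates who have voted to kick them

    def vote(self, voter, target):
        """Record a standing vote. Returns True if the target gets kicked."""
        if voter == target or voter not in self.team or target not in self.team:
            return False
        voters = self.votes.setdefault(target, set())
        voters.add(voter)  # voting twice has no extra effect
        # Assumption: a strict majority of the current team is required.
        if len(voters) * 2 > len(self.team):
            self._remove(target)
            return True
        return False

    def _remove(self, player):
        """Drop the player along with any votes cast by or against them."""
        self.team.discard(player)
        self.votes.pop(player, None)
        for voters in self.votes.values():
            voters.discard(player)


kicks = KickVotes(["a", "b", "c", "d", "griefer"])
print(kicks.vote("a", "griefer"))  # False: one vote of five
print(kicks.vote("b", "griefer"))  # False: two votes of five
print(kicks.vote("c", "griefer"))  # True: three of five is a majority
```

Because nothing expires, nobody has to rally the team inside a thirty-second window; each player votes whenever they finally notice the problem.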

11. We won! I guess we killed more (or died less) than they did.

In a good, satisfying match, players can tell a story of why and how they won or lost. Even if it’s as simple as ‘we got pinned down at this place’, it’s still better than just ‘the winning team killed more (or died less)’.

Teamwork is more than just the collective sum of our actions. If you and I are working together, that means we make some decisions together and/or at the very least observe each other’s major decisions. For example: ‘we decided to take the underground passage to infiltrate the enemy base, and Ted decided to set up a sentry gun’. That is the beginning of a narrative, a story of what we did together.

In an incoherent team match, teammates don’t make decisions together, don’t observe each other’s major decisions, or perhaps just make and observe so many decisions that they all blur together into a meaningless mush.

In a great game, the winning-or-losing story of each match tends to be novel rather than the same old thing every single time. A team game with matches that lack any narratives to speak of simply can’t be great.

 

Why David Brooks is awful

5 Mar

Were it not for laziness, I’d have long ago articulated exactly what I hate about David Brooks. Fortunately, my laziness has paid off in the form of two recent Salon takedowns of his new book, which together hit the mark close enough that I needn’t bother. The first is PZ Myers critiquing Brooks’s clutch cargo deployment of science factoids:

The technicalities don’t illuminate the story in any way, and the story undercuts the science. Ultimately, the neuroscience in the book feels a micrometer deep and a boring lifetime long, with the fiction[...]giving the impression that it’s built on a sample size of two, and both [samples] utterly imaginary.

The second is Alyssa Battistoni critiquing Brooks’s core message, that the solution to the problems of the world lies in everyone simply being more like our current stock of super-awesome elites:

Brooks is so enamored of his vision of a new economy, driven by American middle-class values and invariably described as “creative,” “diverse” and “innovative” (because who can argue with those?) that he can’t see that this seemingly bright future is already leaving millions behind. Indeed, his vision of the future world is literally just a variation on Richard Florida’s “creative class” economy in which cities around the world compete for global elites. Brooks gets that certain people — the Mark Zuckerbergs of the world — are the “sorts of people who become stars in an information economy and a hypercompetitive, purified meritocracy.” But although he acknowledges that it will be necessary to address “human capital inequalities” to give everyone a “chance to participate,” he doesn’t seem to understand that the meritocracy he champions is anything but purified. Instead, it’s the Organization Kids, with their elite educations and global connections, who have the advantage in a competition driven by “relationships” and “charisma,” which begins to sound suspiciously like the old boys’ network of yore.

 

 

What’s wrong with Android UI and what to do about it

7 Dec

After a few months of using Android, I think I can now articulate my displeasure with the UI and what should be done about it. The prescription comes down to this:

  • First, Android needs a different paradigm for navigating between applications and within them, one which doesn’t rely upon a Back button.
  • Second, the Menu button, Search button, and long pressing can and should be removed.

Before getting into the details of how this would work, here’s why the current UI is a problem:

Web-style navigation

Much like the paradigm of the Web is a collection of pages navigated page-to-page, the paradigm of the Android UI is a collection of screens navigated screen-to-screen. On the Web, this style of navigation is necessary because there are simply too many pages to be centrally organized, so the only structure that emerges is the ad hoc graph of links. To compensate, however, most sites impose a structure on their own pages, usually via a navigation banner and side menus. On Android, link-based navigation has been adopted not because of a preponderance of screens (the total number of screens in all apps installed on an Android phone is decidedly finite) but because the small screens don’t afford catch-all structuring mechanisms like navigation banners or a Windows-style taskbar. In practice, then, each Android screen offers the user a very limited set of links to navigate to other screens, and so designers must very carefully choose for each screen the few links most helpful to the user.

When this pattern works well, each screen seems to magically anticipate the user’s next immediate need. When the pattern fails, the user puzzles over how to get back to that one screen they know exists but can’t get to directly from their current screen.

Things can get especially confusing when the user is linked from one app to another because this blurs the distinction between apps. Even worse, sometimes a link doesn’t take the user to another app but rather to a screen “borrowed” from another app (a screen used as an embeddable widget, effectively), which may mistakenly lead the user to believe they’ve been warped to another app when they haven’t. This distinction between warping to a screen and “borrowing” a screen is way too subtle for most users.

Back button

To help the user cope when they get lost, Android has a hardware Back button which is meant to always take the user back through their history of screens. Confusingly, though, Back has four meanings depending upon context:

  • Go back to what I was just looking at.
  • Go up to the parent screen.
  • Dismiss the popup.
  • Go to the previous URL in this browser window.

Sure, the guidelines say Back is always supposed to mean ‘go back’, not ‘go up’, but even stock apps violate this rule.

Even when the Back button works properly to mean ‘back’, it can still be confusing, for people tend to quickly forget what they were just looking at beyond two or three screens before. So there are two ways users actually use the Back button to go ‘back’:

  • the user knows what they’ll get when they hit Back
  • the user suspects what they’re looking for is there somewhere, and they’ll stop hitting Back when they see it

This first scenario probably plays out only when the user is taken to a screen and quickly realizes it’s the wrong screen. The second scenario can be an effective strategy for users, but it’s a very displeasing form of navigation. Imagine working at a workbench where the tools are kept not in their proper spots but in a mystery bag which you must go through one item at a time every time you wish to grab a tool. Sure it helps that the most recently used items are always at the top, but it still gets tiresome quick.

Home button

When users get stuck in a navigation dead end or simply get lost, they can always reorient themselves by escaping to the home screen with the Home button. Unfortunately, this doesn’t always fix the user’s disorientation because many apps don’t take the user back to the same screen every time the app is selected in the home screen, so the user can’t reliably reorient themselves within an app by leaving the application and re-entering.

Search button

The hardware Search button is meant to provide quick access to a common task, but it often goes untouched because it’s redundant (most apps provide a search button on screen) and unreliable (the user can’t always know what they’ll get when they press it). Sure, the Search button always takes the user to search, but search of what? Depending upon the current screen, it might take you to a search of the web, or it might take you to a context-specific search of your mail. (Even worse, in some dialogs, Search actually doubles as a Back button to back out of dialogs.)

I’m also skeptical that search is such a pressing use case for most users as to deserve its own hardware button. It seems like the hardware Search button was included just as a sop to Google’s self-identity.

Menu button and context clicking

The core design challenge with the touchscreen phone form factor is the lack of screen space. Even when the screen is very high-res like on most recent Android devices, the text and buttons can only get so small before they are too small to read or click.

To mitigate this problem, Android punts many buttons into popups that only appear when the user presses the hardware Menu button or when the user “context clicks” with a long press. The primary problem here is that tucking functionality away in hidden menus and popups makes many functions harder to discover and harder to find again later. These troubles are amplified by the bothersome way in which these menus and popups are activated and dismissed:

  • the Menu button is difficult to push with one hand (especially on the Droid X)
  • context clicking requires waiting nearly a full second
  • dismissing the menu requires clicking somewhere off the menu or hitting the Menu button or Back button
  • dismissing most context popups requires hitting the Back button (which, again, is especially tricky on the Droid X with one hand)

The hope is that users will learn to look in the menu and try context-clicking things, but even when users do learn, the slowness and awkwardness of activating and dismissing the hidden popups makes browsing for functionality bothersome. Many users simply fail to catch on to these conventions, and even those who do may be slow to find hidden functions and work them into habitual use. Consequently, many apps place some buttons redundantly both on screen and in the menu.

So while hiding buttons in popups can result in a very aesthetically pleasing chromeless UI and also leave more room for displaying data on a very small screen, I don’t believe the benefits outweigh the costs: navigation that is harder and less obvious, plus annoying dialog modality.

Fixing it

So now, here in detail is how these issues could be addressed:

Rules for screens

To get a handle on navigation, we need a handle on the set of places the user can visit and how they can move between them. In Android, we’ll call these places “screens”. The rules are:

  • The Home screen is a single screen.
  • Each application, as represented by an icon on the Home screen, consists of some finite, fixed number of screens (two or three on average, five or six on the high end).[1]
  • No other screens exist. Application screens and the Home screen are it.
  • Clicking an app icon in Home takes you back to the screen in that application which you last viewed.
  • User actions in one screen can warp the user to another screen within the same application or to a screen within another application.
  • User actions in one screen can also modify the state of other screens.
  • Clicking an item in the notification list warps the user to a screen within an application.
  • Within an application, the screens are organized strictly left-to-right. Content within a screen is always scrolled vertically rather than horizontally. Users navigate between screens within an app by swiping left and right.[2]

A major goal of these rules is to establish and maintain spatial coherence for the user. The user needs to feel like the screen they are looking at fits somewhere. When the user has a destination screen in mind, they want to picture that destination in space relative to their current position. The way Android currently works, when users mentally construct a path from their current position to their destination, the path is a chain of links that lack spatial coherence. A lack of spatial coherence works well enough on the Web because users of the Web constantly visit new territory rather than familiar places: when you go somewhere new, you don’t know your way around anyway, so navigation that foils spatial expectations is not really any worse. But when the user is in frequently visited territory, it’s disconcerting when the territory lacks a definable shape.

To maintain a sense of spatial coherence in Android, users need strong cues of where they are currently located, especially when they are warped to a different screen. So on the far left of the notification bar, the user always sees two icons: one representing the current application, the other representing the previously viewed application. When the user is warped from one application to another, these icons flash to indicate the move. To the right of these two icons, the user sees a row of dots wherein each dot represents one screen of the current application in order from left-to-right, and the dot representing the currently viewed screen is highlighted. When the user pans between screens, the dot of the new current screen faintly blinks.

When user actions modify the state of other applications, it’s generally a good idea if the state modification adds to the state rather than changes existing state. For example, when the user is taken from an app to a new URL in the Browser, the new URL should be added as a new browser window and leave all other open windows unaffected.

Grouping screens with tabs

To keep the number of screens down to a sensible few, many applications might simply cheat by stuffing multiple screens into the space of one using tabs at the top. This makes sense in many cases, particularly when the tabs represent different views of related data. For example, in the Music app, the Artist, Album, Song, Playlist tabs are more or less different views of the same collection of tracks.

When apps use this strategy, they should attempt to only group tabs which logically go together. In general, the number of tabs per screen should not exceed five. When you really need more than five, use an arrangement where two or three tabs are fixed in place, but the rightmost tab is actually a pulldown that includes all the remaining tabs.

Navigating with “Home” and “Swap”

Android should have just two hardware buttons: “Home” and “Swap”.

The Home button, of course, takes the user to the Home screen, just as it does currently.

The Swap button replaces the Back button and simply takes the user to the previously viewed application. Pressing it repeatedly toggles the user back and forth between the current and last application. (Recall that the notification bar always displays the icons of the current and last app.) When swapping to an application, the user is always taken to the screen within that application which they last viewed.

Pressing Swap at the Home screen takes you back to the current app (the app you were at before pressing Home).
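
To make the Home/Swap rules concrete, here is a small state-machine sketch. It is only a model of the behavior described above, not Android code: the `Navigator` class and its method names are invented for illustration, and I'm assuming an app that has never been opened starts on its leftmost screen (index 0).

```python
class Navigator:
    """Toy model of the proposed two-button (Home / Swap) navigation."""

    def __init__(self):
        self.current = None    # app in the foreground (or last in it, if at Home)
        self.previous = None   # app viewed just before `current`
        self.at_home = True
        self.last_screen = {}  # app -> index of the screen last viewed in it

    def open_app(self, app, screen=None):
        """Launch from a Home icon (screen=None) or warp to a specific screen."""
        if app != self.current:
            self.previous, self.current = self.current, app
        if screen is not None:
            self.last_screen[app] = screen
        self.at_home = False
        # Assumption: a never-opened app starts on its leftmost screen.
        return app, self.last_screen.setdefault(app, 0)

    def swipe_to(self, screen):
        """Left/right swipe within the current app."""
        self.last_screen[self.current] = screen

    def press_home(self):
        self.at_home = True

    def press_swap(self):
        """Toggle between the two most recent apps; from Home, return to current."""
        if self.at_home:
            if self.current is None:
                return None  # nothing has been opened yet
            self.at_home = False
        elif self.previous is not None:
            self.current, self.previous = self.previous, self.current
        return self.current, self.last_screen.get(self.current, 0)


nav = Navigator()
nav.open_app("Mail")
nav.swipe_to(2)
nav.open_app("Browser", screen=1)  # warped from Mail to a Browser screen
print(nav.press_swap())            # back to Mail, on the screen last viewed
print(nav.press_swap())            # and back to Browser
```

Note that Swap never needs a history stack: two slots and a per-app ‘last screen’ are the entire state, which is the point of replacing Back.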

On-screen Menu

If the Menu button is to be gotten rid of, the buttons of the menu need to appear on screen permanently. The simplest solution is to simply put the buttons in a menu bar, a row at the bottom much like they appear in the current design when the user pushes the Menu button. However, to minimize use of screen space, the menu bar should consist of only a single row of buttons, never two, and less common buttons should get punted into a pullup revealed by a right-side ‘more’ button. (To make it even smaller, perhaps the buttons should only have a text label without an icon so they can be much shorter, or perhaps the icon can be placed next to the label rather than on top.) The number of buttons should not exceed four, so to fit five or more will require punting into a pullup menu.
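
As a rough sketch of the overflow rule, assuming the ‘more’ button itself occupies one of the four slots (the text above doesn't settle that) and that actions arrive ordered from most to least common; the function name `layout_menu_bar` is made up for illustration:

```python
def layout_menu_bar(actions, max_slots=4):
    """Split actions into (buttons shown in the bar, buttons in the pullup).

    actions: labels ordered from most to least commonly used.
    Assumption: when overflow is needed, 'more' takes the rightmost slot.
    """
    actions = list(actions)
    if len(actions) <= max_slots:
        return actions, []
    keep = max_slots - 1
    return actions[:keep] + ["more"], actions[keep:]


bar, pullup = layout_menu_bar(["Compose", "Refresh", "Labels", "Accounts", "Help"])
print(bar)     # ['Compose', 'Refresh', 'Labels', 'more']
print(pullup)  # ['Accounts', 'Help']
```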

In many cases, buttons currently seen in menus can be gotten rid of because they would no longer be necessary. For example, settings, by convention, should be placed on the rightmost screen of an app, so users shouldn’t need menu buttons that warp them to the settings screen. In general, menu buttons that warp the user to other screens should no longer be necessary. This would have the nice benefit of removing the redundant appearance of many menu buttons on multiple screens.

For screens with a need for fewer than three buttons or which simply wish to minimize the visual footprint of the menu bar, the menu bar may consist of just a single ‘more’ button that doesn’t span the whole screen width.

Of course, some apps may wish to have additional always visible buttons. In these cases, the most important buttons should generally appear in a larger size at the top of the screen rather than as a second row. The menu bar is meant to hug the edge of the bottom as much as possible; adding other rows on top would require the buttons to be much taller for easy clicking.

Rules for popup menus

For the purposes of this discussion, a popup is a modal overlay which covers the underlying screen area. While some popups are simply centered in the middle of the screen and block all other interactions, others may be ‘attached’ to an element on screen (so as to move with the screen when it’s scrolled) and may or may not block other interactions. This latter kind is typically small, containing a few buttons. These ‘attached’ popups should be generally favored, as they maintain a stronger sense of space: when a popup floats freely, it’s much harder to remember where it comes from and so to recall how to get back to it later.

To give popups a consistent, distinct look so that they can be discerned from regular screen elements, popups should be given a strong border with a drop shadow effect to visually convey that they overlay the regular screen area. Also, every popup needs an on-screen dismissal button, a distinct X for close. (In OK / Cancel dialogs, the X can go on the Cancel button.)

Rather than integrating the keyboard into screens, it should always appear as a popup, with the standard dismissal button and the other visual cues that indicate it is a popup.

Generally, context popups should appear below the item as a row of small buttons, similar in appearance to the menu (though with somewhat smaller buttons). If the number of buttons exceeds four, four more buttons can appear on a second row. If the number of buttons exceeds eight, the rows scroll vertically to reveal additional rows. When a popup is revealed that runs off the bottom of the screen, the screen should scroll down a bit, then scroll back when the popup is dismissed.
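
The row rule amounts to chunking the buttons into fours and scrolling once there are more than two rows. A minimal sketch, with the function name and the returned dictionary shape being my own invention:

```python
def layout_context_popup(buttons, per_row=4, visible_rows=2):
    """Arrange context-popup buttons into rows of `per_row`.

    Returns the rows plus whether the popup must scroll vertically,
    which happens only when there are more rows than fit at once.
    """
    buttons = list(buttons)
    rows = [buttons[i:i + per_row] for i in range(0, len(buttons), per_row)]
    return {"rows": rows, "scrolls": len(rows) > visible_rows}


popup = layout_context_popup(["Open", "Share", "Copy", "Move", "Rename", "Delete"])
print(popup["rows"])     # [['Open', 'Share', 'Copy', 'Move'], ['Rename', 'Delete']]
print(popup["scrolls"])  # False
```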

The dismissal button for these context popups should appear in the same place where the user clicked to make it appear.

Popups without long pressing

Currently, some items can be long pressed to present a context popup or tapped normally to perform a default action. Most commonly, the default action displays the tapped item in full.

To get rid of long pressing, one solution is to simply get rid of the context popup, and if the actions presented in the context popup are really necessary, they can be presented as buttons when the full item is displayed.

Another solution is to make single-tapping present the context menu. This would make the default action require two taps, but this downside is often acceptable.

Another solution is to attach a ‘more’ button to each item. Tapping the item would perform the default action, but tapping the ‘more’ button would present the context popup. This solution, of course, only really works for items of a sufficient size, such as full-width items in a vertical scroll.

One special case is the Maps app, where the user needs to perform actions on points on the map. It might be annoying if the user gets a context menu every place they tap, so one solution could be to place a marker where the user taps, and then buttons at the top or bottom of the screen perform actions on the spot under the marker.

Another special case is how to do text selection and cut-copy-paste without long presses. Single tapping and dragging on text places the text cursor, so long presses are currently used to enter a selection mode. To get rid of the long presses, a solution is to place these functions in a ‘more’ button on the text box or on the keyboard popup. For example, to highlight text, the user places their cursor on a word, taps ‘select text’ under ‘more’ to highlight the word, then drags the triangles to modify their selection. The user can then cut or copy the text from the ‘more’ button.

Single-screen Home

Sometimes the Swap button will not serve the user’s multitasking needs when the user needs to switch quickly between more than just two applications. Consequently, we should tweak the Home screen to better accommodate cases where the user is switching often between more than their last two apps.

First, the Home screen should consist of a single vertically-scrolling screen, not multiple screens swiped left and right. The scroll area is divided into a top customized area and an alphabetized area below it. If the user moves an app icon into the customized area, it doesn’t appear in the alphabetized area.

When the user presses Home from within an application, they are always taken to the top of the Home screen. When at the home screen, pressing Home will scroll to the top of the page.

To make navigation within a long Home scroll quick and easy, the scroll bar is always visible on the right, so users can simply tap a position on the scroll bar to scroll faster. On the scroll bar, dots indicate the position of the last five-or-so used applications (much like errors and warnings are highlighted on the scrollbar in Eclipse and other IDE’s). (These dots should probably vary in intensity so that the most recent app is indicated with a stronger color than the least recent.) So if the user needs to switch quickly to a recently used application from the bottom of their Home screen, they can press Home, look for the dot in the scrollbar and click that part of the scroll bar to quickly get to the icon of that recently used app.
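
Here is one way the scroll bar dots might be computed. This is a simplification under stated assumptions: I treat the Home screen as a flat top-to-bottom list of icons rather than a grid, place each dot at the icon's fractional position in that list, and fade intensity linearly with recency. The function name `recent_app_dots` and the linear fade are hypothetical choices.

```python
def recent_app_dots(icon_order, recent_apps, max_dots=5):
    """Compute scroll-bar dots for recently used apps.

    icon_order:  app names in Home-screen order, top to bottom
                 (assumption: a flat list, ignoring the icon grid)
    recent_apps: app names, most recently used first
    Returns a list of (app, position, intensity) where position runs from
    0.0 (top of the scroll bar) to 1.0 (bottom) and intensity from 1.0 down.
    """
    # Apps removed from Home have no icon, so they get no dot.
    recents = [app for app in recent_apps if app in icon_order][:max_dots]
    last_index = len(icon_order) - 1
    dots = []
    for rank, app in enumerate(recents):
        position = icon_order.index(app) / last_index if last_index > 0 else 0.0
        intensity = 1.0 - rank / max_dots  # most recent app gets the strongest dot
        dots.append((app, position, intensity))
    return dots


home = ["Dialer", "Mail", "Browser", "Maps", "Music"]
for dot in recent_app_dots(home, ["Maps", "Mail"]):
    print(dot)
```

Tapping the scroll bar at a dot's `position` would then scroll the Home screen so that app's icon is in view.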

The dialer icon is treated just like that of any other app. While most phones will come by default with the dialer app icon at the top of the home screen, it can be moved or removed just like any other.


[1] A few odd applications, like some games, may get away with breaking this paradigm, but generally as many apps as possible should fit the mold.

[2] In applications where the content view is dragged both vertically and horizontally (such as the Browser or Maps), translucent tabs appear on the sides for the user to drag to navigate to the adjacent screens. Having the special case is less than ideal, but I believe probably worth the cost. If left-right swiping on the application screen doesn’t work well, perhaps swiping left-right on the navigation bar or perhaps dedicated hardware buttons are a better solution. Or maybe a special left-right swiping area always at the bottom of the screen, one that is translucent and doesn’t block regular clicks or vertical swiping; this would make the left-right swiping behavior consistent for all screens.

I hate nature

23 Oct

Stop creating new languages

30 Aug

Every couple of months, an announcement for a new language pops up on ProgReddit or Hacker News. While some of these languages might have interesting ideas, their ideas rarely justify whole new languages, so mostly these languages seem like arbitrary remixes of existing ones. Consequently, these languages’ authors often come off a bit like crackpots: ‘Look, everyone! I’ve rearranged the bookshelves with my new classification system. Once you master it, you’ll find browsing of biographies 6% more efficient and reshelving of autobiographies 11% more efficient! *ahem* Once you master it.’

Some observers react to this steady nuisance of quixotic pet projects by dismissing the need for better languages entirely. This is sensible in the short term because new things in programming rarely constitute big enough improvements over the day’s status quo to justify the transition costs. In the long run, however, it’s myopic: the languages and tools of today are generally significantly superior to what we were using a generation ago, so it’s not unreasonable to expect further significant advances.

In one reading of the history, though, the improvements we’ve seen in the last twenty-odd years come entirely from the realization of old ideas (automatic garbage collection, full object-orientation, functional programming, etc.), and so it’s claimed that no one has had any really new ideas for decades now. There’s something to this observation, but we still shouldn’t reject new languages out of hand:

  • First, the original formulations of the old ur-ideas prompted many practical questions, but many of our answers to these questions still remain sketchy, leaving open the possibility of more fundamental changes to come.
  • Second, while I think it very unlikely that, at this late date, someone will identify a new programming paradigm, it always seems naïve to declare the End of History and rule out any future potential for big, transformative ideas.
  • Third, and most importantly, I don’t believe languages must only advance on big ideas, for little details matter—they add up. Even if what most new languages largely do is just rearrange the furniture for the sake of aesthetics and minor efficiencies, after a few rounds of 5% improvement, you begin to see a real qualitative difference. Python, for instance, is semantically not all that different from Perl, but what a difference sane syntax makes.

So am I saying we should tolerate the crackpots? To a point. Any new language warrants major skepticism, no matter the source, but especially a language coming from an unknown. It’s for a good reason that we have a natural tendency to treat the opinions and ideas of established voices much more charitably—both in time and sympathy—than those of unknown quantities: without this bias, we’d waste a lot more time on crap than we do already, for it simply can take a lot of time and effort to discern the difference between a crackpot and someone worth listening to.

So as an unknown with something Important To Say, you must be very careful in how you present yourself and your ideas so as not to be dismissed as a crackpot. I have two pieces of advice. First off:

Don’t be a crackpot.

Obvious, perhaps, but surprising how many people miss this one. Second:

Be as clear as possible.

Only when reading a name-brand am I willing to accept that difficulties in comprehension are my own fault, not necessarily the author’s. Not so for an unknown. If James Joyce hadn’t written Dubliners, it’s doubtful anyone would ever have read Finnegans Wake, let alone called it brilliant.

In the particular case of introducing a new programming language, it’s especially critical to be very clear about the problems your language addresses. What’s the point of this thing? How is it supposedly actually better? Before I continue reading what you have to say, I want to know that you’re not just re-arranging the furniture.1

So it’s with full awareness and trepidation that I admit that I, too, have tried my hand at designing a programming language. Following my own advice, I’ll try to be up front about what I’m pushing: a Lisp people will actually learn and use.2

Here’s what I want in a Lisp, in order of ambition:

  • Easy to learn: The standard dialects of Lisp tend to be taught ineffectually and tend to be unnecessarily confusing. (Yes, that includes Scheme.) I won’t go into details here, but suffice it to say that ease of learning really matters, not just for the sake of getting more people to use the language, but for the sake of getting those who use the language to truly understand it.
  • Readable syntax: As everyone knows, Lisp has a problem with parentheses. Proponents argue that you just get used to it, and this is true, but the preponderance of parentheses constitutes a lot of line noise that I believe hinders readability (and editability) even for experienced eyes. Additionally, some Lisp dialects get a bit too noisy with reader macros, such as having the apostrophe for quoting all over the place. Furthermore, I find that the irregularity of the standard indentation style of current Lisps is unnecessarily difficult for learners to grok and leaves too much to stylistic choice.
  • Syntax highlighting, code assist, and assisted refactoring: Programmers working in Java and C# have become much accustomed to conveniences that keep their code neat, that provide quick access to documentation, and that free them from having to remember minute details such as type taxonomies, function signatures, and precise identifier names. Providing those same conveniences in a dynamic language is much more challenging and error prone because something as simple as renaming an identifier often requires that the tools make risky assumptions about what’s going on at runtime. Up to now, solutions to this problem have relied upon very sophisticated code analysis that still doesn’t work right much of the time. I believe there’s a simpler solution.
  • Push-button debugging: Programmers working in Java and C# become accustomed to no-hassle debugging, where setting a breakpoint requires just a click and where the IDE takes you through the code as you step through. This level of ease is lacking in most other languages, but especially in Lisp, where macros complicate the process.
  • Embedded data: Lisp’s tree-based syntax makes it usable as a structured-data format, meaning we don’t have to punt data into a separate format, such as XML or JSON. Instead, data can be expressed in Lisp using an ordinary library rather than a special syntax that requires special processing and tools. This could spare us from perverse data languages, like XSLT, which inevitably contort into full-fledged—and crappy—programming languages. The trouble is that standard Lisp syntax doesn’t work well for data dominated by text, i.e. documents. So, for instance, while you might use a current Lisp in place of JSON, you probably wouldn’t use one in place of HTML.
  • Embedded languages: While some languages arguably shouldn’t exist at all (some haters say this about Java, for instance), other languages, like C and C++, clearly exist for a reason. But the fact that these languages fill necessary semantic niches doesn’t mean that they need their own syntaxes: instead, the right dialect of Lisp could “host” the complete semantics of a foreign language as a library. Consider a C program, which is typically written as a mish-mash of C code, preprocessor directives, and build files (makefiles, etc.). We could create a Lisp library that allows us to write C semantics in Lisp and produces the same end product (executables and object code) but which would elegantly integrate the equivalent functionality of the preprocessor and build chain in a way that is cleaner, more flexible, and easier to learn. If a way can be found for Lisp syntax and macros to provide the ideal amount of syntactical concision for all possible languages, future language designers can forget about syntax and just focus on semantic innovations.3

Now, as it turns out, the Lisp I want in all other respects resembles Clojure, so really what I’m proposing is specifically a Clojure dialect. In fact, implementation of my dialect won’t require much more than swapping out Clojure’s reader, wrapping some of its macros and functions, adding one or two data types, and creating editor assistance.

I’m calling my Clojure dialect Animus. Animus is still very much in flux, but I describe it in its current form here. Also take a look at some experiments with various languages to see what they might look like embedded in Animus.

  1. Or at least, if you are just rearranging furniture, I’d much rather you be honest about it: if you yourself realize that that’s what you’re doing, then you at least have a chance of delivering an actual, if small, improvement to the status quo.
  2. This isn’t actually what I set out to design. When I first started thinking about a language a few years ago, my favorite language was Python, and I didn’t know Lisp, so for a long time I was simply thinking of ways to improve upon Python. At some point, I accepted the idea of prefix notation and macros, and things progressed from there.
  3. Haskell strikes me as a language that could greatly benefit from embedding in Lisp. The few times I’ve attempted to pick up Haskell, I’ve been offended by the ridiculous Perl-like syntax of ad hoc convenience piled upon ad hoc convenience. If there’s something worthy in Haskell’s semantic model, it’s obscured under a mess of syntax.

Reinventing the desktop (part 2): I heard you like lists… [text version]

3 Aug

I originally posted this as a screencast, but I figure a lot of people want to scan rather than sit through a whole 40-minute presentation, so here’s the same stuff (somewhat abridged) in text form.

In part 1, I made a negative case against the desktop interface as it currently exists, but I promised to make a positive case for my solutions. Because it would take at least a few weeks to put together a complete presentation, I thought it more timely if I instead present the ideas in installments (and hey, more reddit karma whoring this way). Most of the pushback (both constructive and vitriolic) to part 1 concerned my ideas about lists, so I’ll start there.

Lists good, hierarchies bad

Many of the most notable recent innovations in software have revolved around lists:

  • Before Google, people had the idea to organize the web in a catalog, a big hierarchy of everything, e.g. the Yahoo directory. After Google, it became clear that a list of search results is far superior, and now such directories are mostly remembered with head-shaking bemusement (to the extent they’re remembered at all).
  • Gmail greatly deemphasizes the notion of sorting mail into separate folders and instead organizes mail by tagging and search.
  • Before iTunes and its imitators, users would play their music by navigating into folders, e.g. ‘music\artist\album\’. Today, iTunes simply presents everything in one big list that is textually filtered.
  • A blog is basically any site on which new content appears strictly in a chronological list: new stuff comes in the top, old stuff goes out the bottom. So, for instance, on a non-blog like Slate.com, some attempt is made to hand-editorialize the presentation of content on the front page, as in a magazine, but on Boingboing.net, the authors just create new content and post it into the stream.1
  • Link-driven sites, like Slashdot and Reddit, also revolve around lists.
  • So do many social sites, like Twitter and Facebook.

The way these examples use lists differently is mainly in how they order their items. For instance, in Google search, results are ordered by relevance to the query whereas, in Reddit, items are ordered by a combination of chronology and user votes. The key lesson here is that, if you can find the right way to order and filter things, you probably are best off presenting them in just a big, flat list.

My favorite example of this is the AwesomeBar introduced in Firefox 3. The AwesomeBar filters my history and bookmarks as I type and orders items by “frecency”, a combination of the recency and frequency with which I’ve accessed the items. This means that I can type, say, ‘sl’, and my Slashdot.org bookmark will reliably appear at the top of the list. So when I want to visit Slashdot, I just reflexively type <alt+d>, ‘sl’, <down>, and <enter>. I don’t have to navigate a menu of any kind, I just act on reflex. This works so well, in fact, that I don’t use the regular bookmarks menu at all anymore.
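To make the idea concrete, here is a minimal Python sketch of frecency-ordered filtering. Everything in it is an assumption made up for illustration: the `Bookmark` type, the function names, and especially the scoring formula (visit count weighted by an exponential decay with a 30-day half-life). It is not Firefox’s actual algorithm, just one plausible way to blend frequency and recency into a single sort key.

```python
import math
from dataclasses import dataclass

HALF_LIFE_DAYS = 30.0  # assumed decay rate, chosen arbitrarily for this sketch


@dataclass
class Bookmark:
    title: str
    url: str
    visits: int              # how many times the item has been opened
    days_since_visit: float  # age of the most recent visit


def frecency(item: Bookmark) -> float:
    """Blend frequency and recency: visits count for less the older they are."""
    decay = math.exp(-math.log(2) * item.days_since_visit / HALF_LIFE_DAYS)
    return item.visits * decay


def awesome_filter(items: list[Bookmark], query: str) -> list[Bookmark]:
    """Keep items whose title or URL contains the query, highest score first."""
    q = query.lower()
    matches = [b for b in items if q in b.title.lower() or q in b.url.lower()]
    return sorted(matches, key=frecency, reverse=True)


# Made-up browsing data for the demonstration.
bookmarks = [
    Bookmark("Sledding tips", "example.com/sled", visits=1, days_since_visit=400),
    Bookmark("Slate", "slate.com", visits=15, days_since_visit=2),
    Bookmark("Slashdot", "slashdot.org", visits=200, days_since_visit=1),
]
results = awesome_filter(bookmarks, "sl")
print([b.title for b in results])
```

With this data, typing ‘sl’ puts the heavily and recently visited Slashdot entry first, and the bookmark visited once over a year ago sinks to the bottom.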

The AwesomeBar isn’t without flaws, however. Consider that there are three different basic cases of search:

  • In some cases, I know specifically what I want.
  • In other cases, I only know generally what I want, e.g. I want to play some game, but I haven’t decided on a game, and perhaps I’m not sure about my options.
  • In the remaining cases, I just want to browse. Sometimes this is because I’m just bored and looking for something to do, but often I browse because I just want a refresher on what things exist, e.g. I browse my calendar because I need to see if there’s anything there I’ve forgotten.

While the AwesomeBar is awesome when I specifically know what I want, it’s somewhat less awesome when I only know generally, and it’s not at all awesome when I know not at all. In particular, I want a way to browse the sites which I’ve bookmarked but haven’t returned to, because many of these URLs are things I didn’t have time to consume at the time but bookmarked so as to consume at a later date.

One solution would be to perhaps create a distinct kind of bookmark for sites I intend to consume later rather than visit on a regular basis. Another solution would be to make the Firefox “library” window (“Bookmarks–>Organize Bookmarks…”) more usable and fix its behavior: currently when you delete history, the ‘last visit’ date for each bookmark is lost, meaning you can’t afterward browse just the sites which you’ve bookmarked but forgotten about.

Launching programs without hierarchies

The core mechanisms for program launching in Windows and Linux are hierarchical start menus. In Windows, an individual application is generally placed in its own folder in the start menu, but in Gnome, applications are sorted into categories. The problem is that such sorting is largely a fool’s game. Consider:

sound and video

Sure, I might think to look under Sound & Video when I want to burn an audio CD, but if I just want to burn data, it’s not going to occur to me to look there. Why put the disc burner there and not under Accessories? Well, in fact, Brasero is listed under Accessories too, but there it’s called CD/DVD Creator.

Why is Sound & Video one category and not two? Well that would leave us with two categories, each containing just one or two items, which would be silly.

These sorts of dilemmas tend to abound with categorization, leading us to settle for compromise solutions, such as:

  • OpenOffice.org Drawing is the only OpenOffice.org app listed under Graphics and not Office.
  • Evolution is listed under both Office and Internet.
  • We have all this miscellaneous stuff, and, hey, it’s gotta go somewhere, so we stuff it under Accessories.

Combine these faults with the fact that many users find it difficult to mouse through cascading menus, and the end result is that people don’t like using the start menu, so we make up for these deficiencies by piling on other conveniences:

  • Shortcuts on the desktop.
  • The QuickLaunch menu on the taskbar.
  • The system tray.2
  • The recently-opened programs list.3

The one addition I really like, though, is the text search/filtering added to Vista’s start menu. This allows for AwesomeBar-like behavior, e.g. I can type ‘fi’ and hit enter to launch Firefox. Also really nice is that I can type some term and see all relevant Control Panel items whether my term strictly matches those items or not: for example, I can type “reso” and get “Adjust screen resolution” even though there’s no Control Panel item of that name.

resolution

Simplify, simplify, simplify

So the question is, might we be better off giving users just one or two mechanisms for launching programs rather than half a dozen? I believe we would, but my solution requires accepting a few somewhat unconventional premises:

  • First, as I’ve already described, organizing things into categories is largely a fool’s game.
  • Second, when mouse-only mechanisms seem too inefficient, designers tend to introduce additional mouse-oriented mechanisms. These not only create redundancy but also often challenge users with poor mousing skills, and they almost always involve adding new screen elements. If we could somehow make keyboard interactions easier to discover and recall, we could stop trying to get overly clever with the mouse and could clean up some of our messes. I believe this is doable with real-time textual filtering and a few other tricks.
  • Third, if you’re going to present a list, don’t be afraid to let it take up a proper amount of screen space so users can actually read and scan the damn thing. Some designers think big lists scare users, so they scrunch lists into small boxes, requiring the user to scroll a lot and manually resize columns. This is silly: if something is too scary for users to deal with, don’t present it at all. You aren’t helping by making the information hard to view.4

So what’s the solution? Well let’s start by flattening Ubuntu’s Applications menu out into one big list, and while we’re at it, let’s throw in the shutdown and settings items:

ubuntu menu 2

Too long, right? Maybe, but it seems pretty decent to me. It’s only about twice the height of the typical screen resolution, and as long as the most frequently used stuff is in the top half, is it really going to kill the user to occasionally scroll down? Besides, most users who care about efficiency will pick up the habit of opening most applications by filtering:

filtered menu

Here, the user types ‘w’, and so the list only shows the items matching a term starting with ‘w’; the word processor is listed first because that’s the item which the user has most frequently selected in the past when they type ‘w’. Users can also filter on terms that describe a program but aren’t necessarily in its title:

filtered menu 2

Here, the user’s query ‘ga’ matches the tag ‘game’, so the user sees all items with that tag.5
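The filtering behavior described above can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: the app names and tags in `APPS` are made up, matching is a simple prefix test against each word of the name and each tag, and ranking uses a per-query count of past selections (`history`) with alphabetical order as the tie-breaker.

```python
from collections import Counter

# Hypothetical launcher entries: app name -> descriptive tags.
APPS = {
    "OpenOffice.org Word Processor": {"office", "writer"},
    "Wine Configuration": {"windows"},
    "Firefox Web Browser": {"internet"},
    "Mines": {"game"},
    "Sudoku": {"game"},
    "GIMP Image Editor": {"graphics"},
}

# (query, app) -> how many times the user picked that app after typing query.
history = Counter()


def terms(name):
    """Searchable terms for an app: the words of its name plus its tags."""
    return {word.lower() for word in name.split()} | APPS[name]


def filter_apps(query):
    """Apps with any term starting with the query, most-picked first."""
    q = query.lower()
    matches = [app for app in APPS if any(t.startswith(q) for t in terms(app))]
    # Most often chosen for this exact query first; ties sort alphabetically.
    return sorted(matches, key=lambda app: (-history[(q, app)], app))


def launch(query, app):
    """Record that the user typed query and then chose app."""
    history[(query.lower(), app)] += 1


# Simulate a user who usually means the word processor when typing 'w'.
for _ in range(3):
    launch("w", "OpenOffice.org Word Processor")
launch("w", "Wine Configuration")

print(filter_apps("w"))
print(filter_apps("ga"))
```

Keeping the selection counts per query rather than per app is the design choice that lets ‘w’ learn to mean the word processor without affecting what shows up first for other queries.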

So that’s basically it. With a big filtered list, I don’t see a need for shortcuts in a QuickLaunch menu, shortcuts on the desktop, shortcuts pinned to the start menu, shortcuts to recently opened programs in the start menu, or shortcuts pinned to a dock/taskbar.6

I should note that this isn’t terribly radical, and in fact, it isn’t all that different from the direction Gnome and KDE have been heading. The Gnome shell prototype, for instance, introduces textual filtering. What I find odd, though, is that both projects seem very attached to the idea of categorized menus. Here, for instance, is a recent KDE screenshot:

kde menu

In this design, the categories slide into view rather than pop out. Sadly, this makes navigation among the categories no less annoying, just annoying in a different way.

Application menus

If we can reduce program launching to just a big filtered list, could we do the same to the traditional menu bars in applications? Well, here’s what you get if you stuff everything from the menu bar of a moderately complicated program, Paint.NET, into one big list:

paint menu long

This is about the same length as our program menu, but for application controls, it doesn’t seem as acceptable. The fix is to pack things horizontally7:

paint menu wide

The question, then, is how to add textual filtering. We could simply have matching items show up in a one-dimensional list, as usual:

paint menu filtered simple

Here the user types ‘b’, and so items beginning with ‘b’ show up, with the most frequently used items showing up first. Alternatively, we could simply highlight all items that match the query:

paint menu filtered highlight

The solution I like best, though, is to combine these two such that we highlight the matching items but filter out sections without any matching items:

paint menu filtered combo
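As a rough sketch of this combined approach, the Python below keeps a two-level menu, drops any section with no matching item, and flags the matching items in the sections that remain. The menu contents are invented for the example (loosely modeled on an image editor), and the bracket rendering is just a text stand-in for visual highlighting.

```python
# Hypothetical menu sections; the item names are illustrative only.
MENU = {
    "File": ["New", "Open", "Save", "Print"],
    "Edit": ["Undo", "Redo", "Cut", "Copy", "Paste"],
    "Adjustments": ["Black and White", "Brightness / Contrast", "Curves"],
    "Effects": ["Blurs", "Sharpen", "Bulge"],
}


def matches(item, query):
    """True if any word of the item starts with the query."""
    q = query.lower()
    return any(word.startswith(q) for word in item.lower().split())


def combo_filter(menu, query):
    """Drop sections with no hits; in the rest, flag each item as hit or not."""
    visible = {}
    for section, items in menu.items():
        flagged = [(item, matches(item, query)) for item in items]
        if any(hit for _, hit in flagged):
            visible[section] = flagged
    return visible


def render(visible):
    """Text rendering: matching items are wrapped in [brackets]."""
    lines = []
    for section, flagged in visible.items():
        row = "  ".join(f"[{item}]" if hit else item for item, hit in flagged)
        lines.append(f"{section}: {row}")
    return "\n".join(lines)


view = combo_filter(MENU, "b")
print(render(view))
```

Because non-matching items in a surviving section stay where they are, the layout within each section never shifts, which preserves the spatial consistency that a purely filtered list would lose.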

It may have occurred to you that this idea bears some resemblance to the Microsoft Office 2007 “ribbon” interface: just take the individual ribbon tabs, array them vertically, and add a text field on top:

word menu

(For our purposes, ignore that this is an offensively complicated array of controls. Obviously you wouldn’t want to bombard a user with something like this.)

The thing I really like about the ribbon is that, unlike the traditional menu bar, the ribbon directly contains complex controls, so a lot of stuff which would otherwise get punted into annoying dialog boxes can be done directly in the ribbon (or at least in little pop-out overlays, which aren’t nearly as annoying as dialogs). This is something menus going forward should imitate.

On the other hand, the most annoying part of the ribbon is that it’s modal: the user often has to switch the currently viewed tab to get at a control. In contrast, with a pull-down menu, the user is always oriented at the same place (the top) every time it’s opened. I also believe that a big scroll is easier to scan and better facilitates the user’s spatial memory: more is visible at once, your eyes can track as you scroll, and everything is in clear spatial relation to everything else.

A pull-down menu obviously has a disadvantage, though. In the ribbon, related functionality tends to live together on the same tab, and the last-used tab stays visible; consequently, a lot of tab switching is avoided that otherwise would be required. In a pull-down menu, while it’s nice that the menu is hidden when not needed, quickly repeated actions annoyingly require opening the menu (and potentially scrolling) for each action. I’ll discuss a solution to this, one that doesn’t resort to toolbars, in a later installment.

Command filtering

Ubiquity is a Firefox add-on which adds a command line. Unlike a traditional command line, Ubiquity effectively guesses what the user is trying to say rather than requiring the user to precisely recall the full names of commands and their precise syntax, and it does this basically by treating the user-entered text as a query to filter the set of commands. In the next installment, I’ll describe how something very much like Ubiquity would work at the desktop level rather than just confined to the browser.8

One text field to rule them all

So it looks like we’re going to have a bunch of text fields in our desktop for doing different things:

  • entering urls and searching bookmarks and history
  • searching the web
  • searching our filesystems
  • launching programs
  • searching application menus
  • executing commands

Ideally, we could combine these all into just one universal text field such that I can just reflexively hit a keyboard shortcut, start typing, and then decide what kind of action to perform—whether a web search, a command, or whatever. I’ll discuss how this is managed in the next installment, which will primarily cover window management.

Continued in part 3 (coming soon).

  1. And notably, Slate has moved in recent years towards a more blog-like front page.
  2. The system tray, of course, is supposed to be for status indicators, but many programs end up abusing it.
  3. Found in the start menu since Windows XP.
  4. Be clear that there’s a distinction between hiding controls and hiding information: a bunch of controls, obviously, can intimidate and overwhelm a user, so it makes sense to be careful about how many controls the user sees at once.
  5. The user needs some sort of indication of how an item matches a query, so here, perhaps the tag ‘game’ should appear highlighted next to each item.
  6. A lot of people like how the OS X dock keeps an application’s icon always in the same place, allowing for reflexive program switching. As I’ll describe in the next installment, my design retains this affordance in a different way.
  7. In a list where the set of items changes, arraying things in two dimensions is generally bad because it means things tend to shift around in a confusing way; when the set is fixed, things aren’t going to move around.
  8. This isn’t original, of course: Ubiquity actually derives from Enso and Quicksilver, which are basically command lines for the desktop.

Reinventing the desktop (part 2): I heard you like lists…

2 Aug

In part 1, I made a negative case against the desktop interface as it currently exists, but I promised to make a positive case for my solutions. Because it would take at least a few weeks to put together a complete presentation, I thought it more timely if I instead present the ideas in installments (and hey, more reddit karma whoring this way). Most of the pushback to part 1 (both constructive and vitriolic) concerned my ideas about lists, so I decided to start there.

Rather than writing this up, I thought a screencast would be more appropriate for presenting these visual ideas. The run time is 37 minutes. (Yes, I know that’s long, and it starts slow, but if I’m not thorough, I just leave myself open to superficial dismissals.)

(I apologize for some parts where my narration gets a bit difficult to follow: I heavily edit the audio and end up removing awkward gaps and pasting together sentences from different takes; this works surprisingly well most of the time, but sometimes the result sounds a bit like Max Headroom.)

Learn to program in 20+ easy steps

22 Jul

As it stands, my comprehensive Introduction to Programming series is organized into parts as follows:

1) A first language

To program, you must learn a programming language, so we start by introducing a language called Pigeon. Pigeon is a language created expressly for students in that it features as simple a grammar as I believe possible while still reflecting the concepts found in “real” languages. Learning Pigeon first should make later tackling your first real language much easier.

2) Numbers

Computers are deeply mysterious until you understand how information is represented as bits. We start with numbers because their bit representation is used as a basis for representing other kinds of data.

3) Text

Text is represented using what are called character sets. The most commonly used character sets are ASCII and Unicode.

4) Images

In this part, we briefly discuss some elementary concepts of computer graphics.

5) Hardware and operating systems

Here we cover essential concepts in computer hardware and operating systems.

6) Languages and tools

There is a wide array of programming languages in existence. We’ll survey the most popular ones and discuss their major differences. We’ll also discuss the associated tools, e.g. debuggers.

7) The Javascript language

The popular language Javascript (not to be confused with Java, another popular language) is very close semantically to Pigeon, so it’s a natural choice for our first real language.

8) The Internet and the web

Here we’ll discuss the basic structure of the Internet and the protocols on which it runs.

10) Structured data formats

Structured data is data in which individual pieces of data (numbers, pieces of text, etc.) are related together in an organized way. A person, for instance, can be represented as structured data: a name (text), an age (number), an address (text), etc. We have standard formats for such data such as XML (Extensible Markup Language) and JSON (Javascript Object Notation), among others.
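As a small taste of what this looks like in practice, here is a sketch in Python (used here only for illustration) that builds the person record described above and round-trips it through JSON using the standard `json` module. The person’s details are made up.

```python
import json

# A made-up person record: each field is a simple piece of data,
# and the structure relates the pieces to one another.
person = {
    "name": "Jane Doe",
    "age": 34,
    "address": "123 Main Street",
}

encoded = json.dumps(person, indent=2)  # Python data -> JSON text
decoded = json.loads(encoded)           # JSON text -> Python data

print(encoded)
```

Note that the number survives the round trip as a number and the text as text: the format records not just the values but what kind of thing each value is.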

11) HTML and CSS

Webpages are documents comprised primarily of HTML (Hypertext Markup Language) and CSS (Cascading Style Sheets). We’ll also cover the role of Javascript in webpages.

12) The Unix command line shell

Before graphical user interfaces, computer users interacted with the computer using a command line shell—basically, an interactive programming language in which each command the user types is immediately executed. Today, shells are still powerful tools used by programmers and system administrators. In this unit, we’ll focus on the shell language used most commonly in Unix systems, BASH (the Bourne Again Shell). We’ll also discuss the command-line programs commonly available on Unix systems and discuss how these programs can be tied together through the shell.

13) Assembly language

Assembly language represents the lowest level of programming. Here we’ll cover assembly language for x86 processors using the NASM assembler.

14) The C language

The C programming language is one of the oldest and most influential languages still in use today. Unlike most other languages (including Pigeon and Javascript), C gives programmers a fine degree of control over the hardware, making it suitable for writing systems software (such as operating systems, like the Linux kernel) and for programs requiring high performance, such as the latest computer games.

15) Data structures and algorithms

There are only so many fundamental ways of organizing data. We’ll discuss these data structures and the algorithms associated with them.

16) Object-oriented programming

Object-oriented programming is a style of programming in which the programmer, rather than focusing on action, focuses on establishing types of data and the actions associated with those types. This style is strongly encouraged in a number of so-called “object-oriented languages”, including Java.

17) The Java language

Since the late 1990s, Java has been the most commonly used programming language. Java’s success has spawned an imitator from Microsoft called C# (“C sharp”), which differs in many details but is fundamentally similar. We cover Java instead of C# mainly because Java is somewhat simpler and still more popular.

18) Encryption, security, and compression

We’ll discuss the basics of encryption, security, and compression, which aren’t nearly as arcane as you might imagine.

19) Graphical interfaces

To write a program with a GUI (graphical user interface), a programmer uses a library called a GUI toolkit. We’ll focus on one such toolkit for Java called Swing.

20) Version control

When we write code, it’s very nice to be able to keep track of all of our changes such that we can always go back to an earlier version when we mess something up. It’s also really important to coordinate our changes with others working on the same code. For these reasons, programmers use programs called version control systems to manage their code. We’ll focus on two popular such programs, Subversion (abbreviated as “svn”) and Git.

21) Databases

A database is a specialized program for storing large amounts of data in a way that can be searched and retrieved efficiently. Databases are used everywhere: for instance, a popular website like Amazon.com uses databases to store product and customer information. The most commonly used databases are relational databases, meaning they structure data in the style of the relational model. The programs we write typically communicate with a relational database using a query language called SQL (pronounced “sequel”, Structured Query Language).
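To give a flavor of SQL, here is a minimal sketch using Python’s built-in `sqlite3` module and a throwaway in-memory database. The `products` table and its contents are invented for the example; a real store’s schema would of course be far larger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary database that lives in RAM

# Define a table, then store a few made-up products in it.
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("kettle", 24.99), ("toaster", 39.5), ("spoon", 1.25)],
)

# Ask a question in SQL: which products cost less than 30, cheapest first?
rows = conn.execute(
    "SELECT name FROM products WHERE price < ? ORDER BY price", (30,)
).fetchall()
conn.close()

print(rows)
```

The program never says how to search the data; it only describes what it wants, and the database works out how to find it.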

22) Regular expressions

Regular expressions (sometimes abbreviated as “regex”) are a sophisticated tool for finding patterns of characters in text. For instance, using a regular expression, I could easily remove from a text all instances of the word “curry” following the word “lemon”.
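Here is how that particular example might look using Python’s `re` module. This sketch assumes that “following” means “curry” comes immediately after “lemon”, separated only by whitespace, and that both must be whole words; the function name and sample sentence are made up.

```python
import re

# Match the word "lemon", some whitespace, then the word "curry".
# The parentheses capture "lemon" so the replacement can keep it.
LEMON_CURRY = re.compile(r"\b(lemon)\s+curry\b", re.IGNORECASE)


def drop_curry_after_lemon(text):
    """Remove each 'curry' that directly follows 'lemon', keeping 'lemon'."""
    return LEMON_CURRY.sub(r"\1", text)


sample = "One lemon curry, one chicken curry, and a lemon."
cleaned = drop_curry_after_lemon(sample)
print(cleaned)
```

Only the “curry” after “lemon” disappears; the one after “chicken” is left alone.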

23) The Clojure language

Many regard Lisp as the most elegant of all languages, and Clojure is a particularly elegant recent variant of Lisp. Clojure gives us an opportunity to introduce functional programming, a style of programming in which we avoid “state change” as much as possible.

24) Automating the build process

The whole process of translating source code and data files into a working program is called the build process. In most software projects, we end up building the project many times over as we develop the code, fix bugs, and change features, so it makes sense that we automate this whole process as much as possible. In this part, we’ll discuss popular build tools, such as the Unix make program.

Reinventing the desktop (for real this time) – Part 1

20 Jul

Being of a presumptuous nature, I tend to get big ideas, and among those big ideas are notions of how to “reinvent the desktop”, notions which I call collectively Portals (a play on Windows).

Ain’t broke?

Before I explain Portals in detail, we should establish whether anything is really wrong at all with the modern desktop or if desktop “reinvention” is just a chimera of UI-novelty seekers. This is only prudent because, if we can’t clearly identify deficiencies of the status quo, we may fall into the trap of replacing the status quo with something not truly better, just arbitrarily different.

So let’s first consider what functionality makes up a GUI desktop. A desktop consists of:

  • An interface for starting applications, for switching between open applications, and for allotting screen space between open applications.
  • A common set of interface elements for applications, often including guidelines for the use thereof to achieve a cross-application standard look-and-feel.
  • A data-sharing mechanism between apps (copy and paste).
  • A common mechanism for application associations—what applications should be used to open such-and-such file or send a new email, etc.
  • A set of system-wide keys, e.g. ctrl+alt+delete on Windows.

And because most users don’t/can’t/won’t use a command-line, desktops include a minimum set of apps:

  • File management.
  • System configuration and utilities.
  • Program installation/removal.

Since the 1980s, this functionality has been presented to users on most systems with only minor variations upon the standard WIMP (Windows, Icons, Menus, Pointer) model handed down from Xerox PARC and the first Mac, so, obviously, the modern desktop is not really broken: people have been getting by with essentially the same design for decades now. Still, there is a perennial longing for something better, so the question is: what motivates this feeling?

What’s wrong?

Scaling issues

A fundamental difference between the computing experience of 1984 and the computing experience of twenty-five years later is that users simply do a lot more with their computers: more diversity of tasks, more tasks at once, and a lot more data, both on the user’s local machine(s) and out there on the network. In particular, the window management and file management that made sense for 1984’s attention load just don’t hold up in an age of web distractions and half-terabyte hard drives.

Lack of sensory stimulation and tactile interaction

Only librarians want to live in a grey, motionless, silent world of text, but for a long time, that’s what the computing experience was. Then came icons and windows, and they could move! Quickly this novelty wore off, so today our menus slide, our workspaces spin in three dimensions, and our windows cross the event horizon every time we minimize them. And our iPhones fart.

Moreover, we increasingly expect interfaces to entertain our hands. Touch screens! Multi-touch! Surface top! Gestures! I’ll admit that these developments are exciting, but they’re exciting mainly because we don’t really know what will come of them: our hopes at this point remain very vague. As clearly as we can define it, our hope is that computer interaction can be made satisfying in the same way that a good hit on a tennis ball is satisfying or in the same way that closing a well-made car door is satisfying.

Sadly, these ideas may turn out to be like virtual reality: worlds of possibilities, none of the possibilities very useful. So we may be in just another cycle of the permutations of fashion. Still, aesthetics and feel really do matter to an extent, for a good layout of information and good use of typography tends to be aesthetically pleasing, and good tactile feel, such as proper mouse sensitivity, definitely facilitates usability.

We should acknowledge, though, that computing is no longer a dull, grey world, mostly thanks to the web, not changes in the desktop. This suggests, then, that the best way forward for an aesthetically pleasing and stimulating desktop is to minimize the interface: the less screen real estate occupied by the interface’s “administrative debris”, the less there is that we need to make look good and therefore the less opportunity that we have to fail.

Administrative debris

Edward Tufte coined administrative debris to denote all of the elements of a UI not directly conveying the information the user really cares about. For instance, the menus and toolbars of most apps are almost entirely administrative debris. Such debris is problematic because:

  • Debris takes up precious screen real estate, which would be better used to present information.
  • Debris distracts the user.
  • Debris requires the user to learn its layout and how to navigate in and around it.
  • Debris is aesthetically displeasing and intimidating because it suggests complexity, both in terms of information clutter and conceptual difficulties.
  • Debris often has to be managed by the user, thereby creating more “meta work”.

Meta work

Meta work is any work which the interface imposes on the user in addition to the user’s actual work. Meta work is terribly displeasing, the mental equivalent of janitorial work.

Some meta work is hard to imagine getting rid of, such as scrolling through a list of information, for if we really intend to present more information than fits on screen, the user must scroll or page through it somehow. Most interface meta work, however, comes from two sources:

  • Positioning things and navigating. In particular, moving and resizing windows and navigating through menus and dialogs. This also includes any kind of collapsible or adjustable information display. I find file browsers, for instance, to require constant adjustment because the directory tree view and the columns of the grid view are half the time either too wide or too narrow.
  • Debris. When the debris can’t all fit on screen at once, we require mechanisms for the user to manage the debris. The Office 12 ribbon, for instance, requires the user to manage which strip of controls he is viewing at any moment.

Most disconcerting, meta work perniciously tends to beget more meta work because the mechanisms introduced to manage information and controls often themselves take up space and require management.

Indirectness

Interactions with information through debris are indirect, so Tufte’s general prescription for minimizing administrative debris and meta work is to make interactions with information direct. For instance, rather than editing properties in a dialog, users should directly edit those values in some screen element directly attached to the affected object or, ideally, directly edit the object itself.

Direct interactions also have the virtue of generally being more obvious to perform than indirect interactions. On the other hand, most users aren’t familiar with direct interactions as a convention, so it may not occur to users to try them.

Hierarchies

Because we must hide a lot of things for the sake of limited screen space, a lot of information and administrative debris gets buried into hierarchical trees, meaning users end up spending a lot of time and mental energy navigating (which is really just another kind of meta work). For instance, to change my mouse settings in Windows, I follow the chain Start->Control Panel->Mouse. Or, say, to open a file, I must recall its drive, its directory path, and then finally its name. This hierarchical recall—and the ensuing navigation action—is mentally taxing and error prone.

The usual justification for using a tree is to avoid stuffing everything into one big flat list, but this is generally a misguided tradeoff. Consider a typical hierarchical menu, first in the usual pull-down/pop-out configuration, second in one big scrolling list with divisors between sections. Which is easier to learn? Which is easier to explore? Which is easier for recall? I believe you’ll find the flat list is better on all measures but perhaps one: a long list may be a bit intimidating on first glance compared to a hierarchy that hides the items in submenus by category.

(Actually, the flat list may be better even on this count because a menu which hides complexity is daunting in its own way: the user browsing such a menu quickly finds lots of complexity which they’ll have to recall how to find again later. Besides, the “first contact” shock of a long list can be mitigated with visual design that appropriately emphasizes the right elements. So flat lists arguably win on all counts.)

Now consider file hierarchies. Rather than having to remember that your Twin Peaks / Doctor Who crossover fan fiction is stored as e:/fanfic/twinpeaks_doctorwho.txt, it would be far better if you could just textually filter down by a query for twin peaks who or any other query terms that occur to you by free association. In fact, it would be nice when creating the file if you didn’t have to decide between twinpeaks_doctorwho.txt and doctorwho_twinpeaks.txt and didn’t have to decide whether to place this file in fanfic or some other directory. The lesson here is that:

  1. Hierarchical recall is mentally taxing and error prone. What we really want is free-associative recall.
  2. Hierarchical naming and placement are mentally taxing and error prone. What we really want are tagging and full-text search.
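To make the free-associative idea concrete, here is a minimal sketch in Python of filtering a flat file list by query terms. The rule I'm assuming is the simplest one I can think of: a file matches when every term in the query appears somewhere in its path, in any order. Apart from the fan fiction path, the file names are invented, and a real implementation would presumably search file contents and tags as well.

```python
def matches(query, path):
    """True when every whitespace-separated query term occurs somewhere in the path."""
    haystack = path.lower()
    return all(term in haystack for term in query.lower().split())

def filter_files(query, paths):
    """Keep only the paths that match every term of the query."""
    return [p for p in paths if matches(query, p)]

# Hypothetical file list; only the first path comes from the example above.
files = [
    "e:/fanfic/twinpeaks_doctorwho.txt",
    "e:/fanfic/doctorwho_startrek.txt",
    "e:/taxes/2009_return.pdf",
]

print(filter_files("twin peaks who", files))  # ['e:/fanfic/twinpeaks_doctorwho.txt']
print(filter_files("who peaks", files))       # ['e:/fanfic/twinpeaks_doctorwho.txt']
```

Because term order doesn't matter, it no longer matters whether the file was named twinpeaks_doctorwho.txt or doctorwho_twinpeaks.txt, or which directory it landed in.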

(See Clay Shirky on hierarchy.)

Frustrating discovery and recall

Perhaps the biggest frustration in using software is knowing what you want the software to do and knowing that your software can do it but not being able to figure out how to get the software to do it. These frustrations typically stem from an inability to guess what the developers decided to name a feature and where the developers decided to place the feature in a hierarchical menu or dialog chain. For instance, the user looking for a program’s options dialog has to guess whether to look for File->Preferences, Edit->Preferences, Edit->Options, Help->Options, Tools->Options, or some other path.

The general solution here is, again, a big, flat list filtered by textual query. Like disambiguation pages and redirection in Wikipedia, a single item should be associated with any synonyms so that users need not recall the single precise name favored by the developers, e.g. preferences should show up in a query for options and settings.
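A rough sketch of the synonym idea in Python: each item in the flat list carries a set of alternative names, and a query matches an item if it appears in the item's own name or in any of its synonyms. The command names and synonym sets below are hypothetical, chosen only to mirror the preferences/options/settings example.

```python
# Hypothetical command list: canonical name -> synonyms users might type instead.
COMMANDS = {
    "Preferences": {"options", "settings", "configuration"},
    "Print": {"hard copy", "printer"},
    "Find": {"search", "locate"},
}

def search_commands(query):
    """Return the canonical names whose name or synonyms contain the query."""
    q = query.strip().lower()
    hits = []
    for name, synonyms in COMMANDS.items():
        labels = {name.lower()} | synonyms
        if any(q in label for label in labels):
            hits.append(name)
    return hits

print(search_commands("options"))   # ['Preferences']
print(search_commands("settings"))  # ['Preferences']
```

The user never needs to learn that the developers settled on “Preferences”; whichever word comes to mind leads to the same item.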

Redundancy

Thinking up features is easy, but thinking up features that obviate other features is hard. Moreover, once a feature is added to a program, it takes a lot of political will to remove it. Consequently, many interfaces are laden with redundancy.

A degree of redundancy often serves a legitimate purpose, for many tasks should be equally doable by either keyboard or mouse, and common tasks often warrant shortcuts that make up in convenience what they lack in discoverability. In many cases, though, designers have simply let redundancy proliferate unchecked. A typical Windows application, for example, presents the user with at least four ways of closing the application using the mouse:

  • Via the X in the top right.
  • Via the right-click menu of the window on the taskbar.
  • Via the icon menu in the top left.
  • Via the menubar.

Additionally, users can close an application using the keyboard:

  • Via alt+F4.
  • Via ctrl+w.
  • Via accelerator keys for the icon menu.
  • Via accelerator keys for the menubar.

That makes at least eight ways to close an application. This particular case of redundancy is maybe not so bad because most users have a favored method which they use by reflex, but the redundancy still clutters the interface, not just in screen space but in documentation space and mental space.

At its worst, redundancy isn’t just clutter, it’s more meta work heaped upon the user. Not only are such choices more management work, the bother of having to make these choices often lingers on the user’s mind. As Barry Schwartz discusses in The Paradox of Choice, choices are often a hidden source of unhappiness: when presented with a choice, people fret because they want to believe that the choice has a correct answer, even when none exists and even when the disparity of outcomes is inconsequential.

Most choices in interfaces impose very small burdens individually, but together they add up, and too often, designers underestimate this burden of choice. When users are making little choices optimizing for the best way to do something, it’s quite likely that the interface should be making these choices for them.

Thwarted reflexes

The opposite of making a choice is to act upon reflex. Enabling good reflexes and consistently rewarding them gives users a very satisfying feeling of control.

Ideally, a good reflex action should be context-free, meaning it shouldn’t require a particular desktop or application state. For instance, alt+tab is a desktop-level reflex that is supposed to work in all contexts such that, at any time, the user can hit alt+tab to get back to the window that last had their focus. Unfortunately, this reflex doesn’t work in some contexts, such as in some fullscreen games that either don’t respond to this command or only do so very slowly. Another aggravating example is Flash in the browser, which often steals keyboard focus and thus blocks the alt+d, ctrl+k, and ctrl+t commands.

Some reflexes, though, users pick up like bad habits. In Windows, I’m in the reflexive habit of hitting windows+e every time I wish to browse to a folder even if I already have that folder open as a window, thereby creating more meta work for myself in the form of another folder to close. A better designed reflex action would get me to my desired folder while somehow avoiding this duplication, for well-designed reflex actions don’t lead users down the wrong path.

Virtuality

Because hierarchies suck, designers frequently provide shortcut paths to various nodes in hierarchies. For instance, file dialogs in Windows Vista provide shortcut buttons to standard directories like Documents and Pictures. Or, for example, the display settings in Windows can be accessed via right-clicking the desktop rather than going into the Control Panel, but both paths take you to the same dialog.

The problem is that this virtuality not only introduces redundancy, it presents an inconsistent and disorienting picture to users and burdens them with more arbitrary crap to remember. Virtuality makes hierarchies more confusing, not less, because the same “shape” is presented in many different alternate forms, obscuring the “true” shape and thereby hindering discovery and spatial recall. Furthermore, when the user can’t picture at least the outline shape of the possibilities open to her, she feels surrounded by hidden pitfalls and paralyzed by choice.

Textual search is technically a virtual kind of access, but it doesn’t share these problems. If I access my Doctor Who / Twin Peaks crossover fan fiction by searching for who peaks, this isn’t another bit of arbitrariness for me to have to recall later, it’s just the set of terms that occurred to me at the moment by free association.

Burdened and stolen focus and attention

There’s a word for a person who repeatedly calls your name and taps you on the shoulder: annoying. We also have a word for someone who tries to hand you something when your hands are full already: asshole. So it’s not surprising that the most commonly cited interface annoyances are those obnoxious little pop-up windows that demand your attention and steal your keyboard focus.

Obviously, having your attention actively stolen is bad. Less obviously, meta work in all forms steals attention, but usually passively and in small chunks: after all, attention focused on meta work is attention taken away from actual work.

If there’s something many people feel increasingly short on in the networked world, it’s attention. A well-designed interface enables the user to focus on their own actual work, switching between tasks with little friction.

All your conventions suck

Now let’s get into some concrete criticisms of actual mechanisms commonly used today:

Icons suck

To a large extent, icons exist just as an excuse for designers to introduce eye candy, but the usual justification designers give for using icons is the truism that ‘simply having users point at the very thing they want is the simplest and most intuitive kind of selection.’ This is misguided:

  • Pictographs do not scale as well as text because you can’t alphabetize or do searches on images.
  • As you add more and more icons, the visual distinctiveness of each icon quickly gets murky and ambiguous.
  • Icons are generally not “the very thing” that users are looking for. A pictograph typically provides hints about the thing it represents but is not synonymous with the thing itself.
  • Worst of all, interpreting pictographs is more mentally taxing than reading a word or two, especially when the semantic content is even mildly abstract.

The crux here is that it is far easier for people to recall the general qualities of a picture—its dominant colors and overall shape—than it is to recall its precise details. Also, compared to abstract images, images of recognizable objects are much easier to recall details of because we can mentally fill in the blank spots with our assumptions of what such objects look like. For instance, if shown a picture of a car, a viewer immediately discerns the notion of a car, not because the viewer quickly absorbs all the visual detail but because she immediately registers a few key details and then her mind fills in the missing pieces. This explains why most icons in software are so bad: most icons found in software are small, indiscernible messes, so users fail to recognize what the icons depict and learn to think of them as abstract shapes.

Now suppose I know what I want my software to do but don’t remember at all how the interface designers decided to label that function with text or an icon. If I’m looking for a label, I have to figure out what words the designers chose to describe it, which often requires consulting my mental thesaurus. In contrast, if I’m looking for an icon, I have to figure out what words the designers chose to describe the feature and then figure out how the designers chose to represent those words as an image. While the synonyms for a particular concept can be frustratingly many and elusive, the possible visual representations of a concept are innumerable: even if you narrow down the concrete object(s) being depicted, there are still the variables of perspective, composition, style, and color.* Moreover, users can always fall back on actually reading a list of words till they find a likely match; this is reasonably doable, in contrast to “reading” a list of icons, which is painful and slow.

* (Sure many real-life objects only come in one color, but many don’t. In fact, looking over the icons in a few applications, I notice that a strong majority have basically random color assignments, either because of the nature of what they depict or because of the need to make them stand in contrast to their neighbors.)

To the extent you do use icons, follow these guidelines:

  1. All but the most frequently encountered icons should be labeled by text. Many applications omit text labels because small, unlabeled icons allow for buttons that minimize space use (see Photoshop). This is a poor trade-off, first because of the image-recall problems outlined above, but also because even the best-designed icons rarely communicate their function as clearly as a word or two of text. In fact, the real virtue of icons is that their shape and color make them noticeable to peripheral vision or visual scanning, so they help users find points of focus and do an initial culling of their possible options. After that initial culling stage, however, users have only narrowed their options and so prefer the relative precision of words to help them make their final selection.
  2. Icons should be simple in shape, distinct in silhouette, have contrasting interior lines, and almost never use more than two dominant colors.
  3. Icons should be as big as necessary to make them conform to rule 2.
  4. The number of icons that it is acceptable to use is proportional to how large and distinct they are, per rules 2 and 3. The array of icons found in today’s typical complex apps, like word processors and Photoshop, is too many by a factor of about three.

Icon view sucks

Compared to the detailed-list view of files, the icon view is a paragon of form over function. Not only should icon view not be the default folder view, icon view should not exist. It’s flat out stupid. Not only is the browse-ability of a list in one dimension far superior to a list in two dimensions, a two-dimensional listing must be rearranged when the view width changes, meaning icons end up changing their horizontal positions, thereby disorienting the user and thwarting his spatial recall.

(A thumbnail view of pictures is a special exception to this rule.)

Thumbnail previews suck

Continuing with the theme of pictures being a false cure-all, thumbnail previews of windows and tabs rarely justify their use:

  • First, most such previews are triggered by a delayed reaction to a mouse hover, which tends to mean they pop up too soon half the time and too late the other half.
  • Second, even with great anti-aliasing, a two or three square inch representation of a full window or tab is often just too small to make out clearly.
  • Third, most documents and tabs consist mainly of text and so very often look pretty much the same, especially when shrunk down to a small preview.
  • Fourth, the user may expect to see one portion in the scroll of a document and so not quickly recognize the document if another portion is shown in the preview.

For previews to be worth the mental burden, they need to be instant and large, perhaps even full-sized.

Animations suck

Currently, much work is going into GUI toolkits to make it easy to add UI animations, such as having elements that slide around. The inevitable problem with animations, though, is that they introduce action delays and so must be kept very short, and yet the shorter the animation, the more the animation defeats its original intent, which is to convey to users where elements go to and come from. (See Philip Haine’s critique of Apple FrontRow.)

Settings management sucks

Desktop settings management exhibits virtuality gone mad. On the one hand, Windows has Control Panel and Gnome has a Settings menu (central places to do configuration), but centrality is deemed too inconvenient for some cases, so we sprinkle special access mechanisms ad hoc throughout the desktop. In Windows 7, for instance, the start menu includes both Control Panel and Devices and Printers even though Devices and Printers is just an item in the Control Panel. Or, for instance, the Network and Sharing Center is an item in the Control Panel, but it’s also accessible via Network in the left panel of the file browser. Worse, some settings are not found in the Control Panel at all, e.g. folder options are in Tools->Folder Options of the file browser but not in the Control Panel. Most ridiculous and aggravating, though, is how these ad hocisms change with each release such that the user’s hard-learned arbitrary nonsense becomes useless. In the end, the path to every setting becomes an ad hoc incantation, a little piece of version-specific arcana to document in user manuals with a dozen screenshots.

The Desktop itself sucks

Interface design is largely about rationing precious screen real estate, and…

…hey, everyone! Here’s this big blank surface going unused! Let’s give it a random assortment of redundant functionality to make up for the inadequacy of our main controls! Sure, the start menu already has a frequently-used program list, but it’s too orderly. And users already have a home directory, but they can’t see its contents at the random moments that their un-maximized windows are positioned just so. Users love messes! Hmm, now we just need umpteen different special mechanisms for hiding all these windows that obscure this precious space.

*Ahem*…yeah. Put another way:

  • The desktop creates clutter by encouraging people to use it as a dumping ground for files.
  • The desktop contains ‘My Computer’ but itself is contained by ‘My Computer’. Well done, Microsoft, for helping make the concept of files and directories clear, and so much for the metaphor of files as physical objects (which isn’t a good metaphor to begin with, but if you’re trying to go with a metaphor, stick with it).
  • The desktop as a working surface necessitates mechanisms to get at it easily from behind all of these damn windows.
  • The desktop compensates for inadequacies of the start menu and file browser by duplicating some of their functionality, so users are presented with the silly choice of whether to put an application shortcut or file on their desktop and/or in the start-menu/dock, and then later they have to remember where they put it and possibly make an arbitrary choice of which to use.

Menu bars suck

The drop-down, pop-out style of menus found in application menu bars is optimized for minimal obtrusiveness (both in terms of visible space and visibility time) and for minimal mousing (both in terms of motion and clicking). Unfortunately, these optimizations are ultimately inadequate:

  • First, as most applications have conceded, users simply don’t like using the menu bar for frequent accesses, so applications add redundant shortcuts, such as toolbars, for frequently used items.
  • Second, many users find mousing through these menus frustrating despite refined mousing affordances.
  • Third, these standard menus have an artificially limited vocabulary—both visual and functional (e.g. sliders and textfields can’t be menu items*)—so all but the simplest features get shunted into pop-up dialogs.

* (Clicking an item is supposed to dismiss the menu overlay every time, which wouldn’t work for textfields or sliders as items.)

Worst of all, menu bars are not only hierarchical, they present their hierarchy confusingly: their various menus and submenus overlap and flash in and out as the user mouses, and because floating dialogs are untethered from the items which open them, users quickly forget how to get back to dialogs.

Context menus suck

Pop-up context menus suffer most of the same ills as menu bars, and they introduce redundancy. In Firefox, for example, the context menu of the page includes back, forward, reload, stop, and several other items also found in the menu bar.

On the plus side, a context menu doesn’t suffer from the same hierarchical recall problems as menu bars (unless the context menu includes many submenus). However, each context menu effectively presents a virtual view into the menu bar: the menu bar is where all my controls live, but right-clicking different things shows me different mixes of those controls, and sometimes it even shows me things not in the menu bar. This virtuality is bad for all the reasons discussed above.

Dialogs suck

Developers love dialogs because dialogs allow developers to avoid hard decisions of positioning and sizing. Don’t know where to place a feature? When in doubt, stuff it into a dialog.

Yet most users hate dialogs:

  • First, navigating to dialogs is often a frustrating discovery, recall, and mousing process.
  • Second, dialogs not only steal focus, they often block interactions with their parent windows.
  • Third, dialogs have a tendency to get lost behind other windows because they’re generally small and don’t show up in the taskbar list.
  • Fourth, it’s often unclear how users should close a dialog. For instance, clicking X in the top-right is sometimes effectively the same as clicking cancel but sometimes effectively the same as clicking OK.

If there’s anything worse than a dialog, it’s a dialog spawned from another dialog. Thankfully, most of today’s applications have learned to avoid that particular sin.

Toolbars suck

Application developers resort to redundantly placing menu bar items in toolbars mainly because menu bars suck. The redundancy this introduces is aggravating enough, but on top of this, toolbars usually consist mainly of icons (which, recall, also suck), and just like menu bars, most toolbars artificially restrict themselves to simple buttons and thereby end up punting complexity into dialogs. Triple suck score.

In simple applications, like web browsers, the redundancy is not so bad, but as applications get more complex, the number of convenience icons tends to grow (think Word or Photoshop) until the redundancy becomes a nuisance to newbie and experienced users alike: newbies find the preponderance of overlapping choices confusing and distracting; experienced users find repeatedly making the arbitrary choice of whether to look in the menu bar or toolbars bothersome and distracting.

The taskbar sucks

Like the web browser tab bar, the taskbar suffers from an intractable dilemma: in the horizontal configuration, it scales poorly past 7-9 items; in the vertical configuration, more items fit naturally, but each item has less space for its title unless you’re willing to make the bar a few hundred pixels wide. Widescreen monitors alleviate the space problem in both configurations, but not sufficiently to dissolve the problem.

The start menu sucks

Since Windows 95, the start menu has been arranged in a hierarchy of aggravating pull-out menus, with each program typically getting its own folder. Vista has sensibly moved towards textual query over a flat list, but the flat list is only flat-ish because folders remain. Not only do the folders mean that most items in the list have unhelpfully identical folder icons, virtually all folders have no reason for being: I don’t need a folder that contains X and Uninstall X, for if I want to uninstall X, I’ll use Programs and Features in the Control Panel like I’m supposed to; if a folder contains items other than the program itself, they can simply be their own standalone items or can simply be moved into the application menu or application splash dialog (World of Warcraft does this).

So if I had control of the Windows 7 start menu, I would simply:

  • Put every item in one big scroll such that you get rid of All Programs.
  • Get rid of folders.
  • Add section dividers.
  • Make the whole menu taller, if not the whole height of the screen, and make the program list section wider so that long names are more presentable.
  • Put the items in the right-side of the menu into the left or simply get rid of them, e.g. Shut Down and Control Panel get put in the program list. (If users really need to access these features so quickly—which I don’t think is the case—just add shortcut keys.)

You might object that getting rid of categorical hierarchy means programs can’t be browsed by type, but this is not really the case. First, programs should be arranged into appropriate sections with titles. Second, when menu items are textually filtered, they can be filtered on tags as well as names, e.g. filtering on game should show any game program whether or not it’s in the section games or has game in its title.
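Here is a minimal sketch of the tag idea in Python. The Program record, the tag sets, and the section assignments are all hypothetical; the point is only that a query is checked against tags as well as names, so a program's section and title stop mattering.

```python
from dataclasses import dataclass, field

@dataclass
class Program:
    name: str
    section: str
    tags: set = field(default_factory=set)

# Hypothetical start menu contents.
PROGRAMS = [
    Program("World of Warcraft", "Games", {"game", "mmo"}),
    Program("Minesweeper", "Accessories", {"game", "puzzle"}),
    Program("Notepad", "Accessories", {"text", "editor"}),
]

def filter_programs(query, programs=PROGRAMS):
    """Return names of programs whose name or any tag contains the query."""
    q = query.strip().lower()
    return [
        p.name
        for p in programs
        if q in p.name.lower() or any(q in tag for tag in p.tags)
    ]

print(filter_programs("game"))  # ['World of Warcraft', 'Minesweeper']
```

Minesweeper turns up for game even though it is neither in the Games section nor has game in its title, which is the behavior described above.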

Application windows suck

The primary reason to put applications in free-floating windows is so that users will be able to put applications side-by-side, even though doing so is, in truth, at best a niche use case. The problem is that positioning and sizing windows takes a lot of bothersome meta work, especially when maximizing a window’s space usage.

Furthermore, window overlap requires the user to make annoying random choices of how to get at a particular window. Shall the user move or minimize other windows to get at the window underneath? Or should the user alt+tab directly to the window? Or use the taskbar/dock?

In the end, windows burden users with meta work and unnecessary choices for virtually no real benefit. Of course we should have the capability to see applications side-by-side, but we shouldn’t build the whole desktop around the idea.

Drag-and-drop sucks

For drag-and-drop to work efficiently, the drag source and drop target must be in view, but this is very rarely the case without burdensome pre-planning on the user’s part, especially when dragging from one application to another. Nearly as bad, users often mess up drags because drop targets are often unclear or finicky, resulting in unintended actions that must be undone. Users also sometimes simply change their mind mid-drag but are given no obvious way to safely abort the action. Finally, drag-and-drop actions are often poorly discoverable. In iTunes, for instance, the only way to move individual tracks to a device is by drag-and-drop, which many users fail to figure out on their own.

Virtual desktops suck

Floating application windows suck, hierarchies suck, and the desktop itself sucks, ergo virtual desktops suck. (And note how virtual desktops make drag-and-drop suck even more than it already does.)

Gadgets/Widgets/Gizmos/Plazmoids/Desklets/Applets all suck

Application windows suck and the desktop itself sucks, but applets are fucking ridiculous.

OK, I’ll walk that back a bit. Little status/info panel thingies? Fine, but let’s neatly organize them into some proper window rather than dump them onto the desktop surface (which, recall, needs to die).

If an applet is something the user actually interacts with at length, such as a game, there’s no reason whatsoever not to make it a proper application.

Wrong track

Before finally laying out Portals, let’s examine the good and bad interface reform ideas currently in circulation. First, the bad ideas follow four general themes:

Eye candy

Elitism is an essential part of human aesthetics. For instance, while we normally think of the criteria that make a good-looking person good-looking as objective, much of the attraction towards that person hinges on the rarity of their looks, not the looks themselves, per se. Similarly, gold is shiny, but an essential part of its worth is its rarity.

We see this in graphic design as well: what we consider stylish design hinges a lot on what is simply hard to duplicate. In the ’60s, this meant curved plastic furniture; in the ’80s, this meant cheesy computer video effects; today, this means web pages with rounded corners and glossy effects.

On the desktop, today, elite style means using hardware graphics acceleration because, five years ago, no desktop had it. As it stands right now, none of the major desktops have totally sorted out the infrastructure to make acceleration work ubiquitously, nor has the software caught up to make use of the new toy.

The trouble is that the set of new possibilities which acceleration opens up includes a lot of distracting, silly ideas which actually detract from usability. The obvious example of falling into this trap is Compiz and similar projects. Even aside from the purely aesthetic toys in these projects (such as drawing flames on the desktop), many of the features clearly exist purely for the sake of ooh…shiny.

Virtual physicality

Graphics acceleration has also led designers to create physical-simulation abominations like 3D desktops. Examples include:

This review of Real Desktop sums up the problem:

We can’t count the number of times we wished our Windows desktop was as messy as a regular desk. You know, because we’ve never really wished for that. But that’s exactly what Real Desktop lets you do. Oh yeah, it also turns your desktop into a 3D workspace.

While the 3D desktop is certainly pretty, we’re not sure it’s particularly useful. You can move icons around the screen with a left click. Click both of your mouse buttons to “pick up” an icon, or click the edge to rotate it. Probably the most fun you can have is when you highlight a bunch of icons and then drag them into another group of icons and watch them scatter like bowling pins.

Of these desktops, Grape is the least offensive because it mainly sticks to two dimensions, but it still exhibits everything bad about icons and drag-and-drop and imposes a heap of meta work upon the user in the form of innumerable icons, boxes, and text labels to create, position, and manage.

After a little thought and experimentation, it should be evident that treating virtual things as if they are like physical things is satisfying only up to the point where it becomes maddening, for the physical world simply does not scale the way the virtual world can. Sure, these desktops look neat and manageable when you have a couple dozen files, but who has just a couple dozen files anymore?

Manual, transitory organization

When people work in a physical space, they develop organization habits and strategies to cope with the mess of things before them. On your desk, for example, you might keep your personal stuff segregated from your business stuff, which makes sense because, as you work in one domain, you don’t want interference from another domain.

In the virtual world, however, such interference is not a problem: if I don’t have personal documents open at the moment, they don’t in any sense get in the way of the business documents I’m working on. If I do have a personal document open, presumably it’s because I’m switching my attention back and forth to that document. If I were to segregate my current items of attention, I wouldn’t solve the problem that I simply have only one focus of attention to give.

Interfaces that allow users to group or order items for the sake of coping with their number are imposing meta work on the user. Worse, grouping introduces hierarchy such that, to select an item, the user first must recall what group it’s in.

These burdens on the user often make sense when the user is organizing persistent state (e.g. files), but not transitory state. So, for instance, users shouldn’t order their browser tabs and group them into separate browser windows. Rather, the interface should automatically help users cope with dozens of open tabs in a way that obviates this manual work.

Half of the new interface design proposals I see assume that users would like doing manual, transitory organization, I think because the idea seems like it reflects the “natural” way people think and work. This probably stems from a sort of grass-is-greener fallacy: having worked on computers for so long, people begin to feel they’ve lost the virtues of physical paper work, forgetting why they moved away from paper in the first place.

Special pleading

In many desktop and web browser proposals, certain often-used applications and often-used sites are given special priority, usually in the form of convenient-access mechanisms. For instance, a number of design proposals for GNOME and netbook Linuxes elevate personal contacts—IM, email, address book, etc.—to first-level status on par with applications and file directories. Such proposals may have a proper motivation, for perhaps our current general mechanisms really don’t suit a particular common task or workflow. However, we should always try to rethink our general mechanisms before introducing special cases. For one thing, special exceptions tend to please one set of users to the great annoyance of others. For another, each exception is a design complication that all users must learn (or at least learn to ignore) and which inevitably becomes a barrier to change.

Steal from the best

Despite what the previous six thousand words might convey, I don’t actually hate everything. In fact, Portals largely synthesizes a number of ideas from existing stuff, the most notable being:

  • The Firefox AwesomeBar
  • Quicksilver/Enso/Ubiquity
  • Wikipedia, Google, and various other sites

The things these examples do right fall under a few general themes:

  • Responsive, text-based navigation and action (e.g. search, text links, and commands)
  • Tags, not hierarchies
  • Lists sorted by recency and frequency
  • Chrome-minimal design
  • Typography-focused design

Having already trashed the alternatives, I won’t give these ideas detailed justifications, but “typography-focused” requires some explanation:

Whether you like the term Web 2.0 or not, we definitely did see a quiet revolution in web design somewhere around 2002. This new style is associated superficially with rounded corners and shiny gloss, but there’s more substance to it.

In the web’s first decade, designers strove to imitate magazine layout, wherein eye candy is stuffed into an asymmetric grid of boxes surrounded by cluttered, omnipresent headers and navbars. This style was motivated mainly by:

  • An aversion to simple flow layouts. No self-respecting designer wants their stuff to look like a Geocities page. By fighting the natural bias of HTML/CSS for flow layout, you get a look that’s hard to reproduce and therefore “professional”.
  • An inability to decide what’s really important. Business people in particular have a hard time coming to terms with the fact that, for some things to stand out, other things must be deemphasized. Of course you want visitors to partake of all your wares, but what do visitors want?

Today, good web design is typified by generously spaced and well-formatted text in one, two, or, occasionally, three columns that are allowed to flow down the page rather than divided into unnecessary widget boxes. Some good examples are:

To be clear, “typography-focused” doesn’t always mean ditching images and widgetry in favor of more text. Take for example Amazon.com, which is not an exemplar of the new style but exhibits subtle improvements when you compare Amazon.com of 2009 to Amazon.com of 2000. Like many shopping and portal sites, Amazon still retains much of a cluttered magazine layout, but you can see how the site today better uses images, colors, boxes, and spacing to avoid a ‘mass-of-text’ look.

The point here is that typography is about the complete presentation of the text—its context—not just the text itself. When text is presented well, you can do more with it, as many web designs in this decade have shown.

First principles

Before finally getting into the actual design of Portals, I’ll summarize the design philosophy in four slogans:

Don’t make me think

The title of Steve Krug’s book, Don’t Make Me Think, works as a great design mantra because it succinctly states that:

  1. The most important thing in interface design is the user’s thought process.
  2. Users would rather not have a thought process.

Obvious, perhaps, but easy to lose sight of when caught up in design details.

The explanation is the design

Is your design hard for users to understand? Does user proficiency hinge upon hours of practice and study? The best way to answer these questions is to start by writing the manual. Sometimes this will lead to changes in design, but often all that’s required are some changes in wording or terminology. In any case, your first concern should be how to explain the design to its users, not other designers and programmers.

The right features and only the right features

As I stated above in passing, it’s easy to devise new features, but it’s hard to devise features that make other features unnecessary.

It’s not worth it

Lastly, ‘it’s not worth it’ is a handy, all-purpose way for me to shout down anything I don’t like:

Me: It’s not worth it!
You: What’s not worth it?
Me: It!

But the mantra has a non-abusive purpose as well. Ask yourself, say, ‘Why have we stuck with menu bars for so long?’ Well, when anyone argues that menu bars suck, the perfectly correct reply comes back that a menu bar is the optimal way to minimize mousing over a hierarchy of things. The problem is that this argument hinges upon the hidden assumption that efficiently mousing over hierarchies is of primary importance. Such hidden assumptions are the “it” to which I refer. What you think is so important perhaps isn’t.

‘It’s not worth it’ also works for cases where users themselves lose track of what’s really important. For instance, I advocate getting rid of the desktop surface, but I just know some people will object. ‘Users love wallpapers,’ they’ll say, never mind that wallpapers exist solely to (literally) paper over an unnecessary problem. The proper reply here is that good design requires balancing users’ desires to give them what they really want, and sometimes that means disregarding some desires for the sake of others.

Continued in part 2.

Odds

4 Jul

Did Kristol ghostwrite Palin’s speech?:

The odds are against her pulling it off. But I wouldn’t bet against it.

That’s some shrewd betting strategy, betting on the outcome you yourself find least likely.