
I recently read that Wikipedia is planning to pay people to make illustrations for some articles. To justify paying for graphics but not article text, the interviewee claimed that “volunteers apparently don’t find it rewarding” to make illustrations.

But I ask: is this because of some inherent property of illustrating (as she seems to be implying), or is it because no good tools currently exist for collaborative, online image editing? If it were as easy to collaboratively make illustrations as it is to write wiki text, my guess is that lots of people would do it for free. It sure sounds fun to me!

Two of my colleagues at MIT are working on separate research projects which I think could greatly contribute to making such a tool practical. But I can’t disclose those projects here without their permission.

Believe in what you’re doing

My friend “emax” had some words of wisdom to share today:

…But the thing that really made Warhol awesome was that he /was/
rejected. If the MoMA had taken his Shoe painting, adopted it into
their collection (and started selling postcards and scarves with his
design), he probably would have done something drastic, like
committed suicide or something. Or at least done something very
different. Being rejected gave him a very reason to exist; other
people didn’t get it, in the sense that they didn’t understand him or
why he was doing it. And that if anything was the biggest reason he
had to do it.

And he did do it with great intensity. Andy and his “Factory” worked
like mad, refining their image and experimenting with various media
to see what would be the most “fun”. indeed they had a closing
window of opportunity as the factory quickly gained visibility in the
city, especially thanks to Andy’s antics and appearances. And if
the public knew, the MoMA was soon to follow. Then it would all be

so moral of this short story . how to do interaction design like Andy:

– if you’re doing something that gets immediately accepted, it’s not
that exciting. you might want to consider doing something different.

– if you’re on to something really far out there, it will almost
surely not be appreciated immediately. do you believe in it? If
not, do something else. But if you DO, you have a reason to exist!
hooray! now you have some time to execute it. But not long…

So… stick with what you believe, and work hard and fast before the
world catches up. And form a close knit group that you can use to
pierce through the glass shell of the present into the future!

I think there’s a lot of truth there. Most importantly: Believe in what you’re doing.

Interface Design Principles

I want to start compiling a list of user interface design principles. Not necessarily things that I’ve read, but things that I’ve gleaned from working on Graph Sketcher and looking at other UI design work. Also, these include the business side of product design.

1. Make your interface easier/faster than what users can do with their current tools. It’s surprisingly difficult to beat paper and pencil.

2. Constraints: understand what your users do and do not need to edit. Just because the underlying representation is very flexible does not mean the user need be exposed to that full power – the complexities and options could slow them down considerably. Whenever possible, make design choices so your users don’t have to.

3. As I think I detailed in a previous post, good design means inventing good abstractions. Good abstractions ideally have all – and only – the functionality that your users need.

That’s all for the moment.

Features and Wikis

It’s much easier to add new features to programming languages and command-line interfaces than it is to add new features to graphical user interfaces.

I think that is profound, but I’ll have to mull it over to be sure.

For example, to add a new editing command (say, superscript) to emacs, just choose a name or keyboard shortcut for the command. But to add it to a graphical text editor, you need a new button or menu item. It has to be worked into the layout of the graphical interface. Other interface components might have to get moved or kicked out. If you’re Microsoft, you just add it onto the end of a quickly growing toolbar or menu or preference window.

Maybe this helps to explain why wikis are still based on wiki-text rather than WYSIWYG editing. If you want to add a new feature to a wiki (say, superscript), just define some new language tag for the wiki-text, e.g.: ““. But if you want to add that to a WYSIWYG wiki (let’s call it a wizziwiki), you have to design. Put the button somewhere. Decide which features are most important. This is hard! And it is especially hard for open source communities where a vocal minority will get upset at the removal of a feature that they particularly like (or programmed). That’s probably part of why brand new open source projects have to continually get started.
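To make the "just define a new tag" point concrete, here is a minimal sketch of a rule-based wiki-text renderer. The `^^…^^` superscript syntax is invented purely for illustration (it is not the tag any real wiki uses), and the bold/italic markers are only loosely modeled on common wiki conventions.

```python
import re

# Each rule maps a wiki-text pattern to an HTML replacement.
# Bold must come before italic so that ''' is not consumed as ''.
RULES = [
    (re.compile(r"'''(.+?)'''"), r"<b>\1</b>"),
    (re.compile(r"''(.+?)''"), r"<i>\1</i>"),
]

def render(text, rules=RULES):
    """Apply each rule in order and return the resulting HTML fragment."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

# Adding the new feature is one more entry in the list.
# The ^^...^^ syntax is a hypothetical choice, not a real wiki tag.
RULES_WITH_SUP = RULES + [
    (re.compile(r"\^\^(.+?)\^\^"), r"<sup>\1</sup>"),
]

print(render("'''bold''' and ''italic''"))
print(render("E = mc^^2^^", RULES_WITH_SUP))
```

Nothing in the existing "interface" had to move to make room for the new rule, which is exactly what a toolbar can't offer.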

Here’s one solution: a wiki which lets you do the most basic editing (bold, italic, lists, links) as WYSIWYG, but has an “advanced editing” mode which is text-based and allows all those other features. The problem with adding this to current wikis lies in getting a “handle” on the text entry point; that is, mapping back from the graphics into the source. Right now it’s a one-way street: wiki-text to HTML/DOM. It’s hard to go in the other direction. (The same is true for LaTeX.) So maybe we need a completely HTML-based wiki. Is someone working on this?

Games and Decisions

I read somewhere (maybe Everything is Miscellaneous?) that the concept of “game” is hard to define; we have prototypical examples of games in our minds – chess, soccer, solitaire – but it’s difficult to draw boundaries based on attributes (does solitaire have any of the same features as improv games?). The point was that we categorize based on prototypical examples, not by inherent properties of the category.

But I’m going to go out on a limb and posit that games are defined by a player making choices. In chess or solitaire, choosing your moves; in soccer, precise motor sequences to maneuver the ball. There are certainly activities that fall outside this boundary – passive activities like reading or sitting in lecture. But it seems like every activity involving at least one active participant has been referred to as a game – from soccer up through stock trading, having a conversation, or living life!

What makes prototypical games more “gamey” may be that they don’t have significant consequences in the “real world”. They allow us to practice our decision-making skills in a safe environment, so that we can see the outcomes of a decision without having to worry too much about it – and thus, learn. By “decision-making”, I mean everything from instinctive muscle reactions to complex, deliberated thought processes.

What does this mean for interface design? Games are fun. If what I am claiming is correct, games are all about if-then scenarios. Interfaces involve the user making decisions in the form of input. In order to complete the if-then, so the user can see the outcomes of their decisions, we should design interfaces that respond to input with feedback about the outcome. As soon as a decision is made, the consequences should be shown. This is what I call “user-generated animation” and it’s all over Graph Sketcher and Mac OS X in general. Basic drag and drop. Colors and widths changing as you drag the sliders. The dock resizing as you run your mouse along. The new album “skimming” feature in iPhoto. Bill Moggridge was on to something when he encouraged us to draw inspiration from games.

Going even further, maybe this helps explain my dislike of both games and decision-making (compared to the average person). I’m most comfortable making decisions in a very emotionally safe context; and most games and aspects of life do not fall under that category. I tend to approach decision-making as a chore, in which case games are not fun unless they have some other outcome such as getting to know the other players or learning some useful skill.

Design as Abstraction

I am going to posit that good design means coming up with good abstractions.

What do I mean by abstractions?
Anything that takes you up a level of abstraction so you don’t have to worry about the underlying details anymore. For example, WYSIWYG interfaces abstract away the fact that the computer is really manipulating some sort of data store. Data stores themselves abstract away the details of memory management and disk access. Physical inventions like glue and paint abstract away from the chemical underpinnings; electrical tools abstract from their electrical basis. Rice cookers abstract away from the details of stove temperature and cooking time.

What makes a good abstraction?

  1. Good abstractions are powerful, in the sense that they let you do a lot with the very same tool. This is what Will Wright (the designer of many of the Sim games) said.
  2. And good abstractions are not “leaky” – they do not force the user to resort to, or even understand, the underlying details that the abstraction is supposed to cover up. This is what Joel Spolsky (Joel on Software) said.
  3. Good abstractions are conceptually understandable, so that users can actually figure them out. Don Norman talks about the connection between devices and users’ mental models in “The Design of Everyday Things”.
  4. And good abstractions are reliable, in that they extend to all relevant circumstances. This reliability develops users’ trust and satisfaction – which is discussed in Norman’s “Emotional Design”. It also enhances the power of the abstraction.

Computer monitors are very powerful: they can display an essentially unlimited number of patterns via a simple abstraction of colored pixels. They successfully hide the underlying digital circuitry. They are trivial to understand. And well-manufactured, they are reliable for years (and each pixel supports a relevant range of brightness and hue). I think these factors explain the computer monitor’s enormous success as a design.

Artificial Intelligence approaches often try to take great leaps of abstraction. Instead of typing letters, you say a word and expect the computer to understand which letters you meant. Snap a photo and expect the computer to recognize your friend. Such approaches will only be widely successful when they are sufficiently powerful and leak-free. Shouldn’t they work all the time, even in low-light situations? Shouldn’t they recognize all your friends, not just some? How about people you’ve never met? The recognition abstraction implies that these are all relevant circumstances, so it is hard to develop the user’s trust without supporting them all. User understanding about which of these circumstances are too technically challenging requires breaking the abstraction (leakiness). All of this suggests that these approaches will not be successful unless the limitations are clearly and easily understood (as in limited domains like speaking digits) or the technology eventually manages to fulfill all the expectations of the abstraction.

Outside of limited domains, I think the “pure recognition” abstraction is doomed because of the real ambiguities that exist in the world. There is often absolutely no way to tell the difference between certain words that sound the same or faces that look the same. The only way to know for sure is to ask questions, look at context, keep gathering data until the ambiguity is resolved. Since the user is involved in providing context, gathering more data, and answering questions, this requires a different metaphor – a “conversational recognition” abstraction. The CEO of ConceptQ was talking about this, and exploratory search interfaces like faceted browsing function along the same lines.

Everything is Miscellaneous

I recently finished reading Everything is Miscellaneous by David Weinberger. It wasn’t as insightful as I had hoped, but the main reason I read it is that it mentions Endeca (the software company I worked at this summer).

The main point was that categorizations, taxonomies, groupings, clusterings (whatever you want to call them) can increasingly be designed for any particular purpose. This is in contrast with the “one-size-fits-all” approach of the past (physical stores, the Dewey Decimal System, database columns). I put the emphasis on design because designing an ordering system has the same essential properties as designing anything else. What’s really cool is when the ordering system can be designed automatically, on the fly, in response to user input, as in faceted browsing.
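As a rough illustration of an ordering system built on the fly, here is a minimal sketch of faceted browsing over a tiny in-memory catalog. The items, the facet names, and the function names are all made up for the example; this is not how Endeca or any particular product implements it.

```python
from collections import Counter

# Hypothetical catalog: every key is a facet the user could browse by.
ITEMS = [
    {"kind": "book", "topic": "design", "decade": "1980s"},
    {"kind": "book", "topic": "software", "decade": "2000s"},
    {"kind": "article", "topic": "design", "decade": "2000s"},
    {"kind": "book", "topic": "design", "decade": "2000s"},
]

def refine(items, selections):
    """Keep only the items that match every facet value chosen so far."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]

def facet_counts(items, selections):
    """Count the remaining values of each unselected facet among the matches."""
    counts = {}
    for item in refine(items, selections):
        for facet, value in item.items():
            if facet in selections:
                continue  # already narrowed on this facet
            counts.setdefault(facet, Counter())[value] += 1
    return {facet: dict(counter) for facet, counter in counts.items()}

print(facet_counts(ITEMS, {"kind": "book"}))
```

Each click changes `selections`, and the categories offered next are recomputed from whatever is left, so the "taxonomy" the user sees is different for every path through the data.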

Next: On Intelligence by Jeff Hawkins.

Democracies and Mutual Funds

I was chatting with Rajiv today about politics and history and economics and came up with an interesting analogy.

We were talking about how monarchs can really do amazing things for a country if they’re good, but if they’re not good or just crazy they can really screw things up. In a democracy, by contrast, the people have the power but don’t necessarily know what they really want or how to get it done; things have to be voted on; nothing radical tends to happen.

So basically, democracy is like a mutual fund – low risk, medium return. Monarchy is more like an individual stock – more risky but with the potential for much higher returns. Democracies stick around while monarchies eventually get wiped out by a string of too many bad leaders.

Idea for a new thermostat design

A striking fact: most of my recent roommates — all smart enough to get into MIT — completely failed to understand our thermostat. At first, I attributed this to the notion that MIT students simply lack many everyday, non-academic skills. This may have some truth to it, but the wider conclusion is that the standard thermostat design is just not intuitive.

Here is how a thermostat works: The user sets an (unmarked) temperature with a dial or slider. From this, the thermostat extrapolates a low-temperature cutoff and a high-temperature cutoff. The low cutoff is lower than the user setting, and the high cutoff is higher than the user setting. The thermostat turns on the air conditioning if the temperature rises above the high cutoff. The air conditioning then REMAINS ON until the temperature is pushed all the way past the low cutoff. The temperature then rises naturally until it hits the high cutoff again, and again the air conditioner kicks in. (Switch “high” with “low” in the case of heating.) This makes sense technologically, because air conditioners and heaters are more efficient if they stay on for a while.
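The on/off behavior described above can be sketched in a few lines. This is a toy model under stated assumptions: the width of the band around the setting (`band`), the drift and cooling rates, and the function names are all invented for illustration, and a real thermostat may pick its cutoffs differently.

```python
def ac_should_run(temp, setpoint, ac_on, band=1.0):
    """Decide whether the air conditioner runs, given its current state."""
    high_cutoff = setpoint + band  # assumed symmetric band around the setting
    low_cutoff = setpoint - band
    if temp > high_cutoff:
        return True   # too warm: switch on
    if temp < low_cutoff:
        return False  # pushed past the low cutoff: switch off
    return ac_on      # in between: keep doing whatever it was doing

def simulate(setpoint, start_temp, steps, band=1.0, drift=0.5, cooling=1.0):
    """Toy room model: warms by `drift` per step, cools by `cooling` when the AC is on."""
    temp, ac_on, history = start_temp, False, []
    for _ in range(steps):
        ac_on = ac_should_run(temp, setpoint, ac_on, band)
        temp += -cooling if ac_on else drift
        history.append((temp, ac_on))
    return history

for temp, ac_on in simulate(setpoint=72.0, start_temp=72.0, steps=12):
    print(f"{temp:5.1f}  {'AC on' if ac_on else 'AC off'}")
```

The key line is the last `return ac_on`: inside the band the answer depends on history, which is why the same room temperature can mean "on" at one moment and "off" at another.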

The way you are supposed to refine your temperature setting with a thermostat is as follows: If you are too cold, you move the setting until the air conditioning clicks off. This is your way of saying “I don’t want it to get colder than this.” Alternatively, if you are too hot, you move the setting in the cold direction until the air conditioning kicks in. “I don’t want it hotter than this.” If your range of acceptable temperatures is less than the thermostat’s, then you will be changing your setting on every cycle – but you won’t be changing it very much.

Judging from the vast range of settings I have found my roommates leaving the thermostat in, here is how they seem to WANT to interact with it: “Right now I’m really really hot, so I’m going to turn the temperature way down. Five minutes later, I’m still pretty hot, so I’m going to turn the temperature down some more.” At this point, the temperature setting is way colder than the comfortable level, so eventually it will become very cold in the apartment and the roommate will be really really cold and thus turn the temperature way up. Cycle repeats.

So here is my new thermostat design. There is a round dial with no temperature markers — the only markings indicate the “colder” direction vs. the “hotter” direction. If the user is very very hot, they turn the dial strongly in the “cold” direction. This tells the thermostat that the user temperature setting should be set substantially below the current air temperature. Five minutes later, it has cooled down a bit, but the user is still hot. They go back to the thermostat and turn it in the “cold” direction, but not as far as before because they are less hot. The thermostat correspondingly sets a user temperature that is moderately colder than the current air temperature.

In other words, the “user temperature” is determined not by the absolute position of the dial but by the amount of turning in a given adjustment. This design allows people to indicate their level of discomfort, as they seem to want to do intuitively, and avoids the “escalation” problems that occur with the traditional thermostat design.
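Here is a minimal sketch of that relative-dial idea. The class name, the `degrees_per_turn` scaling, and the convention that a negative turn means "colder" are all assumptions made for the example; the design above does not pin down any of them.

```python
class RelativeDialThermostat:
    """Sets the target temperature from how far the dial was just turned."""

    def __init__(self, degrees_per_turn=10.0):
        # Hypothetical scaling: one full turn shifts the target by this many degrees.
        self.degrees_per_turn = degrees_per_turn
        self.setpoint = None  # no target until the user first touches the dial

    def adjust(self, current_temp, turn):
        """Record one adjustment.

        `turn` is the fraction of a full turn in this single gesture:
        negative for "colder", positive for "hotter".
        """
        self.setpoint = current_temp + turn * self.degrees_per_turn
        return self.setpoint

thermostat = RelativeDialThermostat()
print(thermostat.adjust(current_temp=80.0, turn=-0.5))  # very hot: big turn
print(thermostat.adjust(current_temp=78.0, turn=-0.2))  # still hot: smaller turn
```

Because each adjustment is measured from the current air temperature, a second, smaller nudge cannot stack on top of the first and drive the target far below the comfortable level.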

Thoughts on Pen Interfaces

I recently did a small usability study which demonstrated that a pen stroke recognition interface was not a good choice for my graph sketching application. The failure of this interface helps to explain why pen interfaces have not yet become widely used. Even for an application domain that blatantly lends itself to a sketch-like interface, and even with a fairly accurate stroke recognizer, the recognition approach was a clear loser. For one thing, users seemed disconcerted by the unpredictable nature of the stroke recognition; they were downright annoyed when the system failed to read their mind. Users also seemed stressed about having to perform more accurately in order for the system to correctly recognize their intentions.

Although improvements in software and hardware interfaces could lessen both of these problems, I think the deeper issue here is that of appropriate *constraints*. Pen interfaces tend to be highly unconstrained, which gives them flexibility and power but also makes them overwhelming, stressful, ambiguous, and often inefficient. The most obvious example is with text input: typing is faster, more satisfying, and more accurate than tablet PC handwriting precisely because typing is so much more constrained. Each button does precisely one thing: insert a particular character into the event stream. Even if I had a futuristic handwriting recognizer that recognized with human accuracy and felt as good as paper, I would still rather use a keyboard for the task of inputting characters.

A similar argument can be made for the graph sketching domain. The reason I think the arc interface turned out to be most efficient (and enjoyable) was that it provided the correct degree of constraint for the task at hand. Curves, even complex ones, are really just a series of segment endpoints and curve points (which specify the amount and direction of bulge). The arc interface in effect let users precisely and easily specify these three points to create each arc segment. If they knew what they wanted the first time around, there was no need to go back and adjust anything, and there were no surprises from the recognizer. Creating complex curves only required lifting the mouse button momentarily to indicate an upcoming change in curvature, so that the computer could display the precise desired line.
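To show how little information three points carry and how much curve they pin down, here is a small sketch that draws one arc segment from two endpoints and a curve point. It uses a quadratic Bézier forced through the curve point, which is my stand-in for the example and not necessarily the curve type Graph Sketcher uses; the function name and `samples` parameter are also hypothetical.

```python
def arc_points(start, curve_point, end, samples=9):
    """Sample a smooth segment from `start` to `end` that passes through `curve_point`.

    A quadratic Bezier reaches its midpoint (t = 0.5) at
    0.25*start + 0.5*control + 0.25*end, so choosing
    control = 2*curve_point - (start + end)/2 makes it hit curve_point there.
    """
    cx = 2 * curve_point[0] - (start[0] + end[0]) / 2
    cy = 2 * curve_point[1] - (start[1] + end[1]) / 2
    points = []
    for i in range(samples):
        t = i / (samples - 1)
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
        points.append((a * start[0] + b * cx + c * end[0],
                       a * start[1] + b * cy + c * end[1]))
    return points

for x, y in arc_points((0.0, 0.0), (1.0, 1.0), (2.0, 0.0)):
    print(f"({x:.2f}, {y:.2f})")
```

Moving the curve point farther from the line between the endpoints increases the bulge, and moving it to the other side flips the direction, so the user's three clicks fully determine the segment with nothing left for a recognizer to guess.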

The problem with true sketching is that there is ambiguity in every pen stroke, and even the most advanced stroke recognizer will not be able to read minds. The only clear way to resolve this ambiguity is to increase the number of constraints by letting the user point out exactly what they want. One approach to this specification is to display “n-best” lists of the potential options the recognizer thinks you might mean. But given the ease of simply specifying one’s intentions the first time around, and the fact that every segment is potentially ambiguous if left to a recognizer, I think there is a strong case to be made that the arc interface will be the best approach for this line-graphing task no matter the improvements in software and hardware.

More generally, I think that sketching is too under-constrained for many of the tasks that researchers have applied it to. For example, in any domain involving a small, fixed number of symbols, such as electronics or chemistry diagrams, the constrained approach would be to specify start and end points and press a button corresponding to the desired symbol. By contrast, sketching the diagram freehand takes substantial time and always has potential for recognition ambiguity. Freehand sketching may be more intuitive because that is what users are used to doing, but a more constrained interface may prove more efficient, in much the same way that typing proved more efficient than handwriting. Of course, usability studies would be required to test these hypotheses.

Conversely, the tasks that freehand sketching is good for are less constrained applications like art or solving certain math problems. These are applications with an open-ended set of symbols and diagrams that require the flexibility of stroke input. Another class of applications which merit stroke input are limited-capability mobile devices that do not have space for a lot of buttons (either “soft” or physical). But in my opinion, such devices are only a temporary solution; ultimately, we should not limit our input devices but instead figure out how to make full capabilities possible in mobile settings.