Design as Abstraction

I am going to posit that good design means coming up with good abstractions.

What do I mean by abstractions?
Anything that takes you up a level of abstraction so you don’t have to worry about the underlying details anymore. For example, WYSIWYG interfaces abstract away the fact that the computer is really manipulating some sort of data store. Data stores themselves abstract away the details of memory management and disk access. Physical inventions like glue and paint abstract away their chemical underpinnings; electrical tools abstract away their electrical basis. Rice cookers abstract away the details of stove temperature and cooking time.

What makes a good abstraction?

  1. Good abstractions are powerful, in the sense that they let you do a lot with the very same tool. This is what Will Wright (the designer of many of the Sim games) said.
  2. And good abstractions are not “leaky” – they do not force the user to resort to, or even understand, the underlying details that the abstraction is supposed to cover up. This is what Joel Spolsky (of Joel on Software) said.
  3. Good abstractions are conceptually understandable, so that users can actually figure them out. Don Norman talks about the connection between devices and users’ mental models in “The Design of Everyday Things”.
  4. And good abstractions are reliable, in that they extend to all relevant circumstances. This reliability develops users’ trust and satisfaction – which is discussed in Norman’s “Emotional Design”. It also enhances the power of the abstraction.

Computer monitors are very powerful — they can display essentially infinite patterns via a simple abstraction of colored pixels. They successfully hide the underlying digital circuitry. They are trivial to understand. And when well manufactured, they are reliable for years (and each pixel supports a relevant range of brightness and hue). I think these factors explain the computer monitor’s enormous success as a design.

Artificial Intelligence approaches often try to take great leaps of abstraction. Instead of typing letters, you say a word and expect the computer to understand which letters you meant. Snap a photo and expect the computer to recognize your friend. Such approaches will only be widely successful when they are sufficiently powerful and leak-free. Shouldn’t they work all the time, even in low-light situations? Shouldn’t they recognize all your friends, not just some? How about people you’ve never met? The recognition abstraction implies that these are all relevant circumstances, so it is hard to develop the user’s trust without supporting them all. And for a user to understand which of these circumstances are too technically challenging, the abstraction itself has to be broken (leakiness). All of this suggests that these approaches will not be successful unless the limitations are clearly and easily understood (as in limited domains like speaking digits) or the technology eventually manages to fulfill all the expectations of the abstraction.

Outside of limited domains, I think the “pure recognition” abstraction is doomed because of the real ambiguities that exist in the world. There is often absolutely no way to tell the difference between certain words that sound the same or faces that look the same. The only way to know for sure is to ask questions, look at context, and keep gathering data until the ambiguity is resolved. Since the user is involved in providing context, gathering more data, and answering questions, this requires a different metaphor – a “conversational recognition” abstraction. The CEO of ConceptQ was talking about this, and exploratory search interfaces like faceted browsing function along the same lines.
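To make the contrast concrete, here is a minimal sketch of what a conversational recognition loop might look like. Everything here is hypothetical – the recognize function stands in for any recognizer that returns scored candidates, and the confidence threshold is arbitrary – but it shows the essential shape: recognize, and if the result is ambiguous, ask rather than guess.

```python
# A toy "conversational recognition" loop (hypothetical sketch).

def recognize(audio):
    # Placeholder: a real recognizer would score hypotheses from the input.
    return [("write", 0.48), ("right", 0.45), ("rite", 0.07)]

def conversational_recognize(audio, threshold=0.8):
    candidates = recognize(audio)
    best_word, best_score = candidates[0]
    if best_score >= threshold:
        return best_word  # confident enough: behaves like "pure recognition"
    # Ambiguous: involve the user instead of silently guessing.
    print("Did you mean:")
    for i, (word, score) in enumerate(candidates, start=1):
        print(f"  {i}. {word} (confidence {score:.0%})")
    choice = int(input("Enter a number: "))
    return candidates[choice - 1][0]

print(conversational_recognize("<audio clip>"))
```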

Everything is Miscellaneous

I recently finished reading Everything is Miscellaneous by David Weinberger. It wasn’t as insightful as I had hoped; the main reason I read it was that it mentions Endeca (the software company I worked at this summer).

The main point was that categorizations, taxonomies, groupings, clusterings (whatever you want to call them) can increasingly be designed for any particular purpose. This is in contrast with the “one-size-fits-all” approach of the past (physical stores, the Dewey Decimal System, database columns). I put the emphasis on design because designing an ordering system has the same essential properties as designing anything else. What’s really cool is when the ordering system can be designed automatically, on the fly, in response to user input — e.g. faceted browsing.
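As a rough illustration (the records and facet fields below are made up), faceted browsing amounts to recomputing the groupings on the fly from whatever records survive the user’s current selections:

```python
from collections import Counter

# Toy catalog; the records and facet fields are invented for illustration.
records = [
    {"type": "novel",     "era": "19th century", "language": "English"},
    {"type": "novel",     "era": "20th century", "language": "Russian"},
    {"type": "biography", "era": "20th century", "language": "English"},
    {"type": "novel",     "era": "19th century", "language": "French"},
]

def browse(records, selections):
    """Filter by the user's selections, then rebuild facet counts on the fly."""
    matches = [r for r in records
               if all(r[field] == value for field, value in selections.items())]
    facets = {field: Counter(r[field] for r in matches)
              for field in records[0] if field not in selections}
    return matches, facets

matches, facets = browse(records, {"type": "novel"})
print(facets)  # era and language counts, recomputed for novels only
```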

Next: On Intelligence by Jeff Hawkins.

Democracies and Mutual Funds

I was chatting with Rajiv today about politics and history and economics and came up with an interesting analogy.

We were talking about how monarchs can really do amazing things for a country if they’re good, but if they’re not good or just crazy they can really screw things up. In a democracy, by contrast, the people have the power but don’t necessarily know what they really want or how to get it done; things have to be voted on; nothing radical tends to happen.

So basically, democracy is like a mutual fund – low risk, medium return. Monarchy is more like an individual stock – more risky but with the potential for much higher returns. Democracies stick around while monarchies eventually get wiped out by a string of too many bad leaders.

Idea for a new thermostat design

A striking fact: most of my recent roommates — all smart enough to get into MIT — completely failed to understand our thermostat. At first, I attributed this to the notion that MIT students simply lack many everyday, non-academic skills. This may have some truth to it, but the wider conclusion is that the standard thermostat design is just not intuitive.

Here is how a thermostat works: The user sets an (unmarked) temperature with a dial or slider. From this, the thermostat extrapolates a low-temperature cutoff and a high-temperature cutoff. The low cutoff is lower than the user setting, and the high cutoff is higher than the user setting. The thermostat turns on the air conditioning if the temperature rises above the high cutoff. The air conditioning then REMAINS ON until the temperature is pushed all the way past the low cutoff. The temperature then rises naturally until it hits the high cutoff again, and again the air conditioner kicks in. (Switch “high” with “low” in the case of heating.) This makes sense technologically, because air conditioners and heaters are more efficient if they stay on for a while.
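A small simulation makes the cycle easy to see. The cutoff offsets and the toy physics below are invented for illustration, but the on/off logic is the standard hysteresis scheme just described:

```python
# Hysteresis thermostat, cooling case (sketch with invented numbers).
SETTING = 72.0           # user's dial position, in degrees
LOW_CUT = SETTING - 2.0  # A/C stays on until the temperature falls past this
HIGH_CUT = SETTING + 2.0 # A/C turns on once the temperature rises past this

temp, ac_on = 73.0, False
for minute in range(60):
    if temp > HIGH_CUT:
        ac_on = True   # too hot: start cooling
    elif temp < LOW_CUT:
        ac_on = False  # cooled all the way past the low cutoff: stop
    # Toy physics: the room warms slowly on its own, the A/C cools faster.
    temp += -0.3 if ac_on else 0.1
    print(f"minute {minute:2d}: {temp:5.1f}F, A/C {'on' if ac_on else 'off'}")
```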

The way you are supposed to refine your temperature setting with a thermostat is as follows: If you are too cold, you move the setting in the warm direction until the air conditioning clicks off. This is your way of saying “I don’t want it to get colder than this.” Alternatively, if you are too hot, you move the setting in the cold direction until the air conditioning kicks in. “I don’t want it hotter than this.” If your range of acceptable temperatures is narrower than the thermostat’s, then you will be changing your setting on every cycle – but you won’t be changing it very much.

Judging from the vast range of settings I have found my roommates leaving the thermostat in, here is how they seem to WANT to interact with it: “Right now I’m really really hot, so I’m going to turn the temperature way down. Five minutes later, I’m still pretty hot, so I’m going to turn the temperature down some more.” At this point, the temperature setting is way colder than the comfortable level, so eventually it will become very cold in the apartment and the roommate will be really really cold and thus turn the temperature way up. Cycle repeats.

So here is my new thermostat design. There is a round dial with no temperature markers — the only markings indicate the “colder” direction vs. the “hotter” direction. If the user is very very hot, they turn the dial strongly in the “cold” direction. This tells the thermostat that the user temperature setting should be set substantially below the current air temperature. Five minutes later, it has cooled down a bit, but the user is still hot. They go back to the thermostat and turn it in the “cold” direction, but not as far as before because they are less hot. The thermostat correspondingly sets a user temperature that is moderately colder than the current air temperature.

In other words, the “user temperature” is determined not by the absolute position of the dial but by the amount of turning in a given adjustment. This design allows people to indicate their level of discomfort, as they seem to want to do intuitively, and avoids the “escalation” problems that occur with the traditional thermostat design.
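Here is a minimal sketch of the idea, with an invented scaling factor. The dial reports relative turns rather than an absolute position, and each turn offsets the target from the current air temperature in proportion to how far it was turned:

```python
# Relative-dial thermostat (sketch; the 4-degrees-per-full-turn gain is invented).
DEGREES_PER_TURN = 4.0

class RelativeThermostat:
    def __init__(self, air_temp):
        self.target = air_temp  # start neutral: no heating or cooling demand

    def turn(self, amount, air_temp):
        """amount: -1.0 (full turn toward 'colder') .. +1.0 (toward 'hotter').

        The new target is offset from the CURRENT air temperature,
        not from any absolute dial position.
        """
        self.target = air_temp + amount * DEGREES_PER_TURN

# "I'm really, really hot" -> strong turn toward cold:
t = RelativeThermostat(air_temp=80.0)
t.turn(-1.0, air_temp=80.0)  # target becomes 76
# Five minutes later, cooler but still warm -> smaller turn:
t.turn(-0.5, air_temp=77.0)  # target becomes 75; no escalation
print(t.target)
```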

Thoughts on Pen Interfaces

I recently did a small usability study which demonstrated that a pen stroke recognition interface was not a good choice for my graph sketching application. The failure of this interface helps to explain why pen interfaces have not yet become widely used. Even for an application domain that blatantly lends itself to a sketch-like interface, and even with a fairly accurate stroke recognizer, the recognition approach was a clear loser. For one thing, users seemed disconcerted by the unpredictable nature of the stroke recognition; they were downright annoyed when the system failed to read their mind. Users also seemed stressed about having to perform more accurately in order for the system to correctly recognize their intentions.

Although improvements in software and hardware interfaces could lessen both of these problems, I think the deeper issue here is that of appropriate *constraints*. Pen interfaces tend to be highly unconstrained, which gives them flexibility and power but also makes them overwhelming, stressful, ambiguous, and often inefficient. The most obvious example is with text input: typing is faster, more satisfying, and more accurate than tablet PC handwriting precisely because typing is so much more constrained. Each button does precisely one thing: insert a particular character into the event stream. Even if I had a futuristic handwriting recognizer that recognized with human accuracy and felt as good as paper, I would still rather use a keyboard for the task of inputting characters.

A similar argument can be made for the graph sketching domain. The reason I think the arc interface turned out to be most efficient (and enjoyable) was that it provided the correct degree of constraint for the task at hand. Curves, even complex ones, are really just a series of segment endpoints and curve points (which specify the amount and direction of bulge). The arc interface in effect let users precisely and easily specify these three points to create each arc segment. If they knew what they wanted the first time around, there was no need to go back and adjust anything, and there were no surprises from the recognizer. Creating complex curves only required lifting the mouse button momentarily to indicate an upcoming change in curvature, so that the computer could display the precise desired line.
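For the geometrically inclined, here is a sketch of why three points are all an arc segment needs. The names and code are mine, not the study’s: the arc’s circle is just the circumcircle of the two endpoints and the bulge point.

```python
import math

def arc_through(p1, bulge, p2):
    """Return (center, radius) of the unique circle through three points.

    p1 and p2 are the segment endpoints; bulge is the curve point that sets
    the amount and direction of the arc's bow. Returns None if the points
    are (nearly) collinear, i.e. the segment is effectively straight.
    """
    (ax, ay), (bx, by), (cx, cy) = p1, bulge, p2
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-9:
        return None  # collinear: draw a straight segment instead
    ux = ((ax*ax + ay*ay) * (by - cy) + (bx*bx + by*by) * (cy - ay)
          + (cx*cx + cy*cy) * (ay - by)) / d
    uy = ((ax*ax + ay*ay) * (cx - bx) + (bx*bx + by*by) * (ax - cx)
          + (cx*cx + cy*cy) * (bx - ax)) / d
    return (ux, uy), math.hypot(ax - ux, ay - uy)

print(arc_through((0, 0), (1, 1), (2, 0)))  # ((1.0, 0.0), 1.0)
```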

The problem with true sketching is that there is ambiguity in every pen stroke, and even the most advanced stroke recognizer will not be able to read minds. The only clear way to resolve this ambiguity is to increase the number of constraints by letting the user point out exactly what they want. One approach to this specification is to display “n-best” lists of the potential options the recognizer thinks you might mean. But given the ease of simply specifying one’s intentions the first time around, and the fact that every segment is potentially ambiguous if left to a recognizer, I think there is a strong case to be made that the arc interface will be the best approach for this line-graphing task no matter the improvements in software and hardware.

More generally, I think that sketching is too under-constrained for many of the tasks that researchers have applied it to. For example, in any domain involving a small, fixed number of symbols, such as electronics or chemistry diagrams, the constrained approach would be to specify start and end points and press a button corresponding to the desired symbol. By contrast, sketching the diagram freehand takes substantial time and always has potential for recognition ambiguity. Freehand sketching may be more intuitive because that is what users are used to doing, but a more constrained interface may prove more efficient in a similar way that typing proved more efficient than handwriting. Of course, usability studies would be required to test these hypotheses.

Conversely, the tasks that freehand sketching is good for are less constrained applications like art or solving certain math problems. These are applications with an open-ended set of symbols and diagrams that require the flexibility of stroke input. Another class of applications that merits stroke input is limited-capability mobile devices that do not have space for a lot of buttons (either “soft” or physical). But in my opinion, such devices are only a temporary solution; ultimately, we should not limit our input devices but instead figure out how to make full capabilities possible in mobile settings.

Moving Blog

I think I will move my blog here because – you have to admit – it’s more reliable than WSO.

Global warming needs a political breakthrough

What has really struck me recently about global warming is how politics is the limiting factor in solving the problem. The political system seems ill-equipped to deal with something so global and so long-term. As a scientist, I read article after article about how we have the technologies and the economic strategies to solve global warming today, if only the politicians would cooperate (by funding the technologies and implementing the regulatory strategies). But as a scientist, I don’t have the tools to even begin to understand what needs to be done to actually get the politicians to cooperate. In other words, I feel helpless when it comes to addressing what seems to be *the* important factor. We need a political science breakthrough rather than a technological breakthrough.

Artificial Intelligence is about Computers making Decisions

All of intelligence – anything observable that intelligent creatures DO – that is, anything by which you can *tell* something is intelligent – is a *decision*. This could be conscious or subconscious, but a decision nonetheless. You could have done something else or said something else. You said “no” to every other option and chose whatever it is you did. I’ve always been interested in this process of decision-making.

Computer science is my methodology – in order to solve problems, I write computer programs. This is how I naturally approach research, this is what I’m good at, this is what I do.

So I think of *intelligence* as *decision making*; and thus I want *computers* to make *decisions*. In fact, AI is all about computers making decisions. I think that is a *very* deep statement. That is what makes AI both powerful and scary. Talking to a computer does not seem so scary; computers making the world’s decisions does seem scary. But a talking computer is one that *makes decisions* about what to say. I can see why using a statistical/probabilistic framework would have been disconcerting for AI pioneers trying to get computers to make the *right* decisions.
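One way to make this concrete: even the most trivial talking program is, at its core, scoring options and saying “no” to all but one. This toy sketch (with a made-up probability table, standing in for any statistical language model) picks the next word by choosing the most probable candidate:

```python
# A talking computer is a deciding computer (toy sketch, made-up probabilities).
next_word_probs = {
    "the": {"cat": 0.4, "dog": 0.35, "weather": 0.25},
    "cat": {"sat": 0.6, "ran": 0.4},
}

def decide_next_word(context):
    """Choose one word, implicitly saying 'no' to every other option."""
    options = next_word_probs[context]
    return max(options, key=options.get)

print(decide_next_word("the"))  # -> "cat"
```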

Indeed, natural language generation is part of the very *essence* of AI! The very thing that forms the basis of the Turing test! So in some sense it really is “AI-complete”. That’s exciting, but it also means I have to be careful – pragmatically, I need to find research that is tractable. Perhaps a computer that makes decisions for generating natural language about something specific – like Regina’s football database. Or perhaps a computer that makes decisions for generating a specific type of natural language about something general. What might I mean by a “specific type of natural language”? I’ll have to think about that. Or go ask the linguists.