Thoughts on Pen Interfaces

I recently ran a small usability study showing that a pen stroke recognition interface was not a good choice for my graph sketching application. The failure of this interface helps explain why pen interfaces have not yet become widely used. Even in an application domain that obviously lends itself to a sketch-like interface, and even with a fairly accurate stroke recognizer, the recognition approach was a clear loser. For one thing, users seemed disconcerted by the unpredictable nature of the stroke recognition; they were downright annoyed when the system failed to read their minds. Users also seemed stressed by having to draw more carefully just so the system could correctly recognize their intentions.

Although improvements in software and hardware interfaces could lessen both of these problems, I think the deeper issue here is one of appropriate *constraints*. Pen interfaces tend to be highly unconstrained, which gives them flexibility and power but also makes them overwhelming, stressful, ambiguous, and often inefficient. The most obvious example is text input: typing is faster, more satisfying, and more accurate than tablet PC handwriting precisely because typing is so much more constrained. Each key does precisely one thing: insert a particular character into the event stream. Even if I had a futuristic handwriting recognizer that recognized with human accuracy and felt as good as paper, I would still rather use a keyboard for the task of entering characters.

A similar argument can be made for the graph sketching domain. I think the arc interface turned out to be the most efficient (and enjoyable) because it provided the right degree of constraint for the task at hand. Curves, even complex ones, are really just a series of segment endpoints and curve points (which specify the amount and direction of bulge). The arc interface in effect let users specify these three points (two endpoints plus a curve point) precisely and easily to create each arc segment. If they knew what they wanted the first time around, there was no need to go back and adjust anything, and there were no surprises from the recognizer. Creating complex curves only required lifting the mouse button momentarily to indicate an upcoming change in curvature, so that the computer could display the precise desired line.
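To make that concrete, here is a minimal sketch (not the actual study code) of how such an arc segment could be represented. I am assuming each segment can be treated as a quadratic Bézier constrained to pass through the user's curve point; the names below are purely illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

@dataclass
class ArcSegment:
    """One curve segment: two endpoints plus a 'curve point' that the
    segment must pass through (the amount and direction of bulge)."""
    start: Point
    curve_point: Point
    end: Point

    def control_point(self) -> Point:
        # For a quadratic Bezier to pass through curve_point at t = 0.5,
        # its control point must be 2*M - (P0 + P2) / 2.
        (x0, y0), (xm, ym), (x2, y2) = self.start, self.curve_point, self.end
        return (2 * xm - (x0 + x2) / 2, 2 * ym - (y0 + y2) / 2)

    def point_at(self, t: float) -> Point:
        # Evaluate the quadratic Bezier at parameter t in [0, 1].
        (x0, y0), (cx, cy), (x2, y2) = self.start, self.control_point(), self.end
        u = 1 - t
        return (u * u * x0 + 2 * u * t * cx + t * t * x2,
                u * u * y0 + 2 * u * t * cy + t * t * y2)

# A complex curve is just a list of segments; lifting the mouse button
# between segments marks a change in curvature.
curve: List[ArcSegment] = [
    ArcSegment(start=(0, 0), curve_point=(1, 2), end=(2, 0)),
    ArcSegment(start=(2, 0), curve_point=(3, -1), end=(4, 0)),
]
```

The point of the sketch is that nothing here is recognized or guessed: every segment is fully determined by the three points the user supplied.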

The problem with true sketching is that there is ambiguity in every pen stroke, and even the most advanced stroke recognizer will not be able to read minds. The only clear way to resolve this ambiguity is to add constraints by letting the user specify exactly what they want. One approach is to display “n-best” lists of the candidate interpretations the recognizer thinks you might have meant. But given how easy it is to simply specify one’s intentions the first time around, and the fact that every segment is potentially ambiguous if left to a recognizer, I think there is a strong case that the arc interface will remain the best approach for this line-graphing task no matter how much software and hardware improve.
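For comparison, here is a rough sketch of what an n-best disambiguation step might look like. The recognizer output and its confidence scores are entirely hypothetical; in a real system they would come from the recognition engine.

```python
from typing import List, Tuple

# Hypothetical recognizer output: candidate interpretations of one stroke,
# each paired with a confidence score.
Candidate = Tuple[str, float]

def n_best(candidates: List[Candidate], n: int = 3) -> List[Candidate]:
    """Return the top-n candidates for the user to choose among."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:n]

stroke_candidates = [
    ("straight segment", 0.41),
    ("shallow arc bulging up", 0.38),
    ("two short segments", 0.21),
]

for i, (label, score) in enumerate(n_best(stroke_candidates), start=1):
    print(f"{i}. {label} ({score:.0%})")
```

Every ambiguous stroke costs the user an extra confirmation step like this one, whereas the arc interface never needs to ask.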

More generally, I think that sketching is too under-constrained for many of the tasks researchers have applied it to. For example, in any domain involving a small, fixed set of symbols, such as electronics or chemistry diagrams, the constrained approach would be to specify start and end points and press a button corresponding to the desired symbol. By contrast, sketching the diagram freehand takes substantial time and always carries the potential for recognition ambiguity. Freehand sketching may be more intuitive because that is what users are used to doing, but a more constrained interface may prove more efficient, much as typing proved more efficient than handwriting. Of course, usability studies would be required to test these hypotheses.
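To illustrate what I mean by the constrained approach, here is a toy sketch of placing symbols from a fixed palette between two picked points. The palette and symbol names are placeholders, not any real diagramming tool's vocabulary.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]

# A small, fixed palette of symbols, as in a circuit diagram editor.
PALETTE = {"resistor", "capacitor", "wire"}

@dataclass
class PlacedSymbol:
    kind: str
    start: Point
    end: Point

def place_symbol(kind: str, start: Point, end: Point) -> PlacedSymbol:
    """Constrained input: pick two endpoints, press the button for the
    desired symbol. Nothing is recognized, so nothing can be misrecognized."""
    if kind not in PALETTE:
        raise ValueError(f"unknown symbol: {kind}")
    return PlacedSymbol(kind, start, end)

diagram: List[PlacedSymbol] = [
    place_symbol("resistor", (0, 0), (2, 0)),
    place_symbol("wire", (2, 0), (2, 2)),
]
```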

Conversely, freehand sketching is a good fit for less constrained applications like art or solving certain math problems: applications with an open-ended set of symbols and diagrams that require the flexibility of stroke input. Another class of applications that merits stroke input is limited-capability mobile devices that do not have space for many buttons (either “soft” or physical). But in my opinion, such devices are only a temporary solution; ultimately, we should not limit our input devices but instead figure out how to make full capabilities possible in mobile settings.
