The Inmates Are Running the Asylum (Chapters 8-14)

I think it's safe to say that Alan Cooper's tone softens somewhat in the second half of this book.  While the first seven chapters concern themselves primarily with calling out the hubris and generally dickish behavior of programmers, the latter seven offer constructive suggestions and real-world examples of how to fix the problems in the software industry.  It was refreshing to see solutions offered to balance out all the acidic complaining.  His concept of designing for "personas" rather than the generic "user" has a lot of merit - I even found myself using the process with my project team when developing our sketch application.

As I stated in my summary of the first seven chapters, I think a lot of what Cooper has to say in this book lacks the impact in 2010 that it had in 1999, largely because so much of it is now just part of the industry.  Companies do focus on interaction design.  Programmers aren't the huge jocks they used to be (at least not to as large a degree).  The software we have now is better than what we used to make.  Is some of it still dancing bearware?  Probably.  I don't think that kind of software will ever completely disappear.  However, I think it's safe to say at this point that (forgive me) the inmates are no longer running the asylum.

Annotating Gigapixel Images

Summary:
In this paper, Qing Luan et al. present their work on annotating very large (billions of pixels, or gigapixel) images.  Their aim was to augment the pan-and-zoom interface used by applications like Google Earth and Virtual Earth with visual and audio annotations, driven by an intelligent rendering system that takes into account the viewer's perceptual distance from the objects being annotated.  They cover related work in areas like zoomable UIs, map labeling, human psychophysics, and augmented reality systems.  The system the researchers developed runs on HD View and supports text labels, audio loops, and narrative audio.
The system gauges viewer perspective, depth, and field of view relative to each annotation and assigns each annotation a strength based on these factors.  Thus, text labels for far-off objects are rendered smaller, and audio annotations for far-off objects play at lower volume.
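The paper's actual weighting function isn't reproduced here, but a rough sketch of the idea (my own assumptions and constants, not the authors' formula) might look something like this: compute a strength from the viewer's perceptual distance, then scale label size or audio volume by it.

```python
# Rough sketch (my assumptions, not the authors' formula): an annotation's
# "strength" falls off with the viewer's perceptual distance and is used to
# scale label size or audio volume, hiding/muting annotations that are too far.

def annotation_strength(view_distance, falloff=2.0):
    """Strength in (0, 1]; near 1.0 when the viewer is close to the
    annotation, decaying toward 0 as perceptual distance grows."""
    return 1.0 / (1.0 + (view_distance / falloff) ** 2)

def render_annotation(annotation, view_distance,
                      base_font_px=24, base_volume=1.0, threshold=0.05):
    s = annotation_strength(view_distance)
    if s < threshold:
        return None  # too far away: hide the label or mute the audio loop
    if annotation["kind"] == "text":
        return {"label": annotation["label"], "font_px": base_font_px * s}
    return {"clip": annotation["clip"], "volume": base_volume * s}

# Example: the same (made-up) text label viewed up close and from far away.
label = {"kind": "text", "label": "Space Needle"}
print(render_annotation(label, view_distance=0.5))   # large, nearby label
print(render_annotation(label, view_distance=10.0))  # far away: hidden (None)
```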


Discussion:
I haven't played with Google Earth very much, but according to the authors this system is a lot like it, so I think I have a pretty good idea of how the interface works.  I think it's a great idea to render annotations dynamically based on viewer position -- if all available annotations were rendered simultaneously on a map of the United States it would be completely unreadable.  Hiding or showing labels as the user pans and zooms encourages the user to explore the image to see what annotations can be uncovered.  This would be great if applied to an educational setting where children could browse a map of, say, Philadelphia and read excerpts from historical documents or hear audio clips about important locations.

Edge-Respecting Brushes

Summary:
In this paper, Dan R. Olsen Jr. and Mitchell K. Harris of Brigham Young University's Computer Science Department discuss various methods of making brushes for image-editing programs like Photoshop "smarter" by utilizing least-cost algorithms and edge-respecting implementations.  
The authors discuss five prior techniques that framed their research -- flood fill (fills adjacent same-colored pixels until a clearly defined edge is reached), boundary specification (the "lasso" tool), tri-maps (a user-specified map of foreground, background, and unused pixels), bilateral grids (color-difference-aware), and quick select (selection inferred from a brush stroke).  The edge-respecting brush they developed takes elements of these techniques and refines them with least-cost and alpha computation algorithms.
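The figure with the authors' exact cost and alpha equations isn't reproduced here, but the general flavor of a least-cost, edge-respecting selection can be sketched like this (a simplified illustration of the technique, not the paper's actual formulation):

```python
import heapq

def edge_respecting_select(img, seeds, max_cost=60.0):
    """Simplified sketch of an edge-respecting brush: grow a selection
    outward from the brushed (seed) pixels, where stepping between two
    pixels costs their absolute intensity difference.  Pixels reachable
    below max_cost get selected, so strong edges act as barriers.

    img   : 2-D list of grayscale intensities (0-255)
    seeds : list of (row, col) pixels under the brush stroke
    """
    h, w = len(img), len(img[0])
    best = {s: 0.0 for s in seeds}            # least cost found per pixel
    frontier = [(0.0, s) for s in seeds]
    heapq.heapify(frontier)
    while frontier:
        cost, (r, c) = heapq.heappop(frontier)
        if cost > best.get((r, c), float("inf")):
            continue                          # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, ncol = r + dr, c + dc
            if 0 <= nr < h and 0 <= ncol < w:
                new_cost = cost + abs(img[nr][ncol] - img[r][c])
                if new_cost <= max_cost and new_cost < best.get((nr, ncol), float("inf")):
                    best[(nr, ncol)] = new_cost
                    heapq.heappush(frontier, (new_cost, (nr, ncol)))
    return set(best)                          # selected pixel coordinates

# Toy image: a bright region on the left, a dark region on the right.
img = [[200, 200, 200, 10, 10],
       [200, 200, 200, 10, 10],
       [200, 200, 200, 10, 10]]
print(edge_respecting_select(img, seeds=[(1, 0)]))  # stays left of the edge
```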


A user study was conducted with 9 subjects, each asked to manipulate 6 images.  The edge-respecting brush achieved 99% edge-agreement accuracy -- about 3% higher on average than the traditional lasso and snap tools.

Discussion:
I wish Photoshop already had something like this.  Lasso selection is a painful process that requires Keebler-elf precision and saintly patience to yield accurate results, and most snap tools don't perform well enough for my needs.  Being able to paint along an edge without fear of messing up a critical part of the image would make creating layers much easier, and generally speed up the image editing process.  
EDIT:  I found out a few days ago that Photoshop CS5 is actually implementing something like this with its "content-aware fill" tool.  Awesome!

Foldable Interactive Displays

Summary:
In this paper, authors Johnny Chung Lee and Scott E. Hudson (of Carnegie Mellon's HCI Institute) and Edward Tse (of Smart Technologies) discuss their research on foldable displays.  Most "flexible display" prototypes reported on today are based on OLED (organic light-emitting diode) technology, which is still limited in its implementation and offers no form of touch input.  The foldable displays discussed by the authors are actually flexible screens with embedded infrared (IR) LEDs that allow a camera/projector combination to track the display and project an appropriately sized image onto it.  The IR LEDs basically function as fiducial markers in an augmented-reality system, where digital images (in this case, whatever is supposed to show up on the display) are superimposed over real life.  The authors discuss the advantages and drawbacks of four main foldable shapes -- the newspaper, scroll, fan, and umbrella -- as well as how IR tracking is accomplished on each shape.
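The authors' actual tracking pipeline isn't spelled out here, and nothing in the paper is tied to OpenCV, but the core trick (treating the tracked IR LEDs as fiducials and warping the projected image to match the surface) can be sketched roughly like this, with made-up marker positions and resolutions:

```python
# Rough sketch (not the authors' implementation): use four tracked IR LED
# positions as fiducials and warp the source frame so the projector lands
# it squarely on the foldable surface.
import cv2
import numpy as np

def warp_to_surface(frame, led_points_px, projector_size=(1024, 768)):
    """frame         : the image that should appear on the display surface
    led_points_px : four tracked LED corner positions in projector space,
                    ordered top-left, top-right, bottom-right, bottom-left
    """
    h, w = frame.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])  # frame corners
    dst = np.float32(led_points_px)                     # tracked corners
    H, _ = cv2.findHomography(src, dst)                 # perspective map
    return cv2.warpPerspective(frame, H, projector_size)

# Example with a blank white frame and made-up LED positions.
frame = np.full((480, 640, 3), 255, dtype=np.uint8)
leds = [(120, 90), (830, 110), (800, 600), (140, 580)]
print(warp_to_surface(frame, leds).shape)  # (768, 1024, 3)
```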

Discussion:
When I read the title of the paper, I assumed that the displays would be driven by OLED and that the authors had somehow solved the problem of touch input on such a display.  Alas, it was not to be: they basically rigged together an augmented-reality system with IR LEDs and a Wiimote, which allowed camera-based tracking and projection onto a static, non-interactive surface.  The displays they discussed could have been made of anything -- paper, cloth, plastic, whatever.  The whole system is driven by the camera tracking.  I suppose I shouldn't have gotten my hopes up so high, but it's difficult to see how this system could be used anywhere outside of a classroom or research facility with the necessary technology already installed.  The point of small, mobile displays is to be able to carry them with you without the need for external equipment.  The only real advantage I can see here (other than the orientation sensitivity, which was kind of cool) is that a user could carry a movie-screen-sized display in a backpack or even a pocket and set it up with minimal effort, assuming the IR-tracking and camera equipment is available wherever they plan on watching a movie.

Inky: A Sloppy Command Line for the Web with Rich Visual Feedback

Summary:
In this paper, Robert C. Miller et al. (researchers at MIT CSAIL and the University of Southampton) present Inky (short for Internet Keywords), which is basically a command line interface that provides shortcuts to common web browser tasks.  It functions as a hybrid between a command line and a GUI, giving the user dynamic visual feedback as they type and relying on sloppy syntax so the user doesn't have to learn any new command syntax.  Keywords can be provided in any order, replaced with synonyms, or entered in a variety of ways; the Inky interpreter attempts to match the user's input against the available commands.  Inky is a Firefox extension built using HTML, CSS, JavaScript, Java, and XML.  The rest of the paper covers the specifics of Inky's construction, commands, and content highlighting, with details about the keyword interpreter, its functions, and its usage.
The authors conducted a small user study with seven participants and found that 95 out of 131 user commands were correctly interpreted, and that Inky was fairly easy to learn based on user response.  Most of the commands that were not correctly interpreted were simply attempts by users to invoke commands that did not exist (by entering website names, for example).  The researchers used their findings about Inky to begin development on a second prototype called Pinky (Personal Information Keywords), which focuses on lightweight data capture for personal information management.
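Inky's real interpreter is considerably more sophisticated, but the basic flavor of order-free, sloppy keyword matching might look something like this (the command vocabulary and scoring below are invented for illustration):

```python
# Toy sketch of sloppy keyword matching (not Inky's actual interpreter):
# score each known command by how many of the user's keywords it matches,
# regardless of order, and rank the candidates.

COMMANDS = {                      # hypothetical command vocabulary
    "book flight": {"book", "flight", "fly", "plane"},
    "check weather": {"check", "weather", "forecast"},
    "reserve room": {"reserve", "book", "room", "hotel"},
}

def interpret(user_input):
    tokens = set(user_input.lower().split())
    scored = []
    for name, keywords in COMMANDS.items():
        score = len(tokens & keywords)        # order-free keyword overlap
        if score:
            scored.append((score, name))
    # Highest-scoring command first; every plausible candidate is kept
    return [name for _, name in sorted(scored, reverse=True)]

print(interpret("flight book to boston"))   # ['book flight', 'reserve room']
print(interpret("weather forecast"))        # ['check weather']
```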


Discussion:
My first thought about this application was that it seemed an awful lot like Quicksilver for Mac -- an application launcher/wizard's tool that interprets user input to execute commands and launch applications.  This paper even mentions Quicksilver, in fact, when talking about how Inky is invoked.  It also shares many similarities with Mozilla's own Ubiquity.  Like, a lot of similarities.  As in, these tools are nearly identical in function and aim.  Having used Quicksilver and Ubiquity pretty extensively, I think a command-line-style interface for quick command execution is a great idea that can really streamline a user's experience (if they are willing to take on a small learning curve).

Understanding the Intent Behind Mobile Information Needs

Summary:
In this paper, Karen Church and Barry Smyth (of Telefonica Research and University College Dublin, respectively) lay out their study of users' mobile content needs and mobile contexts.  The explosion of mobile device popularity across the world (3.5 billion subscribers in 2007) has led to new patterns of information retrieval and new needs for users.  The authors analyze previous work on what mobile users search for and why; that research indicated that 50% of the top mobile search queries were related to adult content (though only 8% of all users in the study engaged in any kind of search at all) and that search intent could usually be attributed to a need for awareness or to status-checking behavior.


The authors conducted a four-week study of 20 participants who were asked to keep a diary of all their information needs while they were at home, work, or on the move.  Participants logged the date and time, location, and information need, as well as any additional comments they had.  The study generated 405 diary entries (approx. 20.3 per person), the majority of which concerned on-the-go conditions for mobile information need.  The researchers created three categories for user intent: informational (how-to's, advice, showtimes), geographical (location-based or directions), and PIM or personal information management (PIN codes, friend requests, to-do lists).  30% of entries were geographical in nature, and 42% of entries were non-informational.  PIM entries represented 11% of the entries.  


Overall, the study indicated that mobile users look for considerably different information than standard Web users, showing a greater need for geographical information such as directions or service locations while on the move; their information needs also seemed to revolve heavily around social interaction (friend requests, questions asked in conversation, status updates, etc.).


Discussion:
The research presented in this paper is very relevant to current trends in technology - users are gravitating more and more to mobile content, and the advent of user-friendly mobile browsing devices like the iPhone and the upcoming iPad will only heighten this trend.  I found the categorizations made by the researchers to be pretty accurate - in the brief period that I owned a web-enabled phone, I used it primarily for getting directions, checking Facebook or Twitter, and managing email.  This falls right in line with the results of the study.  However, in the future I think user content needs will shift more and more toward entertainment in a variety of contexts and locations as more content is pushed to the mobile space and bandwidth increases.

Simplified Facial Animation Control Utilizing Novel Input Devices: A Comparative Study

Summary:
This paper by Nikolaus Bee, Bernhard Falk, and Elisabeth André from the University of Augsburg's Institute of Computer Science details new methods and controls for manipulating facial animations.  First, the researchers discuss the advantages and drawbacks of the predominant current methods for facial animation control.  Slider-based GUI systems, for example, are easy to implement and familiar to users, but they afford control over only one parameter at a time and typically lack an obvious mapping for manipulation.  The authors discuss several previous studies on direct mapping, including data gloves, data suits, and MIDI keyboards; they also detail typical facial expression generation technologies like the Facial Action Coding System (FACS), which has been used in everything from The Lord of the Rings trilogy to Half-Life 2.  The researchers implemented FACS to drive the facial model - "Alfred" - that they used for their study.

Basically, control points on Alfred's facial structure were mapped to various controls on an Xbox 360 gamepad or a data glove, with three different modes - upper face, lower face without inner lips, and inner lips.
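The paper's exact button and stick assignments aren't listed above, so purely as a hypothetical illustration of the mode-based mapping idea (the names, regions, and ranges here are my own, not the authors' scheme), the control flow might be sketched like this:

```python
# Hypothetical sketch of mode-based gamepad control for a facial rig
# (not the paper's actual mapping): one button cycles the active region,
# and the analog stick axes drive that region's control-point weights.

REGIONS = {
    "upper_face": ["brow_raise", "brow_furrow"],
    "lower_face": ["jaw_open", "cheek_puff"],
    "inner_lips": ["lip_pucker", "lip_stretch"],
}

class FaceController:
    def __init__(self):
        self.mode_order = list(REGIONS)
        self.mode = 0
        self.weights = {cp: 0.0 for cps in REGIONS.values() for cp in cps}

    def press_mode_button(self):
        """Cycle upper face -> lower face -> inner lips -> ..."""
        self.mode = (self.mode + 1) % len(self.mode_order)

    def move_stick(self, x, y):
        """Map the stick axes (-1..1) onto the active region's two
        control points, clamped to a 0..1 activation range."""
        region = self.mode_order[self.mode]
        cp_x, cp_y = REGIONS[region]
        self.weights[cp_x] = min(max((x + 1) / 2, 0.0), 1.0)
        self.weights[cp_y] = min(max((y + 1) / 2, 0.0), 1.0)

ctrl = FaceController()
ctrl.move_stick(0.8, -0.2)   # tweak the upper face
ctrl.press_mode_button()
ctrl.move_stick(-1.0, 1.0)   # now tweaking the lower face
print(ctrl.weights)
```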

To evaluate the ease of use of this system, the authors recruited 17 subjects who were trained on the slider and gamepad systems and then asked to create three facial expressions based on a photo.  Overall, the participants found the gamepad more enjoyable, accurate, and satisfying to use than the sliders.  The gamepad also yielded a nearly 30% increase in speed in most cases.


Discussion:
The authors' work in this area is very promising.  The obvious application for such a system would be for users to tweak the facial expression of their XBox Live avatar as they see fit.  It would also be cool if players could create simple animations for sending messages to friends or other players on the network.  Allowing gamers this level of customization to their characters might increase the affective level of interaction between players.

Emotional Design

It's fascinating to witness the complete 180 that Donald Norman made over the nearly two decades between the publication of The Design of Everyday Things and 2004's Emotional Design.  In the former, he made some very strong (and very cranky) arguments about the cardinal importance of functionality in the design of products, and frequently made fun of the sort of products and structures that "must have won a design award."  Critics of Norman have said that if we all followed the principles in TDoET, we would have usable but ugly designs.


But after years of research into human emotion, Norman embraces the very sort of products he so readily dismissed in his previous book -- like the juicer that dominates the front cover of Emotional Design.  It is pretty to look at, certainly, and it does have some functional aspects (the lowest point of the "rocket" body serves as a drip point for juice), but according to its designer it is not actually intended to be used for making juice.  Norman's point with this example (and the rest of his book) is that emotional and sensory appeal in design takes precedence over functionality.


Norman breaks this emotional appeal down into three main categories: visceral, behavioral, and reflective.  Designs that appeal on the visceral level are the ones that elicit a base, visually stimulating response.  Behavioral design focuses on a product's ease of use and the pleasure derived from it.  Reflective design considers the rationalization and intellectualization of a product -- does it tell a story or make its owner think more deeply about it?  Norman's 2003 TED talk illustrates these three principles pretty well (and will take you a lot less time than reading this book).


I enjoyed reading this book and finding out more about how critical emotions are to the way we work as humans.  Norman was much less of a blowhard in this book, and I had an easier time taking him seriously.  Well, at least until I got to the final chapter on the future of robots -- that was a major left turn that felt like the end of the movie A.I. Artificial Intelligence; things were going fine and then all of a sudden I was like, "wait, did that really just happen?"  It felt very out of place (though I suppose it's kind of a segue into his next book, The Design of Future Things) and was the most weakly argued of his chapters.  Overall, though, I enjoyed it much more than TDoET, but the last chapter has me a little scared of what The Design of Future Things has in store.