The PicturePhone was a spectacular failure in the U.S. in the early 1970s. Many factors contributed to the PicturePhone’s flop. It required significant up-front equipment expenditure coordinated across users. It was expensive to use. It was bulky. It tightly constrained the bodily position of users: compared to the PicturePhone, the fixed-line phone of that time was a “mobile” phone. Because of these and other weaknesses, the PicturePhone became the communications industry’s Edsel.

The massive, money-losing investment in PicturePhone shouldn’t be understood to indicate that voice is all that most persons want in most personal communication. The PicturePhone had the technical capability to combine voice and images. That is not sufficient to create economic value in communication. Economic value in communication depends on broader sensory circumstances and more specific behavioral goals of users.

Good sensory design of communication services requires understanding behavioral goals. Consider, for example, voice quality. High voice quality might mean transmitting the full audible range of a person’s voice, and nothing else (no “noise”). Research indicates, however, that persons are able to identify locations based on their acoustic qualities. If the goal of a voice conversation is to transmit specific information in speech, then ambient sound is “noise”. But if the goal of a voice conversation is to make sense of the other’s circumstances, then ambient sound might enhance communication, particularly for a mobile device.

Identifying specific persons, while often taken for granted, is an important goal in communication. Factors relevant to identifying persons by sight are not just pixel resolution and color depth. For example, the orientation of a face affects the amount of time to detect whether the face is smiling or frowning (please do future frowning upside-down). Moreover, the sound of a person’s voice creates a sense of what the person looks like speaking. The value of a communication service depends on the sensory affordances it provides in relation to the multimodal human perceptual routines for identifying persons.

Another goal in communication, one that is probably overvalued in theory, is understanding what a specific person is saying. Seeing lips enunciating sounds affects what sounds are heard. Moreover, the orientation of a face affects the integration of the sight of lip movements and the sounds that are heard (check out this amazing demonstration). Recognizing a face, seeing lip movements, and hearing sounds are all sensory dimensions that contribute to understanding, or misunderstanding, what a specific person is saying.

Google has integrated visual identity in Google Talk and Gmail. Visual identity doesn’t impose any additional constraints on the use of the service. The only cost to users is acquiring and setting up an image. All in all, it’s a minor innovation. But, unlike the PicturePhone, it enlarges sensory circumstances to serve a specific behavioral goal in communication. That’s a major way to create value.

