Bell Labs, one of the world’s premier research institutions, was promoting the development of the PicturePhone by 1969. The first article documenting that the sight of lips enunciating sounds affects what is heard (the McGurk Effect) was published in Nature, a leading scientific journal, in 1976. The authors of the article, Harry McGurk and John MacDonald, were affiliated with the Department of Psychology, University of Surrey, UK. Did researchers at Bell Labs know about the McGurk Effect by 1969?
At least this is clear: good science, whether known or yet to be discovered, is not sufficient to produce a profitable new product.
The PicturePhone was a spectacular failure in the U.S. in the early 1970s. Many factors contributed to the PicturePhone’s flop. It required significant up-front equipment expenditure coordinated across users. It was expensive to use. It was bulky. It severely constrained users’ bodily position: compared to the PicturePhone, the fixed-line phone of that time was a “mobile” phone. Because of these and other weaknesses, the PicturePhone became the communications industry’s Edsel.
The massive, money-losing investment in the PicturePhone shouldn’t be taken to mean that voice is all that most persons want in most personal communication. The PicturePhone had the technical capability to combine voice and images. That is not sufficient to create economic value in communication. Economic value in communication depends on broader sensory circumstances and more specific behavioral goals of users.
Good sensory design of communication services requires understanding behavioral goals. Consider, for example, voice quality. High voice quality might mean transmitting the full audible range of a person’s voice, and nothing else (no “noise”). Research indicates, however, that persons are able to identify locations based on their acoustic qualities. If the goal of a voice conversation is to transmit specific information in speech, then ambient sound is “noise”. But if the goal of a voice conversation is to make sense of the other’s circumstances, then ambient sound might enhance communication, particularly for a mobile device.
Identifying specific persons, while often taken for granted, is an important goal in communication. Factors relevant to identifying persons by sight are not just pixel resolution and color depth. For example, the orientation of a face affects the time needed to detect whether the face is smiling or frowning (please do future frowning upside-down). Moreover, the sound of a person’s voice creates a sense of what the person looks like while speaking. The value of a communication service depends on the sensory affordances it provides in relation to the multimodal human perceptual routines for identifying persons.
Another goal in communication, one that is probably overvalued in theory, is understanding what a specific person is saying. Seeing lips enunciating sounds affects what sounds are heard. Moreover, the orientation of a face affects the integration of the sight of lip movements and the sounds that are heard (check out this amazing demonstration). Recognizing a face, seeing lip movements, and hearing sounds are all sensory dimensions that contribute to understanding, or misunderstanding, what a specific person is saying.
Google has integrated visual identity into Google Talk and Gmail. Visual identity doesn’t impose any additional constraints on the use of the service. The only costs to users are acquiring an image and setting it up. All in all, it’s a minor innovation. But, unlike the PicturePhone, it enlarges sensory circumstances to serve a specific behavioral goal in communication. That’s a major way to create value.
It seems that, in the words of a recent paper’s title, “The Structures of Letters and Symbols throughout Human History Are Selected to Match Those Found in Objects in Natural Scenes.” This is a large-scale example of ecology shaping communications technology.
This sort of effect also occurs at a much smaller scale. Compare the geometric patterns in the paintings in the Morgan Picture Bible of Louis IX to those in the Marc Chagall Bible Series. The artists who produced the Morgan Bible primarily illuminated books. Marc Chagall primarily produced individual paintings. Not surprisingly, the Morgan Bible’s paintings look a lot like text, while Chagall’s don’t.
Recent research indicates that human visual processing capabilities have shaped text. Letters in 96 non-logographic writing systems, Chinese characters, and natural scenes all have similar distributions of topological configurations. The human visual system evolved to process natural scenes. Writing systems from around the world and throughout the history of written language appear to be well-matched to the evolved visual capabilities of human beings.
This same research indicates that the motor complexity of writing is less important than visual processing for reading in shaping the distribution of topological configurations. The frequency ranks of topological configurations in widely used writing systems are not significantly correlated with a measure of the motor effort required to produce the letter or character. Moreover, shorthand, which is designed to be written quickly, has a significantly different topological distribution of letters than do more widely used writing systems. Young children’s scribbles, which reflect relatively weak motor capabilities, also have a significantly different topological distribution than widely used writing systems. Within the space of relatively simple topological possibilities, motor complexity seems not to have strongly affected the design of writing systems.
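To make the rank-correlation claim concrete, here is a minimal Python sketch of Spearman’s rank correlation, one standard way to ask whether the frequency ranks of configurations track a motor-effort score. I’m assuming this kind of test for illustration only; I don’t claim it is the exact statistic the authors used. The function names are my own, and the numbers in the demo are made up, not data from the paper.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when every value in a sequence is tied")
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical, made-up values for five configuration types (NOT the paper's data).
    frequency_in_letters = [0.40, 0.25, 0.15, 0.12, 0.08]
    motor_effort_score = [2.0, 3.5, 1.5, 4.0, 2.5]
    print(round(spearman_rho(frequency_in_letters, motor_effort_score), 2))  # prints -0.3
```

A rho near zero would fit a finding of no significant correlation. Judging significance would also require a p-value, which this sketch does not compute.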
This research suggests that the design of text favors reading over writing. That’s a plausible design orientation. The invention of text was probably oriented toward the market for memorializing events and storing knowledge. Those are communicative functions that involve writing that is read many times. Text messaging among family and friends is not that type of communication.
Text has other disadvantages as technology for personal communication. Compared to audible language, text has a relatively high bodily cost. Babies easily learn audible language. In contrast, the capability to read and write requires from humans a large, specialized investment in time and attention (schooling). Moreover, broad patterns of media use indicate that persons prefer spending time with audiovisual media rather than with text. This suggests that the marginal bodily processing cost of reading is higher than that of audiovisual communication.
The design of text and the design of the human body disadvantage text messaging for presence-oriented, personal communication. Experts in the field assure me that their teenage daughters find great value in text messaging among their friends, value that voice communication does not provide. I respect this expertise. The research discussed above, however, at least indicates the importance of considering carefully how text messaging creates value relative to voice communication.
Addendum: The research on topological configurations in written languages is brilliant, pioneering research. Extensive discussion of the analysis and replication of the findings would help to ensure that they are correct. I noticed that the analysis did not weight topological configurations by their frequency of use in representative text. Perhaps this wasn’t done because generating such weights might require considerable additional effort. I would like to see future research at least consider the significance of use weights.
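Here is a minimal Python sketch of what use weighting would change, assuming each letter is described by counts of the topological configuration types it contains and each letter has a frequency in representative text. The function name, the configuration labels A, B, C, and the frequencies are all hypothetical illustrations, not the paper’s method or data.

```python
from collections import defaultdict
from typing import Mapping, Optional


def configuration_shares(
    letter_configs: Mapping[str, Mapping[str, int]],
    letter_weights: Optional[Mapping[str, float]] = None,
) -> dict[str, float]:
    """Share of each configuration type across a letter inventory.

    letter_configs maps each letter to counts of the configuration types it
    contains. If letter_weights is None, every letter counts once (inventory
    counting). Otherwise each letter's counts are scaled by its weight, e.g. its
    frequency in representative text (use weighting).
    """
    totals: dict[str, float] = defaultdict(float)
    for letter, configs in letter_configs.items():
        weight = 1.0 if letter_weights is None else letter_weights.get(letter, 0.0)
        for config, count in configs.items():
            totals[config] += weight * count
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("nothing to count: all counts or weights are zero")
    return {config: value / grand_total for config, value in totals.items()}


if __name__ == "__main__":
    # Hypothetical three-letter inventory with made-up configuration labels A, B, C.
    inventory = {"p": {"A": 1}, "q": {"B": 2}, "r": {"C": 1}}
    # Hypothetical letter frequencies in running text (illustrative only).
    use_frequency = {"p": 0.6, "q": 0.1, "r": 0.3}
    print(configuration_shares(inventory))  # {'A': 0.25, 'B': 0.5, 'C': 0.25}
    print(configuration_shares(inventory, use_frequency))
```

In this toy case, configuration B makes up half of the unweighted inventory but drops to the smallest share once its only letter turns out to be rare in use. That is the kind of shift use weights could reveal.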
The full citation for the research on topological configurations is:
Mark A. Changizi, Qiong Zhang, Hao Ye, and Shinsuke Shimojo (2006), “The Structures of Letters and Symbols throughout Human History Are Selected to Match Those Found in Objects in Natural Scenes,” American Naturalist, v. 167, pp. E117-E139.