Real-time personal visual communication devices have not been commercially successful. 3G mobile video calling seems to have been a flop. Real-time visual communication isn’t a feature that marketers of mobile communication devices understand. Is real-time personal visual communication a dead end in the communications industry?
Recent research highlights the value of conveying movement in personal communication. Researchers Douglas Cunningham and Christian Wallraven compared faces represented as vertices, connected vertices, and surface-modeled, connected vertices. Reducing the number of vertices tended to degrade recognition of facial expressions in static representations of all three model types. But if expressive movement was incorporated in the models, reducing the number of vertices more than a hundredfold (from 15,726 to 127) had little effect on expression recognition. Moreover, with movement represented, vertices-only models performed about as well as surface-modeled, connected vertices. In addition, reducing the screen size of the representations from 512×384 pixels to 128×96 had little effect on expression recognition.[1] Simple, small, point-light displays that represent movement are an efficient means for conveying facial expressions and emotions more generally.
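The efficiency claim can be made concrete with a back-of-envelope calculation. The sketch below (Python) compares the raw data rate of streaming 127 point coordinates against uncompressed small-format video; the frame rate, coordinate precision, and pixel depth are illustrative assumptions of mine, not parameters from the study:

```python
# Back-of-envelope comparison: a 127-point point-light stream versus
# uncompressed 128x96 grayscale video. All parameters below are
# assumed for illustration, not taken from Cunningham & Wallraven.

def pointlight_bps(n_points=127, dims=2, coord_bits=16, fps=30):
    """Bits per second for a raw stream of 2-D point coordinates."""
    return n_points * dims * coord_bits * fps

def raw_video_bps(width=128, height=96, bits_per_pixel=8, fps=30):
    """Bits per second for uncompressed grayscale video frames."""
    return width * height * bits_per_pixel * fps

pl = pointlight_bps()    # 127 * 2 * 16 * 30 = 121,920 bps
vid = raw_video_bps()    # 128 * 96 * 8 * 30 = 2,949,120 bps
print(f"point-light: {pl:,} bps; raw video: {vid:,} bps; "
      f"ratio ~{vid / pl:.0f}x")
```

Even against the small 128×96 format, the point-light stream needs roughly a twenty-fourth of the raw bits, while (per the research above) preserving most of the expression-recognition value.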
Additional research shows that movement conveys information not present in static images. Actors performed realistic expressions of nine conversational emotions: agree, disagree, happy, sad, clueless (don’t know the answer), thinking, confused (don’t understand the question), disgust, and pleasantly surprised. Cunningham and Wallraven tested participants’ ability to recognize (in a 10-alternative, non-forced-choice task) recordings of these expressions: a video sequence running from neutral expression to peak expression, and a static image at peak expression. The overall accuracy of recognition was significantly higher for the video sequence than for the static image (78% versus 52% correct identification).[2]
A variety of additional experiments further identified the value of movement. Cunningham and Wallraven reduced the video sequence to its last 16 frames and expanded the static presentation to those sixteen frames arranged in temporal sequence in an image grid. Participants recognized facial expressions significantly more accurately in the dynamic 16-frame presentation than in the static 16-frame grid (roughly 75% versus 60%). Scrambling the order of the frames in the video presentations roughly eliminated the advantage of video. Playing the video backwards significantly lessened the accuracy of expression recognition. At least 100 milliseconds of temporally integrated, forward-sequenced images seems necessary to capture the value of movement information for recognizing facial expressions.[3]
Video news channels illustrate the communicative power of facial movement. Try watching a major, anchor-based news broadcast with the sound turned off. The facial expressions of news anchors are quite extraordinary. They intently focus their eyes straight out at the viewer, exaggerate head movements and facial gestures, sharply punctuate their facial gestures with pauses and rapid dynamics, and expressively communicate concern and urgency as they report a father discovering a man hiding in the bushes and looking into his daughter’s bedroom. The news anchors’ facial expressions powerfully attract viewers’ attention and shape viewers’ emotional responses.
The rapid take-up of iPhones demonstrates that good device design can transform a broad product space. Smart phones, e-readers, and various other electronic devices are proliferating. At least one informed industry participant sees a bright future for see-what-I-see communications devices. Developing a real-time visual communications device will require considerable innovation in device design. But with a good biological and ecological design, such a device could have a major impact on the communications industry.
Notes:
[1] See Cunningham, Douglas W. and Christian Wallraven. The interaction between motion and form in expression recognition. In Mania, K., B. E. Riecke, S. N. Spencer, B. Bodenheimer and C. O'Sullivan (eds.), Proceedings of the 6th Symposium on Applied Perception in Graphics and Visualization (APGV 2009), 41-44. ACM Press, New York, NY, USA (2009). These findings don’t imply that video quality matters little. Rather, they point to the importance of a biologically and ecologically informed analysis of video quality. A point-light display provides a high-fidelity representation of a small number of points of motion.
[2] See Cunningham, D. W. and Wallraven, C. (2009). Dynamic information for the recognition of conversational expressions. Journal of Vision, 9(13):7, 1-17, http://journalofvision.org/9/13/7/, doi:10.1167/9.13.7. The advantage of video depends significantly on the specific conversational expression. Video is hugely advantageous for communicating agree and disagree. That’s probably because slight vertical and horizontal head movements, respectively, tend to characterize these expressions. Happiness, in contrast, was the only expression identified significantly more accurately with a static presentation than with a video presentation. Compared to a static expression of happiness, a video expression of happiness may be more extensively interpreted, e.g. as actually indicating deception or manipulation. The experimental determination of correct interpretation doesn’t control for different levels of interpretation.
[3] Id.
The author’s focus on “movement” may be misplaced. Naturally, if one performs an analysis using multiple still images, one will obtain a better result than if one analyzes only a single image. All the evidence the author provides for his point about movement can be explained merely by the fact that analyzing a plurality of still images gives a better result than analyzing a single still image. Focusing on movement implies “rate of change,” or in mathematical terms a “derivative.” What the author discusses can be explained merely by focusing on multiple still images. The author gives no evidence that “motion,” that is, rate of change or the mathematical derivative, provides any benefit. Possibly it does, but to understand any such benefit one must compare “motion” against a plurality of still images.
Anon, consider this:
For more details, read Cunningham and Wallraven. They analyzed in detail the issue that you raise.