Automated text categorization techniques applied to writing style can identify an author’s sex with 80% accuracy for a text set with 50% female-authored texts. If writing style provided no information about the author’s sex, automated text categorization would have an accuracy of only 50%, that of random guessing. In fact, writing style provides considerable information about an author’s sex. Women and men in general have highly distinguishable writing styles.[1]
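The arithmetic behind that comparison can be made concrete with a small sketch. The labels and predictions below are hypothetical. They were built only to reproduce the 80% and 50% figures and are not data from the cited studies. The function names are illustrative. The baseline used here is the majority-class rule, which always guesses the more common sex. For a balanced set it equals the 50% of random guessing. When the classes are unbalanced it sets a higher bar than coin-flipping.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy of always predicting the most common label (hypothetical helper)."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical balanced text set: 50 female-authored ("F"), 50 male-authored ("M").
actual = ["F"] * 50 + ["M"] * 50

# Hypothetical classifier output: 40 of each class labeled correctly,
# 10 of each labeled wrongly, i.e. 80 correct out of 100.
predicted = ["F"] * 40 + ["M"] * 10 + ["M"] * 40 + ["F"] * 10

print(majority_baseline(actual))    # 0.5, no better than guessing
print(accuracy(predicted, actual))  # 0.8, the reported accuracy level
```

The same baseline applies to the query-log result in note [1]: with a 56% male user population, always guessing "male" already scores 56%, so the reported 84% accuracy is best read against that figure rather than against 50%.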
Sex differences in writing seem to be related to other types of sex differences. Women’s writing style is more similar to the style of fiction, while men’s is more similar to the style of non-fiction.[2] Consistent with that pattern, women read more fiction than men do. A word-frequency factor analysis of a large corpus of blogs (150 million words) found that the content factors Religion, Politics, Business, and Internet characterized male bloggers’ blogs, while the factors Conversation, AtHome, Fun, Romance, and Swearing characterized female bloggers’ blogs.[3] These content differences are consistent with more general characterizations of differences in writing style: women’s writing style tends to be more relationally involved and personalized, while men’s tends to be more informational.[4] Such differences accord with biological and comparative evidence indicating that social communication has been more important for females’ reproductive success (in the Darwinian sense) than for males’.
Taking sex seriously is essential for the best possible understanding of communications industry developments and for the greatest success in providing communications services.
* * * * *
Notes:
[1] Koppel, Argamon, and Shimoni (2002) classify formal written texts (fiction and non-fiction, including a variety of sub-genres) with 80% accuracy. Argamon et al. (2007) achieved the same accuracy for classifying blog texts. The superb Hacker Factor provides a free, online Gender Guesser that classifies user-provided text. Automated sex categorization of web-search query logs from a user population that was 56% male has achieved 84% accuracy in identifying users’ sex, well above the 56% that always guessing “male” would achieve. See Jones et al. (2007).
[2] Argamon et al. (2003).
[3] Argamon et al. (2007). Studying search query frequencies, Weber and Castillo (2010) observed, “men appear to be more worried about deleting their search history while women tend to be more worried about removing their Facebook profiles.” That’s consistent with men’s interest in a relationally narrow activity (searching for porn to enjoy) and with women’s concern about general social status.
[4] Argamon et al. (2003). Argamon et al. (2007) describe the difference as inner-directed vs. outer-directed and document a similar difference across age. Id. argues that a common underlying factor explains both differences. That seems unlikely to me from an evolutionary perspective. The age factor may instead reflect increasing social isolation with age in contemporary U.S. society.
References:
Argamon, Shlomo, Moshe Koppel, Jonathan Fine, and Anat Rachel Shimoni. 2003. Gender, genre, and writing style in formal written texts. Text: Interdisciplinary Journal for the Study of Discourse 23, no. 3: 321-346. http://www.reference-global.com/doi/abs/10.1515/text.2003.014.
Argamon, Shlomo, Moshe Koppel, James W. Pennebaker, and Jonathan Schler. 2007. Mining the blogosphere: Age, gender and the varieties of self-expression. First Monday 12, no. 9 (3 September 2007).
Jones, Rosie, Ravi Kumar, Bo Pang, and Andrew Tomkins. 2007. “I know what you did last summer”: query logs and user privacy. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM ’07). ACM, New York, NY, USA, 909-914. DOI=10.1145/1321440.1321573 http://doi.acm.org/10.1145/1321440.1321573
Koppel, Moshe, Shlomo Argamon, and Anat Rachel Shimoni. 2002. Automatically Categorizing Written Texts by Author Gender. Literary and Linguistic Computing 17, no. 4: 401-412. http://llc.oupjournals.org/cgi/doi/10.1093/llc/17.4.401.
Weber, Ingmar, and Carlos Castillo. 2010. The demographics of web search. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’10). ACM, New York, NY, USA, 523-530. DOI=10.1145/1835449.1835537 http://doi.acm.org/10.1145/1835449.1835537
Wildly sexist claims with no decent controls. Apparently, one should conclude that women interested in politics, religion, business, and the internet rather than relationships and swearing can’t get dates and will be weeded out of the gene pool. This study has no significant control for GENRE, or apparently for education level; educated people vary their style (each person varies his or her style) depending on context. Genres are so complex, overlapping, and varied that the same person may appear rather hermaphroditic (my formal and informal nonfiction writing comes out classified as “male”; my personal and fictional writing comes out classified as “female.” Whodathunkit?) Another attempt by psychology to classify human behavior in its simplistic terms rather than in the complexity of humanities discourse.