December 2011 – Page 2

American University Library moving quickly to electronic resources

American University Library, a leading academic library, is rapidly increasing its patrons’ access to electronic resources. Electronic resources in fiscal year 2011 accounted for 74% of the American University Library’s collections expenditure. That’s up from 59% two years ago. Five years ago, the library’s journal collection was split 50-50 between print and electronic. Now the journal collection is 99% electronic. The journal collection offers access to about 85,000 e-journals. Access to such a vast collection of journals simply would not have been possible in print.

While the average yearly cost per e-journal is less than $36, some electronic subscriptions are quite expensive. American University Library’s access to Early English Books Online cost a $134,000 purchase fee plus $1,050 per year. Early English Books Online contains electronic versions of every book published in English between 1485 and 1700. That’s a wonderful resource and well worth the price paid.

Resources such as Early English Books Online could be much cheaper. None of the books in that collection are under copyright. If Google had scanned those books, Google would have been willing to make them freely available to everyone around the world. Moreover, the ability to search and download Google Books is much better than the capabilities offered through many library e-book subscriptions.

Compared to traditional scholarly publishers, Google’s offerings better serve libraries’ mission, but require more changes in libraries’ normal patterns of activity.

* * * * *

Statistics: American University Library’s electronic resources, fiscal years 2009-2011

Source note: All the statistics above are either from the American University Library Annual Report 2010-2011, or calculated from data in that report.

the history of paratextual organization

Reading practices affect how text is organized within books. Ancient Greek elite culture emphasized voiced reading of texts with acute concern for diction and style. Ancient Greek texts and, beginning in the second century GC, Latin texts were written in capital letters without any spaces between words, without any punctuation, and without any division among sentences, paragraphs, or chapters (scriptio continua). That textual organization supported intensive teaching and rehearsal in declaiming a specific text.

Spacing between words and other textual articulations support rapid silent reading oriented towards conceptual reasoning. In Europe, protoscholastic Irish monks at the periphery of elite classical reading practices pioneered word spacing in Latin in the seventh and eighth centuries. This textual practice gradually spread east and south across Europe in subsequent centuries. Word separation became normal in French manuscripts only in the eleventh century.[1] Flourishing scholasticism in twelfth- and thirteenth-century France generated highly articulated texts with section labeling, section numbering, and script and layout distinctions. Standard numbering of chapters in the (Latin) Bible began in Paris about 1230.[2] These textual features served readers who were silently reading, analyzing, and referencing specific pieces of text.

In the Islamic world, scholarly study of classical Greek texts prompted new textual organization in the tenth century. According to Ibn Abi Usaibia, Ja’far Ahmad ibn Muhammad ibn Abu al-Ash’ath led this development:

He delved deeply into Galen’s books and wrote commentaries on many of them. He divided each of the “Sixteen Books” in parts, chapters and paragraphs in such a manner as had never been done before. This has proved of great help to users of Galen’s books, for it facilitates locating what is wanted, furnishes references to any topic which is desired to be studied and gives information about the contents and purposes of any portion. He divided many of the works of Aristotle and others in the same way. [3]

This description explicitly links conceptual study (writing commentaries, referencing topics to be studied) to articulating the text. The reference to direct access to any portion of the text almost surely implies a text-numbering scheme.

Textual study by Arabic scholars in the tenth century and European scholars in the thirteenth century produced similar textual articulations. The division of a text into words, sentences, paragraphs, and chapters is easy to take for granted today. Such textual organization, however, developed only in circumstances of conceptual textual study.

* * * * *

Notes:

[1] Saenger (1997) p. 23

[2] Blair (2010) pp. 38-9. Dominicans of the House of St. Jacques in Paris introduced chapter numberings in a Bible concordance that they created between 1230 and 1247. The printer Robert Estienne introduced Bible verse numbering in a New Testament that he printed in 1551.

[3] HP p. 473. Ibn Abu al-Ash’ath was active about 960 GC. Arabic texts always had word separation. Ancient Arabic, like ancient Hebrew and other ancient Semitic texts, was written without vowels. Hence word separation was necessary for unambiguously interpreting a text. Knowledge transmitted to Europe through Arabic texts contributed to the development of word separation in Latin:

Arabic scientific writings, when translated into Latin, brought word separation with them and formed the earliest body of writings to circulate invariably in word-separated text format. In these writings, untranslated Arabic phrases, written in Latin transliteration, were always separated, unlike analogous Greek passages in Latin texts, which had been written in unseparated script, except when copied by Irish scribes.

Saenger (1997) pp. 124-5. Apparently not recognizing the evidence from Ibn Abi Usaibia, Blair (2010), p. 26, locates in thirteenth-century Egypt the Arabic development of “hierarchical and numbered divisions of the text, running heads, lettering of different sizes and colors, and tables of contents.”

References:

Blair, Ann. 2010. Too much to know: managing scholarly information before the modern age. New Haven: Yale University Press.

HP: Ibn Abi Usaybi’ah, Ahmad ibn al-Qasim. English translation of History of Physicians (4 v.) Translated by Lothar Kopf. 1971. Located in: Modern Manuscripts Collection, History of Medicine Division, National Library of Medicine, Bethesda, MD; MS C 294.

Saenger, Paul. 1997. Space between words: the origins of silent reading. Stanford, Calif: Stanford University Press.

child-support administration ignores economic reality

A key weakness of administratively determined prices is that they don’t respond rapidly and rationally to changes in economic circumstances. A friend who grew up in the Soviet Union cooked with an iron pot that had the price of the pot cast into its iron. Child-support orders in the U.S. today embed roughly the same price-setting mentality.

Family courts set child-support payments based on complex administrative-economic formulas. The resulting administrative-economic price (payment amount) is presumed to be valid for the next twenty-one years. A burdensome court procedure is required to change the child-support price. Making the whole apparatus even more economically absurd, the totalitarian Bradley amendment outlaws any retroactive changes in accrued child-support debts. Hence, if you’re under a child-support order and you’re imprisoned, you had better get a child-support modification form filed quickly, or you could be imprisoned again for child-support debt accrued while you were in prison.

Rigidity in child-support orders is a significant economic problem. Only about a third of child-support orders are ever modified. Even among child-support orders in force for more than ten years, 57% have never had the amount of the order changed. That most orders are never modified doesn’t mean that they remain economically appropriate for eighteen or twenty-one years.

Real, vitally important economic circumstances change rapidly. For example, from early 2007 to early 2009, the unemployment rate in the U.S. rose from about 5% to about 10%. Among persons subject to child-support orders, 10% have family income below the poverty level, 11% didn’t work in the past month, and 21% have two or more children living with them in their household. At population medians, child support orders account for an estimated 14% of gross family income. Housing, for comparison, consumes 24% of gross family income. In response to changing economic circumstances, persons usually can change their housing costs more easily than they can change their child-support payments.

Child-support orders cast in iron undoubtedly create great hardships for adults and children living in the rapidly changing circumstances of the real world.

* * * * *

Statistics: child-support modification frequency and financial circumstances of child-support payors (Excel version)

building a data search engine

Suppose you want to find data relating to a specific topic. Software can easily distinguish between alphabetical text and numbers. Software can also easily recognize tabular formats in HTML and plain text. Moreover, the extent to which documents contain numbers and tables strongly distinguishes among documents. “Look for data” seems like a significant search qualifier that wouldn’t be costly to implement. So where are the data search tools?
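To make the "look for data" idea concrete, here is a minimal Python sketch of the kind of signal such a qualifier might use. It is only an illustration under stated assumptions, not how any existing search engine works: the function names `data_density` and `count_html_tables` are hypothetical, the regular expressions are deliberately crude, and a real system would need proper HTML parsing and tokenization.

```python
import re

# Assumed patterns, for illustration only.
# Numbers: integers, decimals, comma-grouped figures (85,000), percentages (74%).
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?%?")
# Words: runs of letters, optionally joined by an apostrophe or hyphen.
WORD_RE = re.compile(r"[A-Za-z]+(?:['’-][A-Za-z]+)*")
# Opening <table> tags, with or without attributes.
TABLE_TAG_RE = re.compile(r"<table\b", re.IGNORECASE)


def data_density(text: str) -> float:
    """Return the share of tokens in `text` that are numeric (0.0 to 1.0)."""
    numbers = NUMBER_RE.findall(text)
    words = WORD_RE.findall(text)
    total = len(numbers) + len(words)
    return len(numbers) / total if total else 0.0


def count_html_tables(html: str) -> int:
    """Count opening <table> tags in an HTML string."""
    return len(TABLE_TAG_RE.findall(html))
```

A document could then be flagged as data-rich when, say, `data_density(text)` exceeds some threshold or `count_html_tables(html)` is at least one. Any such threshold is a tuning choice that would have to be set empirically.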

Zanran is a search engine designed to find data. The only way to tell Google that you want to search for data is to specify a search for .xls (Microsoft Excel) documents. That’s a poor specification for a data search, especially for a company that’s not Microsoft. With Zanran, you specify that you want to search for data simply by using Zanran for the search. Zanran returns documents containing data. Unlike Datafiniti, Zanran doesn’t attempt to extract the data into a tabular form. Unlike WolframAlpha, Zanran doesn’t attempt to do calculations with data that it finds. Zanran finds the relevant documents. You extract the data and make the specific tables or calculations that you want. At least until the forthcoming world-wide implementation of linked data, Zanran’s approach is the most cost-effective division of tasks for using data from the whole web of various document types.

A stand-alone data search engine unfortunately doesn’t seem economically propitious. Building a data search engine involves all of the challenges of building a general search engine.[*] Search for data is mainly a tuning of the relevance-ranking algorithm. Even without such a tuning, a Google search for historical advertising expenditure leads to better data than a similar Zanran search. Moreover, searches for data don’t provide context for lucrative advertising. Persons searching for data aren’t looking to buy mass-market consumer products. Perhaps Zanran or similar services can succeed on a subscription or pay-per-search basis. For the sake of fact-based policy and business analysis, let’s hope so.
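The claim that data search is mainly a tuning of relevance ranking can be sketched as a simple re-ranking step. The following Python is a hypothetical illustration, not a description of Google's or Zanran's actual ranking: the `Hit` type, the `rerank_for_data` function, the linear blend, and the default weight are all assumptions made for the example.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    url: str
    text_relevance: float  # assumed score from a general-purpose ranker, 0.0 to 1.0
    data_density: float    # assumed share of numeric content in the document, 0.0 to 1.0


def rerank_for_data(hits: list[Hit], data_weight: float = 0.5) -> list[Hit]:
    """Order hits by a linear blend of text relevance and data density.

    data_weight = 0.0 reproduces the general ranking;
    data_weight = 1.0 ranks purely by how much data a document contains.
    """
    def score(hit: Hit) -> float:
        return (1 - data_weight) * hit.text_relevance + data_weight * hit.data_density

    return sorted(hits, key=score, reverse=True)
```

With a weight of zero, an essay that merely discusses a topic can outrank a statistical table on that topic. Raising the weight moves the table up. The hard part, crawling and indexing the web to produce the general relevance score in the first place, is unchanged.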

* * * * *

[*] Zanran analyzes images to determine if they contain a graph, chart, or table. Thus a Zanran search can potentially return a desired graph or data that a purely textual search would miss. That’s valuable, but searching images for numerical data and data presentations seems to me to be a rather small share of the overall value of data search to users.
