mountains of telecom data for crowd fun

Huge archives of files containing U.S. local-exchange telephone companies’ service volumes, rates, and revenue from 1992 to 2009 are now available for collaborative reformatting, organizing, and analyzing.  U.S. local-exchange telephone companies that the Federal Communications Commission regulates via price caps publicly file annual tariff review data.  These data include service volume (demand) and rates for every interstate access service provided under price-cap regulation.  The data also include various aggregations of service revenues as well as price indexes used within price-cap regulation.

The source archives consist of standardized (by year) Tariff Review Plans (TRPs) and ad-hoc rate detail files.  The files are specific to a filing year (1992-2009) and a price-cap-regulated telephone company service area.  The filings consist of annual tariff review filings (usually from about June 15 of a given year), as well as some additional filings (revisions, restructurings) in these formats.  Some filings for years prior to 2003 could not be located.  The original source files are in Lotus 1-2-3 .wk3 and .wk4 format.  The price-cap archive contains 2,946 Lotus 1-2-3 files (with a few files in other formats) and has a total uncompressed size of 4.04 gigabytes.  The rate-detail archive contains 2,473 Lotus 1-2-3 files and has a total uncompressed size of 1.39 gigabytes.

I’ve already made some of the data much more accessible, organized it, and done some analysis that illustrates its use. I organized and categorized all the rate elements for Bell Atlantic and all the rate elements for US West from 1990 to 2009 and put them into tab-delimited text datasets.  I also created a tab-delimited text dataset of a section of the Tariff Review Plans (TRPs) for thirteen large,  historical telephone company service areas from 1992 to 2009.  You can find some of my analysis of the data in this blog’s network connectivity category.

Much useful work remains to be done with the data.  One important task is to make all the source data more easily and universally accessible.  Neither Calc nor Microsoft Excel 2007 (nor the forthcoming 2010 Excel product) reads Lotus 1-2-3 .wk3 and .wk4 files.  Microsoft Excel 2003 does open the files.  Lotus 1-2-3 can still be purchased, now at a suggested retail price of $313.  Of course, any conversion of the source files is likely to lose some data.  Note that the archives I’ve created are themselves neither official nor authoritative.  A useful format conversion could therefore aim for modest objectives: making the data more publicly accessible for exploratory analysis and stimulating informed discussion of telecom policy.

The data would be more useful if they were better organized.  Structuring existing fields into records by company and year would enable many useful queries.  By following the models of the datasets I’ve already set up, anyone could make a small contribution by doing similar work for a small subset of companies and years.  Such individual contributions could easily be aggregated.  Since comparisons across companies and across years contribute to insight, individual contributions would be much more valuable in the aggregate.
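As a rough sketch of how such contributions might be combined, the Python below merges tab-delimited contributions into records keyed by company and year and then pulls out one rate element across years.  The column names (company, year, rate_element, rate, demand), the company "ExampleTel", and all the values are hypothetical and made up for illustration; they are not the layout of my existing datasets or of the source files.

```python
import csv
import io
from collections import defaultdict


def read_contribution(text):
    """Parse one tab-delimited contribution into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def aggregate(contributions):
    """Merge contributions into a dict keyed by (company, year)."""
    records = defaultdict(list)
    for text in contributions:
        for row in read_contribution(text):
            key = (row["company"], int(row["year"]))
            records[key].append(row)
    return dict(records)


def rate_series(records, company, rate_element):
    """Return {year: rate} for one company and one rate element."""
    series = {}
    for (name, year), rows in records.items():
        if name != company:
            continue
        for row in rows:
            if row["rate_element"] == rate_element:
                series[year] = float(row["rate"])
    return dict(sorted(series.items()))


# Two made-up contributions from different volunteers (hypothetical data).
contrib_a = (
    "company\tyear\trate_element\trate\tdemand\n"
    "ExampleTel\t1995\tlocal_switching\t0.0100\t1000\n"
)
contrib_b = (
    "company\tyear\trate_element\trate\tdemand\n"
    "ExampleTel\t1996\tlocal_switching\t0.0090\t1200\n"
    "OtherTel\t1996\tlocal_switching\t0.0110\t800\n"
)

merged = aggregate([contrib_a, contrib_b])
switching = rate_series(merged, "ExampleTel", "local_switching")
print(switching)
```

Keying on company and year means that contributions covering different companies or years simply add new keys, so separately prepared files can be combined without coordination beyond agreeing on the column names.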

Figuring out and administering a good structure for managing the archives and contributions of work is also an important task.  This task seems similar to that of running an open-source software project.  The success of open-source software projects indicates both that the task is feasible and that expertise in doing it exists.

Many persons complain about telephone companies and criticize government regulation.  Here’s an opportunity for these persons and anyone else to contribute to a better understanding of telephone companies and government regulation of these companies.  Many hands together could make quick work of reformatting, organizing, and analyzing these huge archives.
