Why SEC filings don't contain semantic, queryable data

Public companies doing business in the U.S. are required to file SEC-mandated public financial reports electronically.  These reports are available on the web through EDGAR.  They are digital text documents that contain tables of data.  Because the tables are laid out as text rather than stored as structured records, you cannot readily use database software to query the data or to aggregate data across filings or companies.  EDGAR, in short, contains a huge, important dataset of badly structured, non-semantic public data.

These badly structured, non-semantic public data in EDGAR are one segment within a larger economy of data work.  First, the data are extracted from structured, queryable databases within companies and formatted as text tables for the public SEC financial reports.  Other companies then take the reports filed with the SEC, extract the public data back into a usable form, and sell it.  The net result is a waste of resources and limited access to public data.
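To make the cost of that round trip concrete, here is a minimal sketch in Python of the re-extraction step: turning a text table back into rows that database software can query.  The table excerpt, its figures, and the helper names (parse_amount, parse_table, load) are all invented for illustration; real filings are far messier, and this is not how any particular data vendor does it.

```python
import re
import sqlite3

# Hypothetical excerpt of a text table as it might appear in a filing.
# The line items and figures are invented for illustration.
TEXT_TABLE = """\
                          2009        2008
Revenue                 1,250       1,100
Cost of revenue          (700)       (640)
Net income                210         180
"""


def parse_amount(token):
    """Convert '1,250' to 1250 and the accounting negative '(700)' to -700."""
    negative = token.startswith("(") and token.endswith(")")
    value = int(token.strip("()").replace(",", ""))
    return -value if negative else value


def parse_table(text):
    """Return (item, year, amount) tuples recovered from the text layout."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    years = [int(y) for y in lines[0].split()]  # header row holds the years
    rows = []
    for line in lines[1:]:
        # Assumption: the label and each column are separated by 2+ spaces.
        parts = re.split(r"\s{2,}", line.strip())
        label, amounts = parts[0], parts[1:]
        for year, token in zip(years, amounts):
            rows.append((label, year, parse_amount(token)))
    return rows


def load(rows):
    """Put the recovered rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (item TEXT, year INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)
    return conn


rows = parse_table(TEXT_TABLE)
conn = load(rows)

# Once the data are structured again, a comparison over time is one query.
growth = conn.execute(
    "SELECT a.amount - b.amount FROM facts a JOIN facts b ON a.item = b.item "
    "WHERE a.item = 'Revenue' AND a.year = 2009 AND b.year = 2008"
).fetchone()[0]
print(growth)
```

Everything in parse_table depends on guesses about spacing, number formats, and header layout, and those guesses have to be remade for each company's presentation.  None of that work would be needed if the structured records were published directly.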

Some reasons for this inefficient data economy:

  • Companies want to control the presentation of their financial data.  Formatting the data in a way that makes them difficult to manipulate gives companies more control over how the data are presented.
  • Companies don’t want their data to be readily compared over time and across companies.  Publicly disclosing data in inconvenient forms effectively lessens the extent of public disclosure.
  • Managers present reports to directors and stockholders.  Presenting data to computers (via databases and APIs) is a less valued activity.

Improving the efficiency of the data economy requires weakening the interests that favor keeping it inefficient.

Update:  The SEC is providing machine-readable, computable financial reports.
