As more and more information gets stored in large databases, knowing how to navigate it effectively is an important skill.
Cross posted from Pruning Shears.
About a year ago the Ohio Department of Natural Resources (ODNR) had an information session in Portage county about a recently permitted group of injection wells. The presentation by Tom Tomastik, ODNR's lead geologist for the injection well regulatory program, wasn't very helpful, but other ODNR personnel offered some useful assistance.
I spoke with ODNR's Tom Hill about the difficulty in finding inspection reports, and he basically said requests needed to be made on an individual basis; there was no way to go online and browse them. One of his colleagues referenced a downloadable database called RBDMS, but didn't give a location or other information. I should have pursued that a little further, but what I had been able to find at the ODNR site to that point had been sufficiently difficult to use that I didn't think it would be worth it. A few weeks ago, though, a friend sent me a link to RBDMS, and it turns out that it's very helpful - if you know how to use it.
The Risk Based Data Management System (RBDMS) database tracks all of ODNR's inspections, includes each well's information and history, and can be downloaded free at ODNR's site. This is actually a very good and useful thing for ODNR to make available to the public. Good on them for doing it (insert golf clap sound effect). The problem, and this obviously is not ODNR's fault, is that there's a lot of data. If you want to look at it yourself, download the setup program - or just download the Microsoft Access database if you have Access on your PC and know the right version to grab - and then download the weekly update. Create the directory C:\Rbdms on your hard drive and put everything there.
The weekly update is around 200 MB as a zip (compressed) file, and over 1 GB uncompressed. Again, lots of data. The "setup" database is the one presented for public use, and it links to the weekly update database. The weekly update has the raw data. Use the setup DB for querying, overwrite the update DB as needed, and the new data will be visible to the setup DB. I didn't find the setup DB very useful because I was interested in inspection reports, and couldn't find an interface for that.
In the weekly update database (Rbdmsd97.mdb) there are three tables - "well," "wellHistry" and "tblInspection" - recording the relevant data. Making sense of it is another matter, though. Microsoft Access is what's called a relational database management system (RDBMS, an unfortunately similar acronym to RBDMS), which means it structures data by relationships between tables. Think of each table as a spreadsheet. A business might have, say, a Customers table with one entry for each customer. Typically each entry has a unique identifier, often an auto-number field that exists solely to uniquely identify the record. So the first entry in the table for, say, Company X would get assigned Customer ID 1.
Then you might have an Orders table, where each customer's order is placed. The order will have, say, quantity, price, and other data - and the Customer ID number from the Customer table to identify who made the order. There will be a one-to-many relationship between the tables because one customer can have many orders. Storing only the Customer ID in the Orders table - and not the customer name, address, etc. - is a more efficient way to store the data. There's no need to repeat customer data in the Orders table, which would only make the database larger and slower, when you can just set up a relationship between the two tables and use the unique Customer ID to identify who placed the order.
Unless you are very familiar with the tables, though, you won't know that the Customer ID of 1 in the Orders table corresponds to Company X in the Customers table. It's possible to set up querying and reporting that will join the appropriate fields and present the data in a more intuitive way, but you need to know how to join the tables in the first place to make that happen.
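To make the Customers/Orders idea concrete, here is a minimal sketch in Python using the built-in sqlite3 module as a stand-in for Access. Everything in it is an assumption for illustration: the table layouts, the column names (CustomerID, Quantity, Price) and the second customer, Company Y, are all made up. It shows what the Orders table looks like on its own (just ID numbers) and what it looks like once it is joined back to Customers.

```python
import sqlite3

# In-memory toy database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,  -- auto-number style identifier
        Name       TEXT
    );
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customers(CustomerID),
        Quantity   INTEGER,
        Price      REAL
    );
    -- Company X is inserted first, so it gets Customer ID 1.
    INSERT INTO Customers (Name) VALUES ('Company X'), ('Company Y');
    -- One customer can have many orders (one-to-many).
    INSERT INTO Orders (CustomerID, Quantity, Price) VALUES
        (1, 10, 2.50), (1, 4, 9.00), (2, 1, 100.00);
""")

# On its own, the Orders table only shows the bare Customer ID.
raw_orders = conn.execute(
    "SELECT CustomerID, Quantity FROM Orders ORDER BY OrderID"
).fetchall()

# Joining on CustomerID brings the readable customer name back in.
named_orders = conn.execute("""
    SELECT c.Name, o.Quantity, o.Price
    FROM Orders AS o
    INNER JOIN Customers AS c ON c.CustomerID = o.CustomerID
    ORDER BY o.OrderID
""").fetchall()

print(raw_orders)
print(named_orders)
```

The first result is only meaningful if you already know who customer 1 is; the second spells it out, which is the whole point of the join.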
That's the challenge to anyone looking at the weekly update database. For instance, say you want to look at wells in Portage county. The very first well in the Well table has API well number 34001200010000 and 1 in the County field (CNTY). What is county 1? You need to go to the County table, and it turns out County 1 is Adams (Portage is 133). So if you want to look at inspections for Portage county wells, you need to create a query joining the API number from the Well table to the API on the Inspection table, and also joining the county field from the Well table to the County table. Then run the query where county name equals "Portage." Nothing to it, right?
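The Portage query described above can be sketched the same way. This is a toy reconstruction in Python with sqlite3, not the real database: the CNTY field, the county codes (1 for Adams, 133 for Portage) and the first well's API number come from the tables as described, but the other column names (API_WELLNO, COUNTY_NAME, INSP_DATE), the Portage well number and the inspection dates are placeholders I made up. Access's own SQL dialect also differs in details (for instance, it wants parentheses around multiple joins), so treat this as a picture of the logic rather than something to paste into Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column names other than CNTY are hypothetical.
    CREATE TABLE County (CNTY INTEGER PRIMARY KEY, COUNTY_NAME TEXT);
    CREATE TABLE Well (API_WELLNO TEXT PRIMARY KEY, CNTY INTEGER);
    CREATE TABLE tblInspection (API_WELLNO TEXT, INSP_DATE TEXT);

    INSERT INTO County VALUES (1, 'Adams'), (133, 'Portage');
    -- The second API number is a made-up placeholder for a Portage well.
    INSERT INTO Well VALUES ('34001200010000', 1), ('34133000000000', 133);
    -- Inspection dates are placeholders, not real records.
    INSERT INTO tblInspection VALUES
        ('34001200010000', '2000-01-01'),
        ('34133000000000', '2000-01-02'),
        ('34133000000000', '2000-01-03');
""")

def inspections_for_county(conn, county_name):
    """Join Well to County (on CNTY) and to tblInspection (on API number),
    then keep only the rows for the named county."""
    return conn.execute("""
        SELECT w.API_WELLNO, c.COUNTY_NAME, i.INSP_DATE
        FROM Well AS w
        INNER JOIN County AS c ON c.CNTY = w.CNTY
        INNER JOIN tblInspection AS i ON i.API_WELLNO = w.API_WELLNO
        WHERE c.COUNTY_NAME = ?
        ORDER BY i.INSP_DATE
    """, (county_name,)).fetchall()

portage = inspections_for_county(conn, "Portage")
print(portage)
```

Passing the county name as a parameter means the same function works for any county without having to look up its numeric code first.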
That process needs to be repeated for every related field you'd like meaningful information on, like inspection type and inspection purpose. It's a complicated process. Having this kind of large, structured data available is a relatively new thing, and citizens (or investigative journalists) who want to navigate it need to develop the skill set. It's very different from the skill set required for processing unstructured data like a document dump in response to a FOIA request.
Being able to navigate an RDBMS effectively allows one to query, filter and flag meaningful items. It provides a way to go through a large database and find the information one is looking for. I'm still getting my arms around RBDMS, but here's an example of something I saw flagged as I was spot checking my work. It's from well API 34167296850000 (NEWELL RUN DISPOSAL WELL(SWIW #10) in Washington county). Inspection date 1/26/2010, inspector 2164 (Cynthia Van Dyke), emph. added because holy crap:
Tested tubing overnight. Left 2000# on tubing overnight, next morning was 1700#. Service rig on location. OOG worried about lose, Tom Tomastik suggests just wait until it fails integrity. This day pressured up tubing back up to 2000#, held there for 15 min. with only 11# drop. Will continue to use.
The inspection did not get marked for a violation or significant non-compliance. Sure seems like it should have though. I plan to keep going through the database and hope to report on other interesting findings. Anyone who wants to do their own research can get in touch and I'll help out as I can; no need to reinvent the wheel.
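For anyone wondering what "query, filter and flag" might look like in practice, here is one rough sketch, again in Python with sqlite3 and a toy table. It looks for inspections whose comments contain worrying keywords but that were not marked as violations. The column names (VIOLATION, COMMENTS), the keyword list and the two placeholder rows are assumptions of mine; only the first row is an excerpt of the comment quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblInspection (
        API_WELLNO TEXT,
        INSP_DATE  TEXT,
        VIOLATION  INTEGER,  -- hypothetical flag: 1 = violation recorded
        COMMENTS   TEXT
    );
""")
rows = [
    # Excerpt of the inspection comment quoted in the post.
    ("34167296850000", "2010-01-26", 0,
     "Left 2000# on tubing overnight, next morning was 1700#. "
     "Tom Tomastik suggests just wait until it fails integrity."),
    # The next two rows are invented placeholders, not real records.
    ("00000000000000", "2000-01-01", 0, "Placeholder: routine visit, nothing noted."),
    ("00000000000001", "2000-01-02", 1, "Placeholder: tubing leak, integrity test failed."),
]
conn.executemany("INSERT INTO tblInspection VALUES (?, ?, ?, ?)", rows)

# Hypothetical keyword list; tune it to whatever you are hunting for.
KEYWORDS = ["fails", "leak", "pressure drop"]

def flag_unmarked(conn, keywords):
    """Return inspections with no violation recorded whose comments
    mention any of the keywords (SQLite LIKE ignores ASCII case)."""
    if not keywords:
        return []
    clause = " OR ".join("COMMENTS LIKE ?" for _ in keywords)
    params = [f"%{k}%" for k in keywords]
    return conn.execute(
        "SELECT API_WELLNO, INSP_DATE FROM tblInspection "
        f"WHERE VIOLATION = 0 AND ({clause}) ORDER BY INSP_DATE",
        params,
    ).fetchall()

flagged = flag_unmarked(conn, KEYWORDS)
print(flagged)
```

A keyword search like this is crude and will miss plenty, but it is a cheap way to narrow a large table down to a short list worth reading by hand.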
An effort like this seems very important to me - the "wait until it fails" model of regulation doesn't seem like a winner, and I'd like to see other examples highlighted. They are buried in data, though, and ferreting them out requires new tools and new talents. Activists take note.