Why we shared our analysis in 2018
Open sourcing and sharing code probably isn’t the main reason we have a statesman GitHub organization, but some projects, especially those that use public data, seem to call for it.
Reproducible data analysis is a core part of the job; sharing scripts is often fringe. This blog? Fringe.
I’m in this headspace because it’s the end of the year, I want to make a list, and Soo Oh wrote an excellent Nieman Lab prediction. Oh offers a prioritization framework and observes that show-your-work roles tend to occupy the “periphery.”
Our projects concur. It’s uncommon for one of our repositories to bring in an e-mail, a star or a fork.
Yet when we published our analysis in 2018, we believed it was worth the effort. We wanted the subjects of our stories, our readers and other developers to see it. Accuracy and transparency are at our core, and we thought these public repos supported those values.
1. statesman/demolitions
Christian and Phil Jankowski charted the increase of residential demolitions in Austin using the city’s data portal. The repository shows how Christian collected data, filtered for nuances and created visualizations.
2. statesman/question-of-restraint-analysis
The American-Statesman published its multi-part Question of Restraint investigation over a series of months in 2017. Collection and analysis of deaths in police custody largely took place within databases we created with the investigative team. But Christian created a public version this year to share at a Hacks/Hackers meetup and included readme instructions to recreate the analysis or browse.
3. statesman/daycare-data-unwatched
The Statesman’s investigation of Texas day cares and regulations spanned many months, but many of the key data findings came from a few scripts and spreadsheets. We re-ran these scripts over and over while fact checking and…. The repo also contains a folder with all of the data used by the searchable database included in the project.
We wrote a lot of code in 2018 that we didn’t share, but the projects above were important enough for us to package our scripts and show our work.
If you find yourself browsing statesman GitHub repos, you’ll notice we updated many of our projects with topics to help our own categorization and give you more clues about what our code does. Projects tagged data-visualization are often HTML projects that were published as interactives; the journalism tag is used for analysis that contributed findings to an article.
I know installing and running the code in these repositories – let alone browsing them – is a significant investment of time and energy, but I would love to hear from you if you do.
Dan Keemahill - dkeemahill@statesman.com