Data-driven journalism is the future. Journalists need to be data-savvy. It used to be that you would get stories by chatting to people in bars, and it still might be that you’ll do it that way sometimes. But now it’s also going to be about poring over data and equipping yourself with the tools to analyze it and picking out what’s interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what’s going on in the country.

— Tim Berners-Lee, founder of the World Wide Web

Journalists don’t have time to waste transcribing things by hand and messing around trying to get data out of PDFs, so learning a little bit of code, or knowing where to look for people who can help, is incredibly valuable.

“One reporter from Folha de São Paulo was working with the local budget and called me to thank us for putting up the accounts of the municipality of São Paulo online (two days’ work from a single hacker!). He said he had been transcribing them by hand for the past three months, trying to build up a story. I also remember solving a ‘PDF issue’ for ‘Contas Abertas’, a parliamentary monitoring news organisation: 15 minutes and 15 lines of code solved a month’s worth of work.”

— Pedro Markun, Transparência Hacker

Challenges in Obtaining Data

These insights make it clear that journalism badly needs data. Collecting and collating that data, however, comes with quite a few challenges.

  1. Finding stories with genuine sources is hard, because the internet is full of user-generated content that cannot be relied upon.
  2. Much of this information is available only for a short time before it is taken down.
  3. Raw data is often not available for download from trusted sources.
  4. Collecting it usually takes a lot of manual work, which is impractical in a profession driven by speed.

Web Scraping & Data Mining

If you know how to code, the whole process becomes much easier: you can generate a large amount of usable data in a few minutes or hours, where it would otherwise have taken days or months. Here is how it can help:

  1. Get your story out in the media first.
  2. Easily gather and analyse reviews, surveys, reports, industry insights and the like, and turn them into data points that add substance to your stories.
  3. Automate repetitive work so that, instead of searching for information, you have it pushed to you from different sources.
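
To give a feel for how little code this can take, here is a minimal sketch in Python that pulls a table out of a web page and turns it into CSV, ready for a spreadsheet. It uses only the standard library. The names `TableExtractor` and `table_to_csv` are hypothetical helpers written for this illustration, and the sample page with its figures is invented, not real budget data. A real page would usually be downloaded first and may need extra handling (nested tables, merged cells), which this sketch does not attempt.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper for illustration; assumes simple, non-nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Invented sample page, standing in for a downloaded document.
SAMPLE_HTML = """
<table>
  <tr><th>Department</th><th>Budget</th></tr>
  <tr><td>Education</td><td>1200</td></tr>
  <tr><td>Health</td><td>950</td></tr>
</table>
"""

if __name__ == "__main__":
    print(table_to_csv(SAMPLE_HTML))
```

Run on a schedule, the same few lines could re-check a page and save each version, which helps with information that is later taken down.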

We have built a tool to help with exactly that. 8om Extract helps you create web crawlers and code that collect information available on the internet and automate these tasks. Do check it out and share your feedback, which will help us improve it further!

