27 August 2010
Mirko Lorenz, who organised a recent data-driven journalism event through the European Journalism Centre (EJC), believes data-driven journalism may be the future. Lorenz explained that being able to visualise filtered data alongside a good story could be an invaluable service to the public good.
Is this really the next big thing, or is it just hubris? Another miracle fix for the glum outlook of journalism? Perhaps saving journalism (and the public’s trust in the profession) requires going back to the basics: good, well-thought-out, structured storytelling based on compelling research. Take Florence Nightingale. In 1858, she produced a coxcomb graph showing that more soldiers died of sickness in the Crimean War than on the battlefield. The UK Sanitary Commission subsequently improved hygiene in both military camps and hospitals. In other words, data-driven journalism is not new. And neither is good storytelling.
Nonetheless, we are faced with an information overload that makes it almost impossible to single-handedly understand what is happening. What if Wikileaks had sent you its 75,000+ documents? Both The New York Times and The Guardian had three weeks to sift through the files. The Guardian’s Datablog was then able to create a graphical representation of the war logs, which was then linked to the articles. It’s great stuff. But not everyone is so lucky.
Wikileaks is an exceptional case. More often, journalists have to deal with one or several unstructured Excel spreadsheets, text files and PDFs. Sensitive budget information, for instance, is often wrangled out of some official, who will then typically present the data in an unstructured manner. Trying to sift through hundreds of thousands of data sets is an exercise that taxes limited resources and takes too much time. How can one possibly make sense of a text file with millions of figures? And what exactly is one looking for?
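To make that concrete, here is a minimal sketch of a first pass over such a file: pull out every number hiding in a messy text dump and get a rough statistical feel for it before any real analysis. Everything here is invented for illustration; real budget files would of course need far more care.

```python
import re
from statistics import mean, median

# Invented stand-in for a messy text dump handed over by an official.
text = """Dept A: 1,200,000  Dept B: 450,000
misc 3,100  overruns: 87,500"""

# Strip thousands separators, then grab every run of digits.
numbers = [int(n.replace(",", ""))
           for n in re.findall(r"\d[\d,]*", text)]

print(len(numbers), "figures found")
print("min:", min(numbers), "max:", max(numbers))
print("mean:", mean(numbers), "median:", median(numbers))
```

Even a crude summary like this tells you whether the file is worth a week of cleaning, or whether something (a missing department, an implausible outlier) deserves a phone call first.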
“Always question government stats,” says Financial Times investigative journalist Cynthia O’Murchu, adding that data is rarely user-friendly. Ms O’Murchu, who was present at the EJC conference, expressed her frustration at having to clean data before any attempt to decipher it. Even getting access is a quest: she demonstrated how her request for UK government data under the Freedom of Information Act would come with a £3,400 price tag and months of delay. Information is not so free, after all.
Another tactic used by government officials is to release massive amounts of unstructured raw data. This is known as data dumping, an exercise that New York Times interactive news technology whiz Alan McLean said not only fails to inform but also distracts. Thankfully, a number of free online tools claim to make reading data, and finding the narratives in it, much easier:
The Lippmannian device
Richard Rogers, a professor at the University of Amsterdam, compiled a list of 35 online tools to help you make sense of data. He demonstrated one of them, the Lippmannian device, named after Walter Lippmann.
This tool does two things: it can generate source clouds or issue clouds. Source clouds let you find out, for instance, which sites most often quote a particular source. So if you enter the term ‘climate change sceptics’ you can visualise which sites cite sceptics the most. Issue clouds let you quickly establish what a particular organisation actually does, as opposed to what it says it does. For instance, you can find the main issues of the top 50 human rights organisations and quickly visualise their relative differences. The Lippmannian device is a nifty little tool. Incidentally, Google is not too keen on it and is beginning to muscle in on the operation.
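The issue-cloud idea boils down to term frequency: count how often each term appears in an organisation’s own text, and the biggest counts become the biggest words in the cloud. A toy sketch, with invented page text (this is my illustration of the principle, not how the Lippmannian device itself is built):

```python
import re
from collections import Counter

# Invented snippet standing in for an organisation's web pages.
page_text = """
We defend press freedom. Press freedom is under threat.
We also monitor censorship and report on censorship cases.
"""

words = re.findall(r"[a-z]+", page_text.lower())
stopwords = {"we", "is", "on", "and", "also", "under"}
counts = Counter(w for w in words if w not in stopwords)

# The most frequent terms would be drawn largest in the cloud.
for term, n in counts.most_common(3):
    print(term, n)
```

Run against fifty organisations at once, the same counting trick lets you compare what each one actually talks about, which is exactly the comparison Rogers demonstrated.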
The Guardian database
Simon Rogers is the editor of the Guardian’s Datablog and Datastore. This online data resource publishes hundreds of raw datasets and encourages users to visualise and analyse them. Around 800 datasets are currently available, and three or four are added every day.
Tony Hirst’s blog
Tony Hirst, a lecturer at the Open University, runs this blog. It is pretty detailed, but it provides how-tos on taking data from a website and visualising it in Google Maps without having to code anything. You can then send your Google Maps URL to anyone. This is a great feature if you want to map poverty indices using Eurostat datasets: you could, for instance, see the relative differences and intensity of poverty in Google Maps. Note, I haven’t tried doing this, but he does provide a step-by-step guide.
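The idea behind those how-tos is simple even if his tools spare you the code: turn rows of regional data into a file Google Maps can display, such as KML. A rough sketch with invented region names and numbers (real Eurostat data would need cleaning and geocoding first):

```python
# Invented rows: name, latitude, longitude, poverty rate (%).
rows = [
    ("Region A", 50.85, 4.35, 17.2),
    ("Region B", 52.37, 4.90, 11.8),
]

# KML expects coordinates as longitude,latitude.
placemarks = "".join(
    f"<Placemark><name>{name}: {rate}%</name>"
    f"<Point><coordinates>{lon},{lat}</coordinates></Point></Placemark>"
    for name, lat, lon, rate in rows
)
kml = ('<?xml version="1.0" encoding="UTF-8"?>'
       '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
       f"{placemarks}</Document></kml>")

with open("poverty.kml", "w") as f:
    f.write(kml)
```

A file like this can be loaded into Google Maps (or Quantum GIS, mentioned below), giving you a shareable map without building anything yourself.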
Many Eyes
Frank van Ham was one of the developers of this useful tool, built at IBM; his two fellow developers have since been snapped up by Google. Many Eyes allows you to visualise data and place it online for all to see and comment on. It’s interactive, so others can manipulate and analyse it too. Yes, there are people who actually do this for fun. Your graphic gets a direct URL that you can share, or you can embed the graph in your own website.
Hacks/Hackers and Storify.com
I’d also like to make short mention of this one. Hacks/Hackers was founded by former AP foreign desk bureau chief Burt Herman, who describes it as combining computer science with journalism – a still-distant concept here in Europe but common in the United States. He took the concept and applied it to a tool he calls Storify.com. Storify, still in development, will let you make sense of Twitter feeds and maybe even find a story. Anyway, the guy was super interesting.
Quantum GIS
This tool wasn’t presented at the conference, but I found out about it from a German journalist who uses it to better analyse Eurostat data. Quantum GIS lets you visualise, manage, edit, and analyse data, and compose printable maps. Sounds good.
Nikolaj Nielsen is a Brussels-based independent journalist. Published here with the author’s permission. ©Nikolaj Nielsen.