Data journalism comes of age with leaked Afghan war logs

The Guardian Datablog has explained how it managed to analyse and report on more than 92,000 data records first published on 25 July by the Wikileaks website as the Afghan War Diary, 2004-2010.

On Datablog, award-winning editor Simon Rogers described the processes and challenges involved, and some of the journalistic imperatives that bound the team. For instance, the need to protect informants named in the logs and not endanger Nato troops unnecessarily was paramount. At the same time, the number crunchers wanted to make the information as easy as possible to digest for both the Guardian's team of investigative reporters and its readers.

The Datablog team took an early decision not to publish the full 75MB database, said to be the "Biggest leak in intelligence history", which was made available on Wikileaks. Instead, it published two datasets drawn from the raw data: the full IED data and the full set of significant incidents chosen by Guardian specialists.

The sheer size of the original War Diary Excel file taxes the spreadsheet programme beyond its practical limits. But with the experience of dealing with the COINS data release in June under its belt, the team built a database that made interrogating the logs easier. Though some of the records were incomplete, Rogers says the data was "well structured ... ie, events were categorised, sometimes more reliably than others." The team filtered the records into a number of different sets, including IED attacks and casualties data.

Web developers and graphic designers played critical roles in presenting the data, such as the locations of IED attacks between 2006 and 2009, in a meaningful manner. Datablog also commissioned an interactive 'front page' to help readers find their way around the 300 key events it selected.

Underlining the revolution under way in news data analysis and reporting, Rogers noted: "Have we published enough? Inevitably not. Have we started to make sense of an incredibly complex dataset? We hope so." And, challenging readers to add to the collective understanding that the logs can provide, he wrote: "Now it's your turn. Can you help us make more sense of the raw info?"