
Matthew Hurst of the Data Mining blog read a couple of recent accounts of the "process and principles of data journalism" -- one on The New York Times, the other on the Guardian -- and came away concerned there was no mention of "assessing or questioning the quality of the data employed, or its source":
I don't mean to indicate that these institutions aren't concerned with the quality of the data they report … But just as we expect accountability regarding the sourcing of information and redundancy of sources for traditional journalism, we should expect these data sensibilities from data journalists.
Hurst had also commented on the Guardian piece:
One of the most important roles that a data journalist should perform is estimating the quality and bias of data sets being used. The open data movement has, to some degree, spread the assumption that government data is correct.
I don't think the absence of any mention of data quality in these two examples means no attention was paid to it. Of course journalists should be worried about data quality, and I suspect they are all over the map on this, from obsessive concern about data's sourcing and accuracy to naïve disregard. I doubt most of the data journalists I've been in touch with over the years would make the "assumption that government data is correct." The opposite assumption seems more likely.
That said, there was a workshop on this very subject five years ago. In one of the papers written for it, Marcus Messner and Bruce Garrison of the University of Miami examined both academic journals on journalism and such publications as the IRE Journal, NICAR Uplink, Editor & Publisher, the American Journalism Review and the Columbia Journalism Review, and wrote that they were "quite alarmed at the lack of attention given to this issue":
From earlier research about computer-assisted reporting, various conferences and presentations in the past decade and a half, and in discussions with professionals, it was an issue that simply remained below the research radar. But it is an issue of potentially serious ramifications for journalism and for public policymaking. We fear that not enough people are aware of it or consider it serious enough to warrant more attention. The literature of journalism that focuses on journalists’ uses of databases and their problems is thin at best. References to dirty data or other database verification issues are most often made in passing, if at all. It is seldom even discussed in academic studies involving secondary analysis of databases and in situations when reliability and validity should be given attention. While there are many individual incidents described in the literature, there has not been a comprehensive attempt to analyze the journalistic problem of dirty data.
Since then I haven't seen the issue discussed much online by data journalists -- so maybe Hurst is on to something, after all.