Monday, April 2, 2012

Sunday, April 1, 2012

Right about the database, wrong about the world

Tina Rosenberg at Foreign Policy writes about Patrick Ball, a statistician who specializes in precisely accounting for deaths from wars and other atrocities. Her article is especially worthwhile because of its discussion of the complexities of finding the truth -- as opposed to reaching for whatever fragment of evidence is closest at hand.

Here's Ball on the uprisings in the Middle East:

"I get a tingle the way everyone does when I see video from a cell phone from the Arab Spring," Ball told me recently. "But let's not forget that most human rights violations are in secret -- no cell phone. It's easy to think the world is totally different because we have cell phones now. It has changed our understanding of public violence -- but most violence isn't public."

For Ball, it's such unknowns that led to his most important early insight: "You can do precise statistics about what's in your database," he says, "and may be completely wrong about the world."

Thursday, March 1, 2012

Jer Thorp: "Start to think about data in a human context"

Thorp is the data artist in residence at The New York Times:

"By placing data into a human context, it gains meaning, and I think this is tremendously, tremendously important."




via @MacDiva

Monday, December 12, 2011

Nicholas Lemann: Journalists should do literature reviews

The Columbia journalism school dean, in a Q&A by the Journalist's Resource:

Every summer, they let me out of the cage and I can do one reporting story for The New Yorker. The one I did last summer just appeared, on Brazil. I try to put into effect in my own journalism the things I’ve been trying to put into effect at the school. So the very first thing I do when I get an assignment like that is to do what academics call a “literature review,” which is partly done through reading and partly done through meeting leading academic experts on the subject, and just kind of familiarizing myself.

A lot of journalists feel pretty comfortable reviewing the literature — we don’t use the term “literature review” — of works of journalism, but not of works of scholarship and research. You can, with some training, do a literature review, by the way, inside a daily news cycle even. But to break down that barrier and show journalists how to get to and understand and use quickly the body of academic research is really, really useful in terms of getting context. Its value is meaningfully beyond the now-ancient idea of going to the newspaper morgue and pulling the clips. That’s how we were trained when I was a kid. You’d go to the morgue and pull the newspaper clips, and you’d — quote — call an expert. But that’s different from actually reading the literature and figuring out who the leading voices are and reading their work in its original academic form, without fear; and then really sitting down and trying to spend time with them, as opposed to just calling them blind and saying, “I need a quote.” So I do this myself, I teach my students how to do it, and they do it. It changes and enriches the way you work.

Wednesday, October 26, 2011

Biologist E.O. Wilson: 'I've ridden the ants the whole way'

From a profile in The Atlantic of biologist E.O. Wilson, who in his 80s is challenging the accepted wisdom about kin selection:

This is hardly the first scientific controversy surrounding Wilson. An even bigger fight erupted around him in the 1970s, as he laid out his ideas on sociobiology in three landmark books, The Insect Societies, Sociobiology, and On Human Nature. At issue throughout were his claims that our genes not only are responsible for our biological form, but help shape our instincts, including our social nature and many other individual traits.

These contentions drew fierce criticism from all across the social sciences, and from prominent specialists in evolution such as Wilson’s late Harvard colleague, Stephen Jay Gould, who helped lead the charge against him.

Wilson defined sociobiology for me as “the systematic study of the biological basis of all forms of social behavior in all organisms.” Gould savagely mocked both Wilson’s ideas and his supposed hubris in a 1986 essay titled “Cardboard Darwinism,” in The New York Review of Books, for seeking “to achieve the greatest reform in human thinking about human nature since Freud,” and Wilson still clearly bears a grudge.

“I believe Gould was a charlatan,” he told me. “I believe that he was … seeking reputation and credibility as a scientist and writer, and he did it consistently by distorting what other scientists were saying and devising arguments based upon that distortion.”


The author, Howard W. French, writes that Wilson's idea that genes shape our nature has become so mainstream that it's difficult "to see what much of the fuss of the 1970s was about."

[Via Statistical Modeling, Causal Inference, and Social Science]

Tuesday, October 25, 2011

A look at 'The Media's Gas Problem'

S. Robert Lichter at STATS.org looked at two academic studies on the impact of fracking -- one the media embraced that asserted fracking could worsen greenhouse emissions, and another the media mostly ignored that drew the opposite conclusion. He notes that "journalists who alert the public to dangers in their midst win awards; journalists who debunk overhyped scares do not."

... the media's treatment of scientific studies should be treated as a kind of rolling health scare, a structural imbalance based on a selection bias that is unlikely to change anytime soon. So what are news consumers to do in the short run? Just remember, the most deceptive lead in science journalism is, "A new study shows."

Monday, October 24, 2011

Analyzing Jill Abramson's 'nasal car honk'

The Language Log analyzes the voice of New York Times executive editor Jill Abramson, a voice that Ken Auletta described as the "equivalent of a nasal car honk":

The ratio between the fundamental and the infrasonic modulation is variable -- it seems to be more like 8-to-1 towards the end of this sample -- but the general pattern remains the same. Her long/low/loud phrase endings also often shift into "vocal fry", which is a kind of chaotic oscillation ...

New York magazine asserts Abramson "Will Never Live Down Her Voice, Dog or Tattoo."

One of the commenters on that post argues that if Auletta were "profiling a man in this position, he wouldn't have dared remark on any of these 'traits' (unless he's a f*g ie., Truman Capote), aware that doing so would put a bull's eye target on his back for bitchiness."

Sunday, October 23, 2011

Punctuation as rhetoric

The author of the upcoming book, "The Language Wars: A History of Proper English," sees an increasing tendency "to punctuate for rhetorical rather than grammatical effect":

How might punctuation now evolve? The dystopian view is that it will vanish. I find this conceivable, though not likely. But we can see harbingers of such change: editorial austerity with commas, the newsroom preference for the period over all other marks, and the taste for visual crispness.

Saturday, October 22, 2011

Philip Meyer lectures on 'evidence-based narrative'

For decades professor Philip Meyer has argued, with little success, for greater use of social science methods in journalism. In a recent lecture, he doubles down and adds narrative journalism to the mix:

Both genres, narrative journalism and precision journalism, are special forms requiring special skills. If we were to blend the two, what should we call it? I like the term "evidence-based narrative." It implies good storytelling based on verifiable evidence.

Yes, that would be an esoteric specialty. But I believe that a market for it is developing.

Wednesday, October 19, 2011

Krugman: Academic journals don’t spread economic ideas

Paul Krugman says refereed academic journals were never the main means by which new economic ideas were spread:

… the system never worked like that — or at least not in my professional lifetime. And when you consider how economic discussion actually used to work, you see the blogs in a different and more favorable light.

First of all, policy-oriented research was never as centered on refereed journals as we liked to imagine. A lot of the discussion always took place via Federal Reserve and IMF working papers, and even reports from the research departments of investment banks. …

Second, even for more academic research, the journals ceased being a means of communication a long time ago – more than 20 years ago for sure. New research would be unveiled in seminars, circulated as NBER Working Papers, long before anything showed up in a journal. Whole literatures could flourish, mature, and grow decadent before the first article got properly published ... The journals have long served as tombstones, certifications for tenure committees, rather than a forum in which ideas get argued.

What the blogs have done, in a way, is open up that process. Twenty years ago it was possible and even normal to get research into circulation and have everyone talking about it without having gone through the refereeing process – but you had to be part of a certain circle, and basically had to have graduated from a prestigious department, to be part of that game. Now you can break in from anywhere; although there’s still at any given time a sort of magic circle that’s hard to get into, it’s less formal and less defined by where you sit or where you went to school.

Monday, October 17, 2011

Jeff Jarvis exposed

Evgeny Morozov, the author of “The Net Delusion,” does a thorough takedown of media guru Jeff Jarvis and Jarvis’ new book, “Public Parts,” in a review in The New Republic:

WHY SUCH NARRATIVES are in demand by the general public is more mysterious. It could be that ordinary people find the surreal perplexity of the Internet—the stuff of WikiLeaks, Anonymous, Stuxnet, “Twitter revolutions”—so maddeningly complex and labyrinthine that they are ready to settle for whatever theory or pseudo-theory or theoretical uplift seems to make sense of the puzzling new situation. And what better way to make sense of it all than to claim that the source of their perplexity is in fact a part of some inexorable historical process that has been unfolding for centuries? Most Internet intellectuals simply choose a random point in the distant past—the honor almost invariably goes to the invention of the printing press—and proceed to draw a straight line from Gutenberg to Zuckerberg, as if the Counter-Reformation, the Thirty Years’ War, the Reign of Terror, two world wars—and everything else—never happened.

The ubiquitous references to Gutenberg are designed to lend some historical gravitas to wildly ahistorical notions. The failure of Internet intellectuals actually to grapple with the intervening centuries of momentous technological, social, and cultural development is glaring. For all their grandiosity about technology as the key to all of life’s riddles, they cannot see further than their iPads. And even their iPad is of interest to them only as a “platform”—another buzzword of the incurious—and not as an artifact that is assembled in dubious conditions somewhere in East Asian workshops so as to produce cultic devotion in its more fortunate owners. This lack of elementary intellectual curiosity is the defining feature of the Internet intellectual. History, after all, is about details, but no Internet intellectual wants to be accused of thinking small. And so they think big—sloppily, ignorantly, pretentiously, and without the slightest appreciation of the difference between critical thought and market propaganda.

Jarvis responded to Morozov on his blog, BuzzMachine.com, and on Google Docs, claiming it was “a personal attack” and “character assassination,” which is pure BS. It wasn’t.

I will always remember Jarvis as the journalism professor who wrote his previous book, “What Would Google Do?”, without actually reporting on his subject. This is how Jarvis explained it in the acknowledgments: “I did not seek access to Google for this book because I wanted to judge it and learn from it at a distance.”

Morozov criticizes Jarvis for being intellectually lazy, and that’s a fine example of it.

Sunday, October 16, 2011

How much traffic do you get from a mention by a famous author in a USA Today op-ed?

Short answer: Not much.

On Monday, author Joe McGinniss cited numbers from my anonymous source tracker in a USA Today op-ed to justify his use of anonymous sources in his book on Sarah Palin.

When I first learned of it from a tweet by USA Today reporter Gregory Korte, I thought: Well, that should generate some traffic to the website.

On a normal day, the source tracker draws a few dozen unique visitors at most, often fewer.

It settled into that pattern almost immediately after I introduced it in February 2010: the site drew 60 visitors two days after launch, and then traffic dropped off. The peak came in May 2010, when the number of visitors spiked at 254 after ESPN's ombudsman mentioned it online.

Then along comes McGinniss.

The day the column ran in the paper and online, a Monday, the source tracker had 54 unique visitors, falling off immediately after that to normal levels.

I'm thinking if I want to get more attention, I'll have to write a book.

Saturday, October 15, 2011

Jack Shafer: 'How to think about plagiarism'

Fighting it requires editors with organs stitched from dead skin:

An editor must have a heart like leather. Not freshly tanned leather—all supple and yielding like a baby’s bum—but like an abandoned baseball glove that’s been roasting in the Sonoran Desert for five or six years. Only those who are hard of heart can properly deal with the plagiarists who violate the journalistic code.

Friday, October 14, 2011

‘Be innovative, but not so much that people can’t easily follow’

Reg Chua of Thomson Reuters says developers of news web applications need to think more deeply about how readers experience data:

Too many news apps – or visualized databases, or whatever we call them – consist of throwing up data online with some kind of search or exploratory interface.  It’s helpful to be able to pick through details, of course, but often they aren’t designed with the user’s needs foremost.

... we haven’t yet developed broadly-accepted conventions of how to explore data – so non-geeks (and even geeks) have to learn how to use each app individually.  Imagine if every narrative story employed different conventions – you’d have to adapt to every story, and reading would quickly become a chore.  Another analogy:  Try watching an old movie, and see how slow it feels compared to the speed of editing cuts in the post-MTV world.  Viewers got used to new visual conventions over time – but it took time.  Trying to get a 1930s audience to follow the jump cuts in “Breathless” would be tough.

Thursday, October 13, 2011

Word clouds: The 'mullets of the Internet'

I've had a longstanding hatred of tag clouds, so it's always heartening to read a well-done takedown of these "mullets of the Internet” by Jacob Harris of The New York Times:

So what’s so wrong with word clouds, anyway? To understand that, it helps to understand the principles we strive for in data journalism. At The New York Times, we strongly believe that visualization is reporting, with many of the same elements that would make a traditional story effective: a narrative that pares away extraneous information to find a story in the data; context to help the reader understand the basics of the subject; interviewing the data to find its flaws and be sure of our conclusions. Prettiness is a bonus; if it obliterates the ability to read the story of the visualization, it’s not worth adding some wild new visualization style or strange interface.

Of course, word clouds throw all these principles out the window.

'We're still thinking printers'

Chase Davis and Matt Wynn make the case that newspapers should "take news apps out of the newsroom and create products instead of content":

Most news apps are still largely subordinate to the narrative story. They're coupled to the news cycle. From a revenue standpoint, their contribution is to draw eyeballs. Interest peaks on launch day, and a few days later they're all but dead, fodder for the rare user that stumbles over them. Sound like any other content you've seen?

Monday, May 23, 2011

The 'diegetic wit' of Muppeteer Jim Henson

The creator of the Muppets, Jim Henson, was also an experimental filmmaker. In 1967 he made this curious short film with the composer whose music was used in Looney Tunes cartoons, to advertise IBM's MT/ST, said to be the first machine to be called a "word processor":

Thursday, April 21, 2011

How to learn what databases a government agency keeps

Some tips gathered from old posts on NICAR-L and notes I've kept. This is for when looking at Data.gov or other more obvious sources fails:

  • Submit an open records request asking for any or all documents containing an inventory of the agency's databases or information systems.
  • Do a site-specific Google search of the agency's website for data-related keywords.
  • Look at forms agencies use to collect information. Typically the information gathered in the forms goes into a database of some kind. Some governments publish lists of forms issued by agencies.
  • If you have a document that looks like it was printed from a spreadsheet or database, it probably was. Ask for it in electronic form.
  • Look in government regulations where databases or data collected may be mentioned. On the federal level this would be the Code of Federal Regulations or the Federal Register. Similar records exist for state governments.
  • Look at enabling legislation passed by the legislature or ordinances passed by a local government body. They may set out what kind of information the agency is required to collect.
  • Annual reports to the legislature may mention databases and data collected by the agency.
  • Look at policy manuals on how government information is to be collected and used.
  • Look at the agency's self-performance assessments for data sources used.
  • Look in the methodology section of government audits of the agency for mentions of databases that were examined.
  • Lawsuits against the agency may reveal databases or data collected in documents turned over in discovery.
  • Look at the sources section of annual or statistical reports from the agency.
  • Look at FOIA logs or record requests made by others to the agency. Often the office that hears appeals will publish its decisions online with a discussion of what was requested.
  • Look at record retention schedules that tell agencies how long they must keep different classes of records.
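The site-specific search tip is the easiest one on the list to mechanize. Here is a minimal Python sketch, assuming Google's `site:` and `filetype:` search operators; the keyword list, the file types and the domain `example.gov` are hypothetical placeholders to swap for the agency you're covering. It only builds query strings and search URLs; it doesn't fetch or scrape anything.

```python
from urllib.parse import urlencode

# Hypothetical keywords; tailor them to the agency and subject you're covering.
DATA_KEYWORDS = ["database", "data dictionary", "record layout", "information system"]

# Spreadsheet and data-export extensions to try with the filetype: operator.
FILE_TYPES = ["xls", "xlsx", "csv"]


def build_queries(domain, keywords=DATA_KEYWORDS, file_types=FILE_TYPES):
    """Return Google query strings limited to one agency's website."""
    queries = []
    for kw in keywords:
        # Quote multi-word keywords so they are matched as phrases.
        term = f'"{kw}"' if " " in kw else kw
        queries.append(f"site:{domain} {term}")
    for ext in file_types:
        queries.append(f"site:{domain} filetype:{ext}")
    return queries


def search_url(query):
    """Turn a query string into a Google search URL to open in a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    for q in build_queries("example.gov"):
        print(q)
        print("   ", search_url(q))
```

Filetype searches for spreadsheets and CSV files can turn up record layouts or data dumps an agency posted without announcing them.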

Wednesday, April 20, 2011

A data miner asks: Do data journalists pay enough attention to data quality?

Matthew Hurst of the Data Mining blog read a couple of recent accounts of the "process and principles of data journalism" -- one on The New York Times, the other on the Guardian -- and came away concerned there was no mention of "assessing or questioning the quality of the data employed, or its source":

I don't mean to indicate that these institutions aren't concerned with the quality of the data they report … But just as we expect accountability regarding the sourcing of information and redundancy of sources for traditional journalism, we should expect these data sensibilities from data journalists.

Hurst had also commented on the piece on the Guardian:

One of the most important roles that a data journalist should perform is estimating the quality and bias of data sets being used. The open data movement has, to some degree, spread the assumption that government data is correct.

I don't think the fact that these two examples didn't mention data quality means no attention was paid to it. Of course journalists should be worried about data quality, and I suspect journalists are all over the map on this, from obsessive concern about data's sourcing and accuracy to naïve disregard. I doubt most of the data journalists I've been in touch with over the years would make the "assumption that government data is correct." The opposite assumption seems more likely.

That said, there was a workshop on this very subject five years ago, and in one of the papers written for it, Marcus Messner and Bruce Garrison of the University of Miami wrote that they were "quite alarmed at the lack of attention given to this issue" after they examined both academic journals on journalism and such publications as the IRE Journal, NICAR Uplink, Editor & Publisher, the American Journalism Review and the Columbia Journalism Review:

From earlier research about computer-assisted reporting, various conferences and presentations in the past decade and a half, and in discussions with professionals, it was an issue that simply remained below the research radar. But it is an issue of potentially serious ramifications for journalism and for public policymaking. We fear that not enough people are aware of it or consider it serious enough to warrant more attention. The literature of journalism that focuses on journalists’ uses of databases and their problems is thin at best. References to dirty data or other database verification issues are most often made in passing, if at all. It is seldom even discussed in academic studies involving secondary analysis of databases and in situations when reliability and validity should be given attention. While there are many individual incidents described in the literature, there has not been a comprehensive attempt to analyze the journalistic problem of dirty data.

Since then I haven't seen the issue discussed much online by data journalists -- so maybe Hurst is on to something, after all.
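Neither Hurst nor the Miami paper spells out what assessing data quality looks like in practice. As one hedged illustration, here is a minimal Python sketch of a few mechanical checks that are commonly run on a new file: blank values, duplicate records, numbers that won't parse and numbers outside a plausible range. The function name, field names, sample records and bounds are all hypothetical. Checks like these only flag problems inside the file; they can't tell you what the agency never recorded.

```python
from collections import Counter


def quality_report(rows, key_fields, numeric_ranges):
    """Tally a few common dirty-data symptoms in a list of dict records.

    rows           -- records, e.g. the output of csv.DictReader
    key_fields     -- fields that together should identify a record uniquely
    numeric_ranges -- {field: (low, high)} bounds you judge plausible
    """
    report = {
        "rows": len(rows),
        "blank": Counter(),         # empty values, per field
        "duplicates": 0,            # extra copies of the same key
        "unparseable": Counter(),   # values that should be numbers but aren't
        "out_of_range": Counter(),  # numbers outside the plausible bounds
    }

    # Count how often each key appears; every copy beyond the first is a duplicate.
    keys = Counter(tuple(r.get(f, "") for f in key_fields) for r in rows)
    report["duplicates"] = sum(n - 1 for n in keys.values())

    for r in rows:
        for field, value in r.items():
            if value is None or str(value).strip() == "":
                report["blank"][field] += 1
        for field, (low, high) in numeric_ranges.items():
            raw = r.get(field)
            if raw is None or str(raw).strip() == "":
                continue  # blanks are not range-checked
            try:
                number = float(raw)
            except ValueError:
                report["unparseable"][field] += 1
                continue
            if not low <= number <= high:
                report["out_of_range"][field] += 1
    return report


if __name__ == "__main__":
    # Hypothetical records with deliberate problems.
    sample = [
        {"id": "1", "county": "Boone", "age": "34"},
        {"id": "2", "county": "", "age": "212"},
        {"id": "2", "county": "Kenton", "age": "n/a"},
        {"id": "3", "county": "Campbell", "age": ""},
    ]
    print(quality_report(sample, key_fields=["id"], numeric_ranges={"age": (0, 120)}))
```

The bounds are a reporting judgment, not a statistical one: an age of 212 is plainly wrong, but whether 0 is plausible depends on what the records describe.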

Tuesday, April 19, 2011

The Language Log questions whether plagiarism in a judicial decision is wrong

Bill Poser remarking on a case out of Canada:

The Court of Appeal for British Columbia handed down a very unusual decision today that raises an interesting linguistic issue. The underlying case, Cojocaru (Guardian Ad Litem) v. British Columbia Women’s Hospital, was a medical negligence suit by the parents of a brain-damaged baby against the hospital at which it was born. At trial before the Supreme Court of British Columbia, Justice Joel Groves ruled for the plaintiffs and awarded them $5 million in damages.

Most of Justice Groves' decision was copied from the submissions of the plaintiff's attorney, Paul McGivern. 321 out of a total of 368 paragraphs were copied nearly word-for-word from the submissions, seven were a mixture, and 40 were in his own words. On appeal, the Court held 2-1 that the trial judge's copying gave rise to the apprehension that he had not seriously considered the issues and overturned the decision, remanding it for a new trial.

I don't have a definite opinion on this, but my inclination is that the court is wrong. Judges, unlike authors of fiction, are not paid to be original. If one party states the facts or the law clearly and accurately, by all means the court should make use of the work that party's attorneys have already done rather than spending time rephrasing it.

There's lots of good discussion in the comments on the post, including the observation that it's common for lawyers to submit orders or other documents for the judge to sign in the judge's name if the judge rules in the submitter's favor.

Monday, April 18, 2011

Twitterstream: Street fights, robot hacks, crime predictions and more

Things that caught my attention long enough in the last few weeks to note them on Twitter:

Journalism

  • RT @TheOnion: NYTimes' Plan To Charge People Money For Consuming Goods, Services Called Bold Business Move http://tinyurl.com/4zzbgpx
  • Like this metric for the Chicago Tribune's stories on open records law: Percent AG's office disagrees with govt agency http://goo.gl/7Xc2u
  • Facebook hires a "Journalist Program Manager." "...journalism isn't dying," the new hire says. "It's being reborn." http://goo.gl/mIxVg
  • Street Fight covers the business of hyperlocal. Intro says 2x will "celebrate" industry. How about straight reporting? http://goo.gl/lKAvD
  • Northern Kentucky professor says lamestream media pathetic for not pursuing "Sarah Palin isn't Trig's mom" story. http://goo.gl/PegWC
  • Bozeman Daily Chronicle quotes own asst. managing editor vouching for Greg 'Three Cups of Tea' Mortenson's honesty. http://goo.gl/A4igl

Geekery

Research

Diversions

Tools

Friday, April 15, 2011

Facebook hires 'Journalist Program Manager'

This is how he introduces himself:

As Journalist Program Manager, I will be leading the charge to build programs that help journalists utilize Facebook in their reporting while advocating on their behalf to improve social journalism on the platform. This includes the likes of the recently launched Journalists on Facebook Page and Facebook Journalism Meetups program, as well as resources for journalism educators, but also taking insightful feedback to product on how Facebook can be improved for journalism. I will be based in Facebook's NYC office.

Facebook's role in journalism has grown tremendously, perhaps showcased during the recent unrest in North Africa and the Middle East, and the growth is only going to continue as new products on the platform are introduced and users become even more accustomed to engaging with content and its producers. While some have proclaimed and lamented the death of journalism, I've been more fascinated with how it's evolving, especially the emergence of social journalism. And though the platform or format may change, storytelling is thriving. After all, journalism isn't dying. It's being reborn.

Thursday, April 14, 2011

'AOL … You've got fail'

Funny, sad piece from a freelancer on the periphery of the Huff Post takeover of AOL (in spite of what you may have read, it wasn't the other way around).

Wednesday, April 13, 2011

Tim Minchin's 9-minute beat poem on science

In past lives there have been times when I've wanted to be an animator and times when I've wanted to be a poet, but those art forms bore me as often as they entertain these days. Still, I loved this 9-minute animated movie, more than two years in the making, that my teenage daughter shared with me: