Thursday, June 28, 2007

O.J. says "duh": New study shows juries often get it wrong

A Northwestern University statistics professor, Bruce Spencer, studied 271 criminal cases and found "that juries gave wrong verdicts in at least one out of eight cases."

To conduct the study, Spencer employed a replication analysis of jury verdicts, comparing the decisions of actual jurors with the decisions of the judges who were hearing the same cases. In other words, as a jury was deliberating about a particular verdict, its judge filled out a questionnaire giving what he or she believed to be the correct verdict.

“Consider the analogy to sample surveys, where sampling error is estimated even though the true value may never be known,” Spencer said. “The key is replication. To assess the accuracy of jury verdicts, we need a second opinion of what the verdict should be.”

By comparing agreement rates of judges and juries over time and across jurisdictions, and even across types of cases, Spencer’s statistical analysis could give insights into the comparative accuracy of verdicts in different sets of cases.

A draft of his paper, to be published in the Journal of Empirical Legal Studies, can be found here (PDF).
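To make the idea of comparing agreement rates concrete, here is a minimal sketch in Python. The case records are invented for illustration and are not data from the study, and the `agreement_rates` function is a hypothetical helper. It only measures how often judge and jury agree; it is not Spencer's method for estimating how often verdicts are wrong, which rests on a statistical model the summary above does not describe.

```python
from collections import defaultdict

# Hypothetical (case_type, jury_verdict, judge_verdict) records.
# These are made-up values, not data from the study.
cases = [
    ("theft", "guilty", "guilty"),
    ("theft", "guilty", "not guilty"),
    ("assault", "not guilty", "not guilty"),
    ("assault", "guilty", "guilty"),
    ("assault", "not guilty", "guilty"),
    ("fraud", "guilty", "guilty"),
]

def agreement_rates(records):
    """Return, per case type, the share of cases where jury and judge agree."""
    totals = defaultdict(int)
    agreed = defaultdict(int)
    for case_type, jury, judge in records:
        totals[case_type] += 1
        if jury == judge:
            agreed[case_type] += 1
    return {case_type: agreed[case_type] / totals[case_type] for case_type in totals}

rates = agreement_rates(cases)
for case_type, rate in sorted(rates.items()):
    print(f"{case_type}: {rate:.0%} agreement")
```

Agreement is not the same thing as accuracy: a judge and a jury can agree and both be wrong, which is why a second opinion only supports an estimate of error rather than a direct count.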

The dangers of data reuse

Security expert Bruce Schneier writes about the dangers of "data reuse":

When we think about our personal data, what bothers us most is generally not the initial collection and use, but the secondary uses. I personally appreciate it when Amazon.com suggests books that might interest me, based on books I have already bought. I like it that my airline knows what type of seat and meal I prefer, and my hotel chain keeps records of my room preferences. I don't mind that my automatic road-toll collection tag is tied to my credit card, and that I get billed automatically. I even like the detailed summary of my purchases that my credit card company sends me at the end of every year. What I don't want, though, is any of these companies selling that data to brokers, or for law enforcement to be allowed to paw through those records without a warrant.

There are two bothersome issues about data reuse. First, we lose control of our data. In all of the examples above, there is an implied agreement between the data collector and me: It gets the data in order to provide me with some sort of service. Once the data collector sells it to a broker, though, it's out of my hands. It might show up on some telemarketer's screen, or in a detailed report to a potential employer, or as part of a data-mining system to evaluate my personal terrorism risk. It becomes part of my data shadow, which always follows me around but I can never see.

And since I'm still in the mood for sharing my vacation podcast diet, you should know that a 2004 recording of Schneier discussing his book "Beyond Fear" is one of the most listened-to podcasts at IT Conversations. The great thing about Schneier, as the title of his book suggests, is that he is not a fear monger. That's in contrast to most so-called security experts, who have a vested interest in exaggerating threats.

Wednesday, June 27, 2007

Would you listen to Carl Bernstein?

He says:

"One of the things I've observed having been interviewed so many times is that reporters tend to be terrible listeners. They have usually decided what the story is before they do the interview, and they will choose the one which will manufacture the most controversy. But manufactured controversy is not news."
It isn't? Tell that to Nora Ephron.

Uncloaking Terrorist Networks

A reader, Valdis Krebs, who discovered this blog because of a recent post of mine about social network analysis, pointed me to a paper he published in 2002 on "Uncloaking Terrorist Networks." As he said in his email, it's an example of "A non-journalist doing computer-assisted reporting":

We were all shocked by the tragic events of September 11, 2001. In the non-stop stream of news and analysis one phrase was continuously repeated - "terrorist network." Everyone talked about this concept, and described it as amorphous, invisible, resilient, and dispersed. But no one could produce a visual. Being a consultant and researcher in organizational networks, I set out to map this network of terrorist cells that had so affected all of our lives. My aim was to uncover network patterns that would reveal Al Qaeda's preferred methods of stealth organization. If we know what patterns of organization they prefer, we may know what to look for as we search them out in countries across the world.

His data sources were newspapers like The New York Times, Wall Street Journal, Washington Post and LA Times, so it's a technique within reach of all of us. He also includes a list of other public data sources that can be used for doing this kind of analysis. "In my data search I came across many news accounts where one agency, or country, had data that another would have found very useful," he wrote in the paper. "To win this fight against terrorism it appears that the good guys have to build a better information and knowledge sharing network than the bad guys."
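For anyone who wants to try this kind of analysis on names pulled from news accounts, here is a minimal sketch in plain Python. It computes degree centrality, one common measure in social network analysis. The ties below are placeholder labels, not data from the paper, and `degree_centrality` is a hypothetical helper rather than a reconstruction of the method Krebs used.

```python
from collections import defaultdict

# Hypothetical ties of the kind a reporter might record from news stories,
# e.g. "A and B shared an apartment". Placeholder labels, not real data.
ties = [
    ("A", "B"),
    ("A", "C"),
    ("A", "D"),
    ("B", "C"),
    ("D", "E"),
]

def degree_centrality(edges):
    """Return each node's share of the other nodes it is directly tied to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-ties
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {node: len(links) / (n - 1) for node, links in neighbors.items()}

scores = degree_centrality(ties)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.2f}")
```

A high score only says that a person shows up in many reported ties. It says nothing about the quality of the reporting behind each tie, so the list of edges deserves the same scrutiny as any other source.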

Andrew Keen and The Cult of the Amateur

While driving to North Carolina on my vacation last week I enjoyed listening to this podcast with Andrew Keen, the author of "The Cult of the Amateur: How Today's Internet is Killing Our Culture." Keen was also interviewed recently by NPR's Weekend Edition. Keen told NPR he is appalled by sites like YouTube, Google, MySpace and Wikipedia:

"My problem is that it fundamentally undermines the authority of mainstream media. We're seeing two things going on simultaneously: The rise of this user-generated content, which is unreliable and often corrupt, and a crisis in professional journalism, professional recorded music, newspapers, radio stations, television and publishing. And that is the core of our culture. Once we undermine the authority and expertise and professionalism of mainstream media, all we have is opinion, chaos, a cacophony of amateurs."

Military injury reports

Michael Ravnitzky pointed out in a posting on FOI-L earlier this week that you can get reports on military injuries on the Army Medical Surveillance Activity Web site. Although it's an Army site, it includes Navy, Air Force and Marine reports too. "These statistics are not readily available anywhere else, so this is an important data resource," he wrote. There is in-depth information about how soldiers are injured, including the body parts most affected and eye-opening summaries such as the one for March 2007, which says that 47,671 of the 504,416 soldiers assigned to the Army - 9.5 percent - had an injury that required medical attention that month.
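It is worth checking a percentage like that before repeating it. A quick sketch in Python, using only the two March 2007 figures quoted above (the variable names are mine):

```python
# Figures quoted above from the March 2007 summary.
injured = 47_671
assigned = 504_416

rate = injured / assigned
print(f"{rate:.1%} of assigned soldiers had an injury requiring medical attention")
```

The result rounds to the 9.5 percent cited in the summary.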

Friday, June 15, 2007

Save money, eliminate the designers

The regulars at VisualEditors.com are not happy with LayoutExecutive.com, which promises to teach you "the secrets that the Designers want to keep to themselves." 

LOUIS: Federal government document repository and search engine

LOUIS is "a project of the Sunlight Foundation, and an effort, to paraphrase Justice Louis Brandeis, to illuminate the workings of the federal government. Our ultimate goal is to create a comprehensive, completely indexed and cross-referenced depository of federal documents from the executive and legislative branches of government." Derek Willis of Washingtonpost.com sees it as part of "The New Competition" for newspapers.

Thursday, June 14, 2007

Database names 1.5 million farm subsidy recipients

The Environmental Working Group's Farm Bill 2007 Policy Analysis Database already claims more than 340,000 searches since its release two days ago:

For decades, American taxpayers have provided tens of billions of dollars in federal farm subsidies to some of the largest and wealthiest farm businesses in the nation. But thousands of people who benefited from the subsidy flow were shielded from public view behind layers of partnerships, joint ventures, limited liability corporations, cooperatives, and other business structures that obscured their personal subsidy claims.

Not anymore.

A new Web site, developed by the Environmental Working Group (EWG) from millions of previously unpublished USDA subsidy records and released today, provides nearly full disclosure of federal farm subsidy beneficiaries for the first time. The disclosures include individuals, sometimes numbering in the dozens, whose subsidy benefits pass through one or more plantation-scale farm businesses that produce vast quantities of subsidized cotton, rice and other crops. Many of those businesses receive millions in USDA crop subsidies each year, and according to the new USDA data, pass six-figure benefits through to many people. In many cases, these individuals have not previously had subsidy benefits attributed to them by name.

Here's Kentucky's page and here's Indiana's.

The three markers of scientific fraud

Dr. David Goodstein, the Vice Provost and Professor of Physics and Applied Physics at Caltech, explains that three motives or "risk factors" are always present in cases of scientific fraud. He says the perpetrators

  • were under career pressure;
  • knew, or thought they knew what the answer would turn out to be if they went to all the trouble of doing the work properly, and
  • were working in a field where individual experiments are not expected to be precisely reproducible.

He also says scientific fraud is "almost always found in the biomedical sciences, never in fields like physics or astronomy or geology."

The world began in 1900, Excel says

The Open Malaysia blog highlights an Excel flaw I've experienced myself: Its belief that dates began in 1900. He says this isn't a problem in OpenOffice:

Obviously if you have the money to spend, by all means, you are free to purchase Microsoft Office 2007. However please avoid the native file formats of those products if you are a Islamic historian, Renaissance archivist, Medieval coin collector, or someone who just has to work with dates prior to the 20th Century.

Microsoft offers a macro for calculating an age from a pre-1900 date. Excel User Tips also has an add-in to cope with the problem, but the site says it's inaccurate for dates before 1752 because of differences between American, British, Gregorian and Julian calendars. I don't have a clue if that's also an issue with OpenOffice or the macro.
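If you have a scripting language handy, there is another way around the problem. Here is a minimal sketch in Python, whose standard `datetime.date` type accepts years from 1 through 9999. The `age_in_years` function is my own illustration, not the Microsoft macro or the Excel User Tips add-in.

```python
from datetime import date

def age_in_years(born, on):
    """Whole years between two dates on Python's proleptic Gregorian calendar."""
    years = on.year - born.year
    # Subtract one if the anniversary has not yet occurred in the final year.
    if (on.month, on.day) < (born.month, born.day):
        years -= 1
    return years

# Abraham Lincoln: born Feb. 12, 1809; died April 15, 1865.
lincoln_age = age_in_years(date(1809, 2, 12), date(1865, 4, 15))
print(lincoln_age)
```

Python's `date` extends today's Gregorian calendar backward and knows nothing about the 1752 switch from the Julian calendar in Britain and its colonies. Treat this sketch with the same caution for dates before 1752, or for any country that changed calendars on a different date.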

Wednesday, June 13, 2007

Presidential campaign finance maps from the FEC

The Federal Election Commission is now offering maps that show presidential campaign contributions by state and ZIP code. You can see where candidates are raising the most money, and drill down to the individual contributor names and amounts.

A "jaw-dropping" visual demonstration

I've seen a lot of presentations and it's rare that one generates spontaneous applause, as did this demonstration of Microsoft software called Photosynth.

Tuesday, June 12, 2007

$2 billion in newspaper print ads in peril?

"The potentially abrupt reversal of fortune would be caused not merely by competition from the Internet but also by profound changes in the way consumers buy and marketers sell," Alan D. Mutter writes.

Print advertising sales for newspapers appear to be on track to plunge by $2 billion this year, which would make for the worst performance in a decade other than the disastrous period following 9/11.

The difference between this projected decline and the one after September, 2001, is that it would occur in an era of economic well being characterized by low unemployment, respectable retail sales (until this nasty April) and record highs in the stock market. The setback, if it materializes, would be unprecedented for an industry that, until recently, has been masterful at increasing its revenues in good times and bad.

Saturday, June 9, 2007

Digital Campus

... is "A biweekly discussion of how digital media and technology are affecting learning, teaching, and scholarship at colleges, universities, libraries, and museums."

"16 Awesome Data Visualization Tools"

... that Mashable says are "both visually stunning and delightfully useful."

Software to fix errors in spreadsheets

An Oregon State University press release says computer scientists there have developed a new system for correcting errors in spreadsheets:

"Most users of spreadsheets are overconfident, they believe that the data is correct," said Martin Erwig, an associate professor of computer science in the OSU College of Engineering. "But it has been observed that up to 90 percent of the spreadsheets being used have non-trivial errors in them. In fact, one auditor has said he never inspected a single spreadsheet during his entire career that was completely accurate."

Sometimes the result is a paycheck delayed or a few dollars misplaced. But often the costs or financial misrepresentations are far more serious, and companies have lost millions or billions of dollars, Erwig said, occasionally drawing notice and ridicule in the national press.

The system is called GoalDebug, "which stands for 'Goal Directed Debugging of Spreadsheets.'" What it does "is try to identify the ways that humans commonly make mistakes and then suggest what the correct approach might have been. For instance, if someone sees a figure in a spreadsheet that seems suspicious or is clearly incorrect, they can plug in the correct number, and the OSU system can suggest several programming mistakes that might have created the error – which the user can then sort through and use to identify the problem," the press release says. OSU claims the system could help companies "save billions of dollars" -- a prediction, I'm sure, that has nothing to do with the fact that it's been "licensed to a spin-off company in Oregon."

Tuesday, June 5, 2007

Monitoring employees with social network analysis

PC World writes about social network analysis software, called Metron EBA, that does "Enterprise Behavior Analysis": 

The appliance sits on the network, takes a snapshot of its users and passively monitors traffic, tracking various modes of communication such as data, voice, e-mail or IM. Metron EBA can then display which employees or groups of employees interact the most and how it relates to business productivity. For instance, one employee may be connected to several groups and serve as an unofficial liaison, unbeknownst to upper management.

...

Metron EBA also could be used to optimize business processes, conveying which groups of people work together and what it makes sense for them to work on at any given time. In addition, the product could indicate if an employee is using different communications or speaking with people outside of the normal behavior -- which could indicate malicious behavior ...

The Green Chameleon calls it a "nasty twist" on the technique's formerly benign image, while Brad Hinton says it won't help recruit and retain staff.

Using and abusing statistics on dog bites

A blog on animal welfare issues takes issue with how newspapers report statistics on pit bull attacks:

I love statistics.  I think they can provide great insights when used correctly.

My problem with statistics is that most people are lazy -- and few ever ask the additional questions like "how did they get that number?" or "is there something else influencing that" or "is there something else to that story?"   When used correctly, statistics can shed a lot of light.  However, when someone looks at statistics to try to prove their opinion, statistics can be very scary.

As the old saying goes, use statistics as a sober person would use a lamppost, for illumination, not as a drunkard who uses it as a crutch.

Monday, June 4, 2007

Foreign lobbyist search

You can search and download data on lobbyists registered to lobby on behalf of foreign powers at the U.S. Department of Justice's Web site.

The Foreign Agents Registration Act (FARA) was enacted in 1938. FARA is a disclosure statute that requires persons acting as agents of foreign principals in a political or quasi-political capacity to make periodic public disclosure of their relationship with the foreign principal, as well as activities, receipts and disbursements in support of those activities.  Disclosure of the required information facilitates evaluation by the government and the American people of the statements and activities of such persons in light of their function as foreign agents. The FARA Registration Unit of the Counterespionage Section (CES) in the National Security Division (NSD) is responsible for the administration and enforcement of the Act.

The Hill reports that the database links "to substantial documents, such as contracts between lobbyists and foreign governments as well as advocates’ reports listing contacts between them and policymakers."

Overhyped hyperlocal?

The American Journalism Review casts doubt on the profitability of hyperlocal news:

As big-media companies and entrepreneurs alike rush into the hyperlocal arena (see "Really Local," April/May), it's worth pausing and asking: Is there a real business in this kind of business?

So far--and admittedly it's still very early --the answer is no. A few of the estimated 500 or so "local-local" news sites claim to show a profit, but the overwhelming majority lose money, according to the first comprehensive survey of the field. The survey, conducted by J-Lab: The Institute for Interactive Journalism (affiliated with the University of Maryland's Philip Merrill College of Journalism, as is AJR), documents a journalism movement that is simultaneously thriving and highly tenuous. While independent sites such as WestportNow.com (Connecticut), iBrattleboro.com (Vermont) and VillageSoup.com (Maine) have sparked useful civic debates and prodded established media outlets to compete more vigorously, the field as a whole is so far financially marginal. As the report puts it, "their business models remain deeply uncertain."