Depth Reporting

Showing posts with label Statistics. Show all posts

Thursday, July 10, 2008

Free book on data wrestling

Paul Murrell, a senior lecturer in statistics at New Zealand's University of Auckland, has published a working draft of his upcoming book, opaquely titled "Introduction to Data Technologies," online. There's also a PDF you can download. The book, written for academics but potentially useful to geeky, data-oriented journalists, discusses how to work with HTML, CSS, XML, databases, SQL and R. From the introduction:

The basic premise of this book is that scientists are required to perform many tasks with data other than statistical analyses. A lot of time and effort is usually invested in getting data ready for analysis: collecting the data, storing the data, transforming and subsetting the data, and transferring the data between different operating systems and applications.

Many scientists acquire data management skills in an ad hoc manner, as problems arise in practice. In most cases, skills are self-taught or passed down, guild-like, from master to apprentice. This book aims to provide a more structured and more complete introduction to the skills required for managing data.

The focus of this book is on computational tools that make the management of data faster, more accurate, and more efficient. The intention is to improve the awareness of what sorts of tasks can be achieved and to describe the correct approach to performing these tasks and there is an emphasis on working with data technologies via written computer languages.

[via Statistical Modeling, Causal Inference, and Social Science]

Saturday, July 5, 2008

In search of hard numbers on mass newspaper layoffs

Last week an anonymous reader commented on my post about the newspaper layoff map:

All this fuss over some layoffs. What a bunch of cry babies. Now that the newspapers are going through what the rest of us have been dealing with for decades all of a sudden its big news. There are hardly any news articles about US companies replacing American workers with 65,000 H1B visa temporary workers each year and that is not even counting the L1 visas. Journalists callously wrote articles about how poor foreigners need those jobs. Well where are the articles about how the outsourcing in the news industry is good and helps poor immigrants get jobs. As far as I'm concerned, the current downsizing/outsourcing going on in the news industry is some much needed bitter medicine for the out of touch media.
Journalists do come across as oblivious to the economic reality faced by most other workers. Layoffs are common in other industries, even in the allegedly fast-growing and dynamic high-tech world. Companies like Sun Microsystems, Motorola and Yahoo! have laid off workers in the last year. And in many industries, workers aren't offered expensive buyout packages, as most newsroom employees seem to be getting. They're just let go, period.
Newspaper layoffs don't show up in the mass layoff statistics kept by the Bureau of Labor Statistics. Last month, after reading about The Palm Beach Post cutting 130 jobs from its newsroom, I looked there for signs of a more general surge in big newspaper job cuts and couldn't find any. The bureau releases the data monthly, with the most recent release for May, so presumably large newspaper layoffs would show up there. I couldn't find data sliced as finely as I wanted it, however, so it's possible a trend is visible in other numbers I didn't have access to.
The bureau classifies industries using the North American Industry Classification System. There is a category, 511111, for newspaper publishers, but the narrowest data grouping I could find online was a breakdown by "Publishing Industries (except Internet)." That isn't on point because it includes not only newspaper publishers but also magazine, book, directory, software and greeting card publishers.
Here's what the yearly figures since 1996 look like in a Google Chart. I assume the spike in 2001 reflects the dot-com bubble bursting.

Another issue is that the definition of a mass layoff -- "Fifty or more initial claims for unemployment insurance benefits filed against an employer during a 5-week period, regardless of duration" -- doesn't apply in most newspaper situations. The number fired is either less than that or reached through buyouts, which don't count as layoffs.
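
For the geeky, here is a rough Python sketch of how that definition could be checked against a list of weekly claim counts. The function name and the sample numbers are made up for illustration, and treating the "5-week period" as any five consecutive weeks is my assumption, not something the bureau's definition spells out.

```python
def is_mass_layoff(weekly_claims, threshold=50, window_weeks=5):
    """Return True if any run of `window_weeks` consecutive weeks
    contains at least `threshold` initial unemployment claims.

    Assumption: the "5-week period" is read as a sliding window of
    consecutive weeks. The input is a list of claims per week for
    one employer (hypothetical data, not BLS figures).
    """
    last_start = max(1, len(weekly_claims) - window_weeks + 1)
    for start in range(last_start):
        window = weekly_claims[start:start + window_weeks]
        if sum(window) >= threshold:
            return True
    return False


# Hypothetical employers: one trims 12 jobs a week for four weeks,
# the other trims 10 jobs a week for five weeks.
print(is_mass_layoff([12, 12, 12, 12]))
print(is_mass_layoff([10, 10, 10, 10, 10]))
```

The first made-up employer sheds 48 workers and stays under the threshold, while the second reaches 50 and crosses it, which is why cuts made in small batches can stay invisible in these statistics.
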
Newspaper layoffs also don't appear to fall under the WARN Act, which requires companies to give advance notice of mass layoffs under certain conditions. The government's guide to what must be reported (PDF) says you may be covered by the law if your job loss occurs as part of:
  • A plant closing ... where your employer shuts down a facility or operating unit ... within a single site of employment ... and lays off at least 50 full-time workers;
  • A mass layoff .... where your employer lays off either between 50 and 499 full-time workers at a single site of employment and that number is 33% of the number of full-time workers at the single site of employment; or
  • A situation where your employer ... lays off 500 or more full-time workers at a single site of employment.
These don't apply to the typical newspaper scenario.
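
The three conditions quoted above can be written down as a small Python check. This is only a sketch of the thresholds as the guide states them, not legal advice: the function name and the sample site sizes are hypothetical, and I am reading "that number is 33%" as "at least 33%" of the site's full-time workers.

```python
def warn_notice_may_apply(laid_off_full_time, site_full_time, plant_closing=False):
    """Check a job loss against the three thresholds quoted from the guide.

    laid_off_full_time: full-time workers laid off at a single site
    site_full_time:     all full-time workers at that site
    plant_closing:      True if a facility or operating unit is shut down
    """
    # Plant closing: at least 50 full-time workers laid off.
    if plant_closing and laid_off_full_time >= 50:
        return True
    # 500 or more full-time workers at a single site.
    if laid_off_full_time >= 500:
        return True
    # Mass layoff: 50 to 499 workers, and (assumption) at least 33%
    # of the site's full-time workers. Integer math avoids rounding.
    if 50 <= laid_off_full_time <= 499:
        return laid_off_full_time * 100 >= 33 * site_full_time
    return False


# Hypothetical site sizes: 130 cuts at a site of 1,000 versus a site of 300.
print(warn_notice_may_apply(130, 1000))
print(warn_notice_may_apply(130, 300))
```

With these made-up site sizes, 130 cuts at a 1,000-person site fall short of the 33% bar, while the same 130 cuts at a 300-person site clear it.
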
Many states report these layoff notices on the Web, including Kentucky (PDF). The only mention of a newspaper I found after checking a few other states was in Florida, where in March McClatchy reported 71 of its layoffs.
I found the layoff lists interesting to scroll through. They're a window onto the grinding wheels of the economy and put the news industry's woes, however sorrowful they may be, in context. There are lots of layoffs by mortgage companies, restaurant chains and transportation companies. And if nothing else, you can take cold comfort in knowing that you're in the same predicament as the 162 people let go at Dolly Parton's Dixie Stampede.

Monday, March 3, 2008

Illustrating quantity

A bar chart is empty of emotion, with the number 3 delivering the same impact as 300 million. That isn't true of this series of images by Chris Jordan called "Running the Numbers: An American Self-Portrait":

This series looks at contemporary American culture through the austere lens of statistics. Each image portrays a specific quantity of something: fifteen million sheets of office paper (five minutes of paper use); 106,000 aluminum cans (thirty seconds of can consumption) and so on. My hope is that images representing these quantities might have a different effect than the raw numbers alone, such as we find daily in articles and books. Statistics can feel abstract and anesthetizing, making it difficult to connect with and make meaning of 3.6 million SUV sales in one year, for example, or 2.3 million Americans in prison, or 410,000 paper cups used every fifteen minutes. This project visually examines these vast and bizarre measures of our society, in large intricately detailed prints assembled from thousands of smaller photographs. The underlying desire is to emphasize the role of the individual in a society that is increasingly enormous, incomprehensible, and overwhelming.

Tuesday, February 26, 2008

Saving the American Time Use Survey

A group is soliciting signatures to save the American Time Use Survey from being cut from the federal budget. They say the survey is "the most important new data initiative begun by the U.S government in at least 35 years":

The ATUS provides essential information on how Americans spend their time, including time spent caring for children, cleaning the house, working for pay, and caring for sick adults. Put simply, the ATUS is needed to expand our horizons beyond merely charting where dollars go, to charting where time goes too. Statistics on spending, jobs, incomes, and so on are undeniably important. But anyone who wants to understand the changing lives of American families, to monitor the well-being of the American population, to measure national output, productivity and other outcomes that are essential to sound economic policy-making, or to make informed social policy decisions also needs information on how our population spends its time.

Friday, January 18, 2008

Official Statistics on the Web

Official Statistics on the Web, or OFFSTATS, from the University of Auckland Library, points you to free statistics from official sources online. Here's the section for the United States and here's Wallis & Futuna. You can search by country, region or topic. The site notes that it points to current data that is often downloadable as text or spreadsheet files.

Thursday, January 17, 2008

STATS: Which is Better at Covering Drug Addiction, HBO’s "The Wire" or The Baltimore Sun?

STATS answers the question:

As “the Wire” brings a fictional version of the Baltimore Sun to life, the real paper recently “exposed” abuse of the new addiction medication, buprenorphine. But as it turns out, HBO’s dramatic series does a far better job of examining the complexities of addiction than what appeared to have the factual power of a real journalistic investigation.

Tuesday, January 15, 2008

Al's Morning Meeting on the Consumer Price Index

Al explains "Why the CPI Is News (And Why It Isn't)":

Some who report the numbers this week will no doubt refer to the CPI as "the cost of living" index. It isn't. The BLS [Bureau of Labor Statistics] says a real cost-of living-index would include things the CPI does not, for instance, taxes not associated with buying things (like income tax and Social Security tax), the cost of crime on your life and so on.

The CPI is not the only gauge of inflation -- not by a long shot. The CPI measures inflation that consumers feel in their day-to-day living expenses. Other indexes ... measure other types of inflation, such as the Producer Price Index, which measures inflation at earlier stages of production, and the Employment Cost Index, which measures inflation in the labor market.

Wednesday, November 21, 2007

Free copy of Edward Tufte's Data Analysis for Politics and Policy

You can download a free PDF copy of Edward R. Tufte's 1974 book, Data Analysis for Politics and Policy, from his Web site.

Tuesday, September 4, 2007

Spreadsheet Addiction

The proprietor of Burns Statistics explains:

Some people will think that the "addiction" in the title is over the top, or at least used metaphorically. It is used literally, and is not an exaggeration. Addiction is the persistent use of a substance where that use is detrimental to the user. It is not the substance that is the problem -- more limited use may be beneficial. It is the extent and circumstances of the use that determine if the behavior is addictive or not.

Spreadsheets are a wonderful invention. They are an excellent tool for what they are good at. The problem is that they are often stretched far beyond their home territory. The overuse of spreadsheets is only too common.

Sunday, July 15, 2007

You too can rank law schools, just like U.S. News & World Report

Two law professors have released detailed data on law schools on their Web site. They say their goal is "to facilitate rigorous, comprehensive, and transparent empirical analysis of law schools and legal education." The data, from the Official Guide to ABA-Approved Law Schools, includes information on law school faculties, curriculum, enrollment, the ethnicity of students, tuition, living expenses, GPA and LSAT scores, attrition, grants, scholarships and student employment after law school. The data has been available on the American Bar Association's Web site, but the professors, Bill Henderson of Indiana University and Andrew Morriss of the University of Illinois, massaged it to make it easier to analyze. The professors recently wrote a column for The American Lawyer defending U.S. News & World Report's law school rankings. The rankings, which attempt to name the nation's best law schools, are despised by many law school faculty and administrators. So much so that one professor created his own rankings, which purport to be better because they place more emphasis on academics, while another developed The Law School Ranking Game, an attempt to prove the rankings are so arbitrary as to be meaningless. The rankings have spawned critical academic papers, including one that discusses the lengths to which some schools may go to boost their rank. Henderson and Morriss, however, argue that law schools have only themselves to blame:

U.S. News is influential among prospective students at least in part because the magazine does what the law schools don't: give law students easy-to-compare information that sheds light on their long-term employment prospects. Law schools could easily supply that information themselves, but they choose not to. In fact, as the collective head shaking about the rankings has increased, the growth of the large law firm sector—which pay salaries that justify the rapidly escalating cost of legal education—has made the rankings more important.

Our research suggests that prospective students care a great deal about their post–law school employment and bar passage prospects—information that law schools could readily compile and supply. We found that rather than work to provide applicants with the kind of information they say they want and need, law schools tend to report information in a manner that undermines the applicants' ability to engage in meaningful comparative assessments on measures that matter. These practices, which range from puffery to borderline deceit, are all aimed at improving their U.S. News rankings. As a result, even as the rankings have become more important, they have become less reliable.

Hans Rosling: "Unveil the beauty of statistics"

Hans Rosling, one of the founders of Gapminder, delivered an impassioned speech to an OECD forum on statistics recently, encouraging governments to make data more freely available. Gapminder's software, Trendalyzer, was recently acquired by Google. It is a superb tool that lets you compare countries over time by various statistical measures. Rosling, who is Swedish, said the hardest thing about building Gapminder was not getting the money, coming up with ideas, or making the technology work -- it was borrowing databases from tax-funded institutions. Such "database hugging" by public institutions hampers innovation, he said. He told how when he visited the U.S., he brought his American Express card to Wall Street, assuming that at such a citadel of capitalism, he would have to pay to walk the sidewalks. Instead, he could stroll unhindered to the door of the New York Stock Exchange. He said governments need to make data freely available so innovators and entrepreneurs can experiment with it. "When sidewalks are free, why can't we make statistics to be the intellectual sidewalks of human societies? … There's no reason to charge for it."

Participants in the forum, which was held in Istanbul, signed a declaration (PDF) stating that "Official statistics are a key 'public good' that foster the progress of societies."

Tuesday, July 10, 2007

Google’s director of research on how to spot bad studies

I'm not going to make any cheap jokes or draw any untoward conclusions (OK, I just did) because Google's director of research, Peter Norvig, cites the "Cartoon Guide to Statistics" as one of seven references in the bibliography for this article about "Warning Signs in Experimental Design and Interpretation." It's a summary of how to spot weak or bogus research.

Thursday, June 28, 2007

O.J. says "duh": New study shows juries often get it wrong

A Northwestern University statistics professor, Bruce Spencer, studied 271 criminal cases and found "that juries gave wrong verdicts in at least one out of eight cases."

To conduct the study, Spencer employed a replication analysis of jury verdicts, comparing decisions of actual jurors with decisions of judges who were hearing the cases they were deciding. In other words, as a jury was deliberating about a particular verdict, its judge filled out a questionnaire giving what he or she believed to be the correct verdict.

“Consider the analogy to sample surveys, where sampling error is estimated even though the true value may never be known,” Spencer said. “The key is replication. To assess the accuracy of jury verdicts, we need a second opinion of what the verdict should be.”

By comparing agreement rates of judges and juries over time and across jurisdictions, and even across types of cases, Spencer’s statistical analysis could give insights into the comparative accuracy of verdicts in different sets of cases.
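
To make the replication idea concrete, here is a minimal Python sketch that computes how often a jury and its judge reached the same verdict. The verdicts below are invented for illustration, and this raw agreement rate is only the starting point of such a comparison; it is not Spencer's estimator of verdict accuracy, which the excerpt does not describe.

```python
def agreement_rate(jury_verdicts, judge_verdicts):
    """Share of cases in which the jury's verdict matches the verdict
    the judge wrote down on the questionnaire."""
    if len(jury_verdicts) != len(judge_verdicts):
        raise ValueError("Each jury verdict needs a matching judge verdict")
    if not jury_verdicts:
        raise ValueError("At least one case is required")
    matches = sum(
        jury == judge for jury, judge in zip(jury_verdicts, judge_verdicts)
    )
    return matches / len(jury_verdicts)


# Eight hypothetical cases; the judge disagrees with the jury in two of them.
jury = ["guilty", "guilty", "not guilty", "guilty",
        "not guilty", "guilty", "guilty", "not guilty"]
judge = ["guilty", "not guilty", "not guilty", "guilty",
         "not guilty", "guilty", "guilty", "guilty"]
print(agreement_rate(jury, judge))  # 0.75
```

Disagreement alone doesn't say which side was wrong, which is why the study needs a statistical model on top of a tally like this.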

A draft of his paper, to be published in the Journal of Empirical Legal Studies, can be found here (PDF).

Tuesday, June 5, 2007

Using and abusing statistics on dog bites

A blog on animal welfare issues takes issue with how newspapers report statistics on pit bull attacks:

I love statistics.  I think they can provide great insights when used correctly.

My problem with statistics is that most people are lazy -- and few ever ask the additional questions like "how did they get that number?" or "is there something else influencing that" or "is there something else to that story?"   When used correctly, statistics can shed a lot of light.  However, when someone looks at statistics to try to prove their opinion, statistics can be very scary.

As the old saying goes, use statistics as a sober person would use a lampost, for ilumination, not as a drunkard who uses it as a crutch.

Tuesday, May 29, 2007

Why Mathematical Models Just Don't Add Up

The authors of "Useless Arithmetic: Why Environmental Scientists Can't Predict the Future," explain why the complex math used to justify many government actions doesn't add up:

Mathematical models are wooden and inflexible compared with the beautifully complex and dynamic nature of the earth. In the 1960s and 1970s — with the arrival of powerful personal computers, governmental requirements for environmental-impact statements, and widespread applications of mathematical models — scientists thought that quantitative models would be the bridge to a better, more secure future in our relationship with the environment. But they have proved to be a bridge too far.

We now know that there are no precise answers to many of the important questions we must ask about the future of human interaction with our planet. We must use more-qualitative ways to answer them.

Predictive quantitative models should be relegated to the dustbin of failed ideas.

The Statistical Modeling, Causal Inference, and Social Science blog comments: "While the article is quite extreme in its derision of quantitative models, plugs the book the authors wrote, and employs easy rhetoric by providing only positive examples of a few failures and not negative examples of many successes, it is right that quantitative models are overrated in our society, especially in domains that involve complex systems. The myriad of unrealistic and often silly assumptions are hidden beneath layers of obtuse mathematics."

Tuesday, March 20, 2007

TrafficSTATS

At TrafficSTATS (STAtistics on Travel Safety) you can explore the risk of suffering a brutal, untimely death in a vehicle depending on such factors as your age, your gender, the time of day and the day of the week, the region where you live and your chosen mode of transportation. The traffic data is from the Fatality Analysis Reporting System and the National Household Travel Survey. There's a tutorial and a sample report (PDF). It's a joint project of Carnegie Mellon University and the AAA Foundation for Traffic Safety.

Monday, February 5, 2007

Department of Defense Statistical Information Analysis Division

You can get large quantities of official military statistics on personnel, procurement and casualties courtesy of the Department of Defense's Statistical Information Analysis Division. It won't be easy to analyze, however, because the data is provided as unhelpful PDF files.

Friday, January 26, 2007

Guide to sources of statistics

Looking for statistics on a particular subject? Check out the appendix to the Statistical Abstract of the United States, which provides a summary of sources of statistics. The guide tells you how frequently the statistics are updated, whether they're available on paper or on the Internet and the Web address of the organizations that offer them.

Thursday, November 30, 2006

Learn statistics with online videos

Learner.org offers a free (with registration) online video course on learning statistics called "Against All Odds: Inside Statistics":

With an emphasis on “doing” statistics, this series goes on location to help uncover statistical solutions to the puzzles of everyday life. Learn how data collection and manipulation — paired with intelligent judgement and common sense — can lead to more informed decision-making.

Thursday, October 12, 2006

Debating how to count Iraqi deaths

A just-released Lancet article estimating the number of Iraqi deaths as a result of the U.S. invasion (PDF), an update of a controversial study first published two years ago, has, not surprisingly, renewed debate about the legitimacy of its methods. The Social Science Statistics Blog gives a rundown of some of the places where it's being discussed.