Depth Reporting

Monday, May 12, 2008

Free Government Information

... was founded by librarians "to raise public awareness of the importance of government information and create a community with various stakeholders to facilitate an open and critical dialogue":

We believe that it is important to garner support for government information not just within our own community of federal depository libraries but with those organizations and citizens that actually need to know about the activities of our government in order to participate fully in the democratic process. This includes non-profit organizations, government watchdogs, academics and researchers, journalists, the business community, and individual citizens. By creating this nexus, we hope to facilitate collaboration among the various stakeholders and participate in the design of a truly robust system for the digital age where government information is freely accessible, fully functional and usable, and preserved in a distributed system of libraries.

Their blog is here.

NutritionData

I wanted to know the nutrition facts for McDonald's $1 sausage breakfast burrito, which I've been eating too often in recent days, and found what I needed at NutritionData:

Since its launch in 2003, Nutrition Data has grown into one of the most credible and useful sources of nutritional analysis on the Web. In July 2006, Nutrition Data was acquired by CondéNet, a digital publisher under the Condé Nast Publications umbrella dedicated to editorial excellence. Nutrition Data's continuing goal is to provide the most accurate and comprehensive nutrition analysis available, and to make it accessible and understandable to all.

The information in Nutrition Data's database comes from the USDA's National Nutrient Database for Standard Reference and is supplemented by listings provided by restaurants and food manufacturers. The source for each individual food item is listed in the footnotes of that food's analysis page. In addition to food composition data, Nutrition Data also provides a variety of proprietary tools to analyze and interpret that data. These interpretations represent Nutrition Data's opinion and are based on calculations derived from Daily Reference Values (DRVs), Reference Daily Intakes (RDIs), published research, and recommendations of the FDA.

While Nutrition Data cannot guarantee the absolute accuracy of every listing, we make every attempt possible to ensure the quality of our data.

Not coincidentally, I will be cutting back my breakfast burrito consumption.

Saturday, May 10, 2008

Depth Reporting's old look restored

I learned today that Depth Reporting's recent problems were caused by Google's blogroll widget, which I have removed from my page. That allowed me to restore Depth Reporting's old template. I considered elaborating on Google's lousy way of handling problems like these, but don't have the patience right now, and will be moving on.

Thursday, May 8, 2008

Apologies for Depth Reporting being out of commission

Depth Reporting, which is hosted by Google's Blogger, was down for more than a day for reasons that are still unclear to me. Apparently others are having similar problems, but as is typical with Google, they haven't responded to my request for help or offered an explanation. They tell users to report problems in their forums, but actually replying to those reports doesn't appear to interest them. Their status page, more than 8 hours after I submitted my request for help, mentions only "a small number" of users reporting "broken feeds," which doesn't fit my situation. I restored Depth Reporting by reverting to one of Google's classic templates, as suggested by a blogger in their forums, but all of my customizations have been lost. Depth Reporting also seems to be having problems displaying images.

To be continued ...

Tuesday, May 6, 2008

Digital File, a database for investigative reporters and researchers

Haven't tried it myself, but Digital File costs about $45 and was recommended by a reporter on NICAR-L. It's for organizing investigations:

It is bases on Excel and runs on every PC and Mac. The database helps keeping track of an investigation. All steps are documented in a way that allows quick access and overview. The database contains contact info about sources, questions to ask, documents and (interview) notes, as well as a time tracker and expenses sheet. ‘After years of muddling in Word, this really is a solution!’

The creator is Luuk Sengers, a freelance investigative reporter and journalism lecturer in The Netherlands.

Sunday, May 4, 2008

Can you believe reporters when they write, "studies have shown"?

Stats.org deconstructs the phrase "studies have shown" in reporting on infants fed breast milk versus formula:

... in the increasingly overly-simplified, context-free world of reporting on health, the phrase “studies have shown” is often a formula for telling the reader what the reporter assumes has actually been shown.

Thursday, May 1, 2008

Deliberations: "A Trial Lawyer's Guide To Social Networking Sites"

If the lawyers are paying attention to what jurors are saying on social networking sites, so should you:

We know there are jurors who blog, and jurors who read blogs, and jurors who comment on blogs. By now you're surely convinced that you need to ask potential jurors if they're writing on line. But do you know how? There are nearly countless ways a juror could show up on the Internet. You need some sense of the landscape to ask about them, or you'll get partial answers or answers you don't understand.

If words like "Tweet" and "wiki" pop up often in your vocabulary, you don't need this post. But in case this stuff is new to you, here is Deliberations' short guide to the world of social networking. These are roughly grouped according to the main feature of the site, but most have overlapping features and functions.

Wednesday, April 30, 2008

"6 Free Apps and Utilities for Working with Video"

... as listed by Web Worker Daily.

SportsDesigner.com: 2007 Sports Designer of the Year

SportsDesigner.com unveiled its "2007 Sports Designer of the Year" this month.

You should also check out the winners of its best infographics contest.

[via Infographics News]

"50 Awesome Open Source Resources for Online Writers"

... from Job Profiles, "your guide to careers and education."

[via DailyWritingTips]

Monday, April 28, 2008

Pivot tables and macros in Zoho spreadsheets

A video explains the new features, which also include the ability to import up to 100,000 rows of data. You can write the macros in Visual Basic, share them online and import Excel spreadsheets with existing macros. You can't do everything you can with Excel, though, as both the macros and the pivot tables offered by Zoho are more limited.

[via TechCrunch]

Attorney General: Ky State Police violated open records law when it denied CJ sex offender database

The Kentucky Attorney General ruled April 17 that the Kentucky State Police violated the state open records law when it refused to give us an updated version of its sex offender database. Here's a copy of the decision, in Microsoft Word format, which the Attorney General's office just put online. And if you are reading this on the Depth Reporting site itself or have a feed reader that handles embedded objects, below is a Scribd version. Scribd is a much more elegant way to share Word, PDF and other documents than posting links to the originals.

The Attorney General's decision misspelled my name, but hey, you can't have everything. The decision itself was entirely in our favor.

Read this doc on Scribd: Kentucky sex offender database decision

Saturday, April 26, 2008

In Uganda it's still the message, not the medium, that matters

While American journalists debate whether to blog, Twitter or become multimedia warriors, the journalism that really matters is still going on in the world:

In a two-pronged operation, police and operatives from the Chieftaincy of Military Intelligence (CMI), Joint Anti-Terrorism Taskforce (JATT) and the Black Mamba squad raided The Independent again, exactly a month after the first raid.

It is 9.30am on Saturday April 26 and The Independent’s Managing Editor Andrew Mwenda is driving from his home along Golf Course Road in Kololo for the Capital Gang programme on Capital FM radio. As he climbs up Coral Crescent Rise towards Lower Kololo Terrace, two suspicious cars come from in front of him, the front one towards him at breakneck speed. Thinking that perhaps the driver had lost control, he stops and tries to reverse when suddenly three other cars appear from behind, one knocking his rear bumper.

Then a swarm of security operatives surround the car, one young man tries to open the door but it is locked from inside. He pulls out a gun and points it at Mwenda asking him to get out of the car. When Mwenda opens the door, the security operatives pounce on him, forcefully pulling him out of the car, confiscating his phones, watch and car before dumping him into a waiting car and driving off in a heavily defended convoy at break-neck speed.

“There were not witnesses around,” Mwenda narrates his ordeal. “I realised the state wanted me to disappear without a trace. So I opened the car window and shouted at people along the road that I was Andrew Mwenda being kidnapped by CMI. At this point, the security operatives pulled me back and this time handcuffed me so that I do not cause more trouble.

[via TEDBlog]

Thursday, April 24, 2008

Datamob: "Public data put to good use"

Datamob "aims to show, in a very simple way, how public data sources are being used":

Our listings emphasize the connection between data posted by governments and public institutions and the interfaces people are building to explore that data.

It's for anyone who's ever looked at a site like MAPLight.org and wondered, "Where did they get their data?" And for anyone who ever looked at THOMAS and thought, "There's got to be a better way to organize this!"

The creators, Sean Flannagan and Lauren Sperber, say they have two broad goals:

  • Encourage governments and public institutions to make more data available in developer-friendly formats like CSV, XML and RDF. Widely accessible public data enables informed civic engagement, and we believe that providing restriction-free data to developers is the best way to promote the technological innovations that will spread knowledge.
  • Illuminate the process of creating interfaces, mashups and visualizations for public data, and inspire people to create new ones.

And this is how Sperber explains the name:

Well, the folks at Freebase coined the term "data mob" to describe a group of data-lovers working together to perfect a small portion of Freebase's ambitiously all-encompassing database. As for our Datamob, we hope it'll inspire more institutions with vast reserves of information to put their data out there in accessible formats—and bring together more data mobbers to bring that information to life.

Monday, April 21, 2008

I had never heard a singing editorial until this one

The editorial board at the Cleveland Plain Dealer produced a singing editorial after two of the newspaper's reporters were thrown out of a public meeting last week. I liked it a lot, but not all of the Plain Dealer's readers did. One asked:

Does anyone think its a little bit petty for a newspaper to be engaging in such childish antics. Is the old "corrupt politician" headline not working anymore?

Sunday, April 20, 2008

ReadTheWords.com

There's an endless supply of stuff I think I should read but just can't find the time for. In recent years I've found podcasts to be a boon for making dead time -- such as doing the dishes, working in the yard and exercising -- more interesting. Wouldn't it be great if I could take all that unread material and turn it into podcasts? You can with the free ReadTheWords.com. You just upload a file, link to a Web page or submit text in a box, and it will return an mp3 suitable for an iPod or other audio player. You can also embed the recordings you make on a Web site, like this:

This was a recording I made of this article on Visualizing Social Networks. As you'll hear, you must have a high tolerance for robotic voices. You can choose from many ("Michael," "Lauren," "Tom," and so on), but none of them sound natural enough for me to make much use of this site. I'd rather listen to LugRadio instead.

That it's available at all, however, is admirable. As the site explains, it began this year as a way "To assist students with learning disabilities with their studies, by means of auditory learning and auditory processing." They said they then expanded it because of demand from "students, young professionals, actors and actresses, research departments, bloggers, ecommerce sites, and others" who "expressed how this technology could help them with their daily lives, and their businesses."

Encyclopaedia Britannica opens up to the Web, sort of

If you have a Web site or blog you can now get free access to Britannica.com, which previously cost $70 per year. TechCrunch says:

Encyclopedia Britannica often is used in case studies as a definitive example of how new technology can disrupt a business. Everything was great for the nearly 250 year old privately held company until the Internet came around and a Category Five hurricaned on their parade.

The program is called Britannica WebShare, and the site says it's "for web publishers, including bloggers, webmasters, and anyone who writes for the Internet."

I submitted my site and was given access. Now I can link to any of Britannica's 120,000 articles, such as this one on the Kentucky Derby, and my readers can read it all too. But they still can't access the rest of the encyclopedia. You can also embed widgets on certain topics.  

TechCrunch says it's the equivalent of being "half pregnant":

Britannica is doing a lot of things right - a relatively small staff of a hundred or so editors manages 4,000 unpaid (I believe) contributors who are recognized experts in their field. But, like the music labels, they still somehow feel as though people should pay to consume their content. And that means search engines can’t index their content. And that means they don’t exist.

Instead of going free and opening up to all, they’re using the new program to simply price discriminate. Give people who may link to the site free access. Everyone else has to pay. So in effect they’re aiming to be half pregnant - they want the benefits of web linking but don’t want to give up the subscription fees from the fools who continue to pay them.

We'll see if TechCrunch is right that to survive they'll eventually be forced to make everything free for everyone.

Tuesday, April 15, 2008

Watchdog.net

... aims to "build a hub for politics on the Internet."

Our plan has three parts:

Data: There's a lot of great information out there about politics – district demographics, votes, lobbying records, campaign finance reports – but unfortunately it's split across a dozen different web sites and often hidden behind confusing interfaces. We're pulling all of that together and letting you explore it in one elegant, unified interface. (Plus, we're sharing all the results so you can come up with new ways to explore it.)

Action: Just giving you information isn't enough. Unless you can do something about it, it's just going to get you down. So we're building a series of first-class tools for getting involved – ways to write and call your representatives, send letters to local media, and figure out who to vote for.

Causes: But politics isn't about people doing things in isolation; it's about coming together around shared causes. That's why we let you start your own causes and campaigns, invite your friends to join them, and let you learn about other causes that could use your help.

The site is just getting started so there's not a lot to see yet (" ... we're building this site right before your eyes. So expect things to break, fix, appear, and disappear before your very eyes"), but it's backed by a grant from the Sunlight Network and its founder is Aaron Swartz, co-founder of Reddit and creator of theinfo.org, mentioned here previously.

Watchdog.net is soliciting help of all kinds and making its source code and data available to all.

Monday, April 14, 2008

How to confirm if a public figure lives at an address

Earlier this month a reporter asked NICAR-L, a computer-assisted reporting discussion list, for help with a story that hinged on whether a public figure lived at a particular address. The public figure did not own the home, and the reporter wanted to know how he could confirm the public figure lived there. The reporter had already tried phone directories and voter registration cards and he said he wasn't ready to just knock on the door and ask or set up "a surveillance operation."

Here were the suggestions from other reporters on the list:

  • Hire a licensed private investigator to search auto registration records on Autotrack, an electronic public record vendor. Reporters are forbidden to access these records directly because of the Driver's Privacy Protection Act, or DPPA, but private investigators are exempt. (I do not know if the private investigator would be violating the act by obtaining the records for the reporter in this way)
  • Utility records, such as water bills
  • Automobile property tax records
  • Pet license records
  • Send the public figure a registered or certified letter, return receipt requested
  • Talk to the public figure's letter carrier
  • Ancestry.com's record databases
  • Circulation records for the reporter's newspaper (Years ago I tried to access classified ad records at the CJ for similar reasons, and was told no)
  • Ask a cop source to check auto registration records for you (this is ethically dicey for the reporter and the cop)
  • Traffic citations
  • Civil and criminal court records
  • Marriage licenses
  • A resume
  • Personnel records, if a public employee
  • Financial disclosure forms, if an elected or appointed government official subject to financial disclosure
  • Ask his florist
  • Google his name

Sunday, April 13, 2008

A Free Online Course in Science Journalism

... is offered by the World Federation of Science Journalists.

The authors and translators of this course are experienced journalists and trainers from all continents. They cover major practical and conceptual issues in science journalism, for example: how to find and research stories, exposing false claims, how to pitch to an editor, turning crisis reporting to advantage and so forth – topics that are relevant to beginners in journalism as well as more experienced reporters and editors in all regions of the world.

Friday, April 11, 2008

Investigative reporting enters the era of chopped meat

Bob Greene was a legendary investigative reporter who died yesterday. A criminal investigator before he became a reporter, Greene oversaw the investigative team at Long Island's Newsday decades ago when it wrote about the heroin trade, shady land sales and political skulduggery. His former colleague Anthony Marro wrote of Greene in 2002:

There had been investigative reporters before, of course, and some short-lived investigative teams as well, including the high-powered group assembled by Life magazine. But much that passed for investigative reporting was leaks from police agencies and prosecutors. While Greene and his team got their share of such leaks, the thing that set them apart from most others was the emphasis on original work. They built their own databases. They developed their own chronologies. They drew their own charts to trace the flow of property and money, and to connect the political and business ties of investors. This is common today, but it was so rare in the late ’60s and early ’70s that other papers interested in setting up investigative teams, including The Boston Globe and The Providence Journal, made pilgrimages to Newsday to see how it was done. And at Newsday itself, Greene took reporters — myself included — who had been keeping notes on the backsides of envelopes and the insides of match book covers and taught them how to gather and organize large amounts of information in ways that enabled them to untangle complicated business deals and tear agencies apart.

Truth is, little of what has been labeled investigative reporting during my 20 years in journalism resembled the painstaking work Greene championed. Most of it was either the newspaper equivalent of the term paper ("give me 10,000 words and a special section on the state of poverty in America") or reporting on the work of government investigators -- uncovering scandals that would have become public eventually, anyway. There's been little groundbreaking work. As Jack Shafer wrote last year:

Newspaper people have enormous egos, if you get my drift, and don't mind massaging the big hairy things in public. Yet the press is hardly the sentry and bulwark of society that reporters imagine it to be. I don't mean to disparage reporters who put their lives on the line to file from Iraq, nor the sleuths who sift through databases to uncover wrongdoing by pharmaceutical companies, or any other enterprising reporter. But too many journalists who wave the investigative banner merely act as the conduit for other people's probing ...

Now it appears even what good work is done is shriveling as newspaper profits shrink. Last year Michael Massing wrote in the Columbia Journalism Review:

... top regional papers like the Des Moines Register, the Louisville Courier-Journal, and the St. Louis Post-Dispatch, which once prided themselves on quality investigative work, have cut back on their digging, and that has reduced the amount and quality of news flowing up to the national level.

It hasn't gotten any better since. Alan Mutter wrote that this week's Pulitzer Prizes ought to come "with an asterisk":

Staff cuts that have hit the industry in the last few years require fewer people to do more work to fill the paper and feed the website, reducing the opportunities to produce ground-breaking investigations, riveting photos, sparkling features and exceptional coverage of big, breaking stories.

The Web brings with it a new kind of accountability -- more of what government and business do is visible than ever before -- and there's no shortage of people out there willing to find fault and share it with the world. I personally believe this is all for the good, and that even as the old media decays, it is being replaced by a richer, better world of news and information.

Still, it's an uncomfortable place to be if you work in the belly of the rotting beast, because you don't know where you'll be when you once again emerge into the light.

In the meantime, there won't be many moments like the one faced by the new reporter on Greene's investigative team so many decades ago. As Marro tells it, the reporter was about to order a salisbury steak in a restaurant on the company expense account. Greene, who liked his steaks thick and had "an appetite that rivaled Diamond Jim Brady’s," stopped him and said:

“When you eat with the team, you don’t eat chopped meat.”

Monday, April 7, 2008

Whitepages.com opens phone and address data

Whitepages.com is making "virtually all" of its data -- including the data used to make people, reverse phone and reverse address searches -- available to programmers for free.

A press release says Whitepages.com has data on "nearly 180 million people which equals 80 percent of the U.S. adult population." There are also 25 million work listings.

My first thought was that this could prove useful for anyone doing database-driven investigative reporting because it would make it easier to identify people named in public record databases.

But I thought otherwise after reading the terms of use, which include this:

if you implement the API on a restricted web site, you shall provide the Company with a log-in name and password that will allow the Company to access the web site

And these:

(b) you shall not retain or store any Data for any reason;

(c) you shall not aggregate or otherwise combine Data from individual queries for any reason;

Queries are also limited to 1,500 per day. The site says its data can be used to create "consumer applications, Web sites, and mashups" but it can't be used to "create applications for business end-users." It's understandable why they'd do this: Presumably they want to drive traffic to their site from mashups built with their data, but don't want to give away the store. Nevertheless, it's disappointing.

I still wanted to try it out, though, so I signed up for an API key and did a simple test using PHP and Louisville's mayor as my test subject. The way it works is you feed the search terms via a URL and it returns XML with the results.

The code looked like this:

<?php

$url = "http://api.whitepages.com/find_person/1.0/?firstname=jerry;lastname=abramson;zip=40201;api_key=YOUR_API_KEY_HERE";

$xmlstr = file_get_contents($url);

// PHP's SimpleXML apparently can't handle elements with prefixes like wp:
// as used by Whitepages.com, so we remove them from the xml
$xmlstr = str_replace('wp:', '', $xmlstr);

$xml = new SimpleXMLElement($xmlstr);

foreach ($xml->listings->listing as $listing) {
    echo "Name: ", $listing->people->person->firstname, ' ', $listing->people->person->lastname, "\n";
    echo "Business: ", $listing->business->businessname, "\n";
    echo "Phone: ", $listing->phonenumbers->phone->fullphone, "\n";
    echo "Address: ", $listing->address->fullstreet, "\n";
    echo "Latitude: ", $listing->geodata->latitude, "\n";
    echo "Longitude: ", $listing->geodata->longitude, "\n";
    echo "Last validated: ", $listing->listingmeta->lastvalidated, "\n\n-----------\n\n";
}

?>

And produced this output:

Name: Jerry Abramson
Business: Louisville Science Center
Phone: (502) 560-7141
Address: 727 W Main St
Latitude: 38.257345
Longitude: -85.761902
Last validated: 03/2006

-----------

Name: Jerry Abramson
Business: City of Louisville Metro Government
Phone: (502) 574-5000
Address: 400 S 6th St
Latitude: 38.253456
Longitude: -85.760631
Last validated: 12/2006

-----------

Name: Jerry Abramson
Business:
Phone: (502) 897-6559
Address: 44 Eastover Ct
Latitude: 38.252427
Longitude: -85.677070
Last validated: 12/2007

-----------

Name: Jerry Abramson
Business: City of Lsvl Jfrsn Cnty Plc
Phone:
Address: 768 Barret Ave
Latitude: 38.240838
Longitude: -85.731823
Last validated: 12/2004

-----------

Thus by feeding Whitepages just a name and ZIP code, we get back organizations that may be related to our subject, as well as phone numbers, addresses, latitude and longitude for mapping and a date for when the data was last checked. This example doesn't show it, but this search also turned up the name of the mayor's wife.

Nice. Too bad there are so many restrictions.
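A footnote on the wp: prefixes: instead of stripping them with str_replace(), it may be possible to read the namespaced elements directly, because SimpleXML's children() method accepts a namespace prefix. What follows is only a sketch under assumptions. The sample XML is a made-up, cut-down stand-in for a real response, its namespace URI is a guess, and listing_summaries() is a hypothetical helper name of my own.

```php
<?php
// Hypothetical, trimmed-down response modeled on the fields used above.
// The real Whitepages XML (and its namespace URI) may differ.
$xmlstr = <<<XML
<wp:wp xmlns:wp="http://api.whitepages.com/schema/">
  <wp:listings>
    <wp:listing>
      <wp:people>
        <wp:person>
          <wp:firstname>Jerry</wp:firstname>
          <wp:lastname>Abramson</wp:lastname>
        </wp:person>
      </wp:people>
      <wp:address>
        <wp:fullstreet>400 S 6th St</wp:fullstreet>
      </wp:address>
    </wp:listing>
  </wp:listings>
</wp:wp>
XML;

// Hypothetical helper: returns one name/address pair per listing.
function listing_summaries(string $xmlstr): array
{
    $root = new SimpleXMLElement($xmlstr);
    $rows = [];

    // children('wp', true) selects child elements by namespace *prefix*,
    // so the XML does not have to be rewritten first.
    foreach ($root->children('wp', true)->listings->listing as $listing) {
        $fields = $listing->children('wp', true);
        $person = $fields->people->person;
        $rows[] = [
            'name'    => trim($person->firstname . ' ' . $person->lastname),
            'address' => (string) $fields->address->fullstreet,
        ];
    }

    return $rows;
}

foreach (listing_summaries($xmlstr) as $row) {
    echo "Name: ", $row['name'], "\n";
    echo "Address: ", $row['address'], "\n";
}
```

The advantage over str_replace() is that text which happens to contain "wp:" inside a value is left alone. I have only reasoned this through against the sample above, not against a live response.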


Sunday, March 30, 2008

On vacation

Back the week of April 7. See you then.

Saturday, March 29, 2008

A cure for Web flatulence

It's lovely that so many people have so many wonderful ideas for what newspapers should do to save themselves, but I'm tired of reading about them. Why am I tired? Because the people who write about these ideas almost never offer the information you need to evaluate their true worth. Newspaper print ad revenues plunged farther last year than in any of the 50+ years since such measurements began. That's why people are being laid off. That's why investigative reporting teams are being shut down. That's why almost no one wants to bid when a newspaper goes on sale. The numbers are bad, really bad.

So if you're going to tell us about your great online project and how it's going to help reverse this trend, you've got to give us some numbers too. You've got to give us the information we need to fairly evaluate it -- as a business proposition. You need to give us something more solid than Web flatulence to decide whether your idea is something the news industry can build profitable businesses around.

We're told we need to build data centers. We're told we need to build narrowly targeted Web sites serving niche markets. We're told we need to go hyperlocal. We're told we need to crowdsource. We're told we need to deploy mobile journalists. We're told we need to nurture citizen journalists. We're told we need to engage readers in conversations. We're told we need to become link aggregators. We're told we need to do podcasts. We're told we need to do video. We're told we need to provide feeds for everything we do. We're told we need to spew text messages, use Twitter and build widgets on Facebook. We're told we need to do continuous updates online, 24/7.

Fine. Those are all good ideas. But if you've done it, what were the results? Show us the numbers. Give us a fair and honest evaluation of how you did against your competition, however defined.

These are the kinds of questions I want answered for all online news projects, large and small:

  • How many page views did your project generate? How many unique visitors? How long did they stay on the site? Where did they come from? Are they coming back?
  • How does that compare with other things you've done?
  • Do you have any advertisers for this? Who are they? How much are they paying? Is it generating any other revenue? How much? If it isn't generating any money, why not?
  • How much did it cost to make this? How long did it take? How many people were involved? What didn't you do in the meantime?
  • Did you make a profit? Did you even try to measure whether it's profitable? How do you evaluate whether it's successful?
  • Is it easily repeatable? In other words, is it a strategy that can be adopted by any news organization, at any time, or does it require unique, hard-to-find skills? Can you keep it going if the creator quits?
  • Who else is doing this? How successful are they? Do they do it better than you? How easy is it for competitors to duplicate what you've done?
  • What mistakes did you make? What didn't work and why? What would you do differently next time?

Of course most of us won't answer most of these questions publicly, either because our employers won't let us, or because we don't know the answer, or because it's not our department, or because it's embarrassing, or because we just want to do what we do because we can and it's cool and it's fun. I get that.

But that's what I want to know.

Friday, March 28, 2008

PDFTextOnline

PDFTextOnline offers "Hassle-Free PDF Text Extraction in Your Browser":

Getting text and other content out of your PDF documents is often a hassle. Adobe Acrobat™ (or your other favorite PDF viewer) can do copy-and-paste, but that's time-consuming and tedious for anything but the smallest jobs. Acrobat™ also has a 'save as text' option, but unless you spring for Acrobat™ Professional, it often generates inaccurate text and simply cannot cope with some languages (especially Chinese, Japanese, and Korean).

Your other options include Adobe's online text conversion tools (which make you wait for an email to get the converted PDF content), or one of the dozens of utilities swarming around the Internet that require you to download, install, and then hope that they won't spray viruses around your computer.

I gave it a try on some PDFs and it was impressively fast and converted the PDFs to text cleanly. But unfortunately, it still isn't helpful enough with the PDFs that truly vex me, like this one from our court system. Its neat and orderly tables of data look like they would be easy to convert to text and import into a spreadsheet, but in fact doing so is an incredible PITA because those neat and orderly tables collapse into a difficult-to-parse jumble when converted to text. Usually I resort to begging the courts to give it to me in Excel (which can take days, if they'll agree to do it at all) or using Perl and regular expressions. The PDFTextOnline text was very clean, and appeared as good as, if not better than, the output of other conversion tools I've tried, but it still would require work to put into Excel or a database for analysis.
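For anyone wondering what the regular-expression route looks like, here is a rough sketch in PHP rather than Perl, to match the earlier Whitepages example. Everything in it is an assumption for illustration: the sample text, the column names, the case-number pattern and the text_to_rows() helper are invented, and real court PDFs are rarely this tidy. The idea is simply to keep the lines that look like data rows and split them wherever two or more spaces mark a column break.

```php
<?php
// Invented sample of what extracted PDF text might look like.
// None of these rows come from a real document.
$text = <<<TXT
CASE NO      DEFENDANT          FILED       CHARGE
08-CR-0001   DOE, JOHN          01/02/2008  THEFT BY UNLAWFUL TAKING
08-CR-0002   ROE, JANE A        01/03/2008  DUI

Page 1 of 1
TXT;

// Hypothetical helper: turns extracted text into an array of rows.
function text_to_rows(string $text): array
{
    $rows = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Keep only lines that start with something shaped like a case
        // number (an assumed pattern); headers and page footers are skipped.
        if (!preg_match('/^\d{2}-[A-Z]{2}-\d{4}\b/', $line)) {
            continue;
        }
        // Two or more spaces mark a column break; a single space
        // (as in "DOE, JOHN") stays inside its field.
        $rows[] = preg_split('/\s{2,}/', $line);
    }
    return $rows;
}

// Write the rows out as CSV, ready for a spreadsheet or database import.
$out = fopen('php://output', 'w');
foreach (text_to_rows($text) as $row) {
    fputcsv($out, $row, ',', '"', '');
}
fclose($out);
```

This breaks down as soon as a cell is empty or a column is separated by only one space, which is exactly the kind of jumble that makes these conversions such a chore.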

[via NICAR-L and Neil Reisner]