Depth Reporting

Showing posts with label Databases. Show all posts

Saturday, February 23, 2008

Derby DataTrack and Many Eyes: 2008 Kentucky Derby contenders, trainers and sires

This weekend we released the latest version of Derby DataTrack, our database of potential contenders for the Kentucky Derby. I know we (read: I) can do a better job presenting this data, but I haven't yet figured out how. A while ago ManyEyes added a network visualization tool and a way of embedding their visualizations on any Web site, so I thought I'd give it a try:

While this is intriguing, it isn't the solution, so if you have any thoughts on how we can do better that don't involve mastering Flash or Processing in a week, drop me a note.

Monday, October 29, 2007

Listphile

"… is a free website that enables anyone to create collaborative lists, atlases, databases and more. Lists can be broad and ambitious (like a List of All Baseball Players Who Played in the Majors) or niche (Punk Bands from the Lower East Side, 1975-1980), or quirky or ridiculous. You can collaborate with other people to share, create, and make something that will benefit humanity."

An introductory video describes it as a "multimedia database." As of this writing, lists featured on the home page include T206 White Border Baseball Cards, a World Shark Attack Database, a compendium of Yoda quotes from Star Wars with video clips, and the greatest divas of all time. There's also a blog for the site.

Thursday, August 30, 2007

LegiStorm Congressional travel database

LegiStorm, whose Congressional staff salary database was mentioned here previously, also offers a database of Congressional travel. Their travel database identifies "which trips took place at a time and location coinciding with major events - like the Superbowl or Mardi Gras - which may have provided additional travel incentive."

Thursday, August 23, 2007

Free + database = Freebase

Freebase, as any Richard Pryor fan knows, is a smokable form of cocaine. It's also "a uniquely structured database that you can easily search, add to and edit":

It's a data commons in the way that a public square is a land commons—available to anyone to use.

Freebase covers millions of topics in hundreds of categories. It's been seeded with a few million topics from open sources, including Wikipedia and Musicbrainz, and while the first topics have mostly been in media categories like movies, music, and television, the Freebase community has already added thousands more topics on subjects from philosophy to European railway stations to the chemical properties of ingredients.

I first learned of it way back in March, when only the Web 2.0 avant garde was allowed in. I also listened to a podcast with one of the co-founders last month, although truthfully I still don't have a very good handle on what it's all about. It's still in "alpha" and "read only" unless you've been granted "write" access. I was given write access a while ago, but haven't yet found the time to give it a try. If you're interested in checking it out yourself, I have free invitations to give away to the first ten people who email me.

PolitiFact and the transformative power of Django

Matt Waite of the St. Petersburg Times is the chief developer of PolitiFact, which he says "marks a major shift" in his career:

The site is a simple, old newspaper concept that's been fundamentally redesigned for the web. We've taken the political "truth squad" story, where a reporter takes a campaign commercial or a stump speech, fact checks it and writes a story. We've taken that concept, blown it apart into it's fundamental pieces, and reassembled it into a data-driven website covering the 2008 presidential election.

The whole site is inspired by Adrian Holovaty's manifesto on the fundamental way newspaper websites need to change. Adrian's main theme was that certain kinds of newspaper content have consistent pieces that could be better served to the reader from a database instead of a newspaper story. I built PolitiFact with that in mind.

Essentially the site rates the truthfulness of statements made by the presidential candidates. I especially like that its "Truth-o-meter" rejects mealymouthed phrasing and instead boils statements down to "TRUE," "MOSTLY TRUE," "HALF TRUE," "FALSE" and "PANTS ON FIRE." More impressive is that Waite, who had lots of help, had never developed a Web site before. He created the site with Django, an open source Web development framework that uses the Python programming language. He says using it "has been a transformative experience."

Beyond being an experiment in journalism or web development, PolitiFact is an experiment in entrepreneurship. We've developed a product that uses reporting labor from the St. Petersburg Times and our sister company Congressional Quarterly to create something that doesn't originate in print. All the talk and all the focus lately in web journalism circles is on local, local, local and to some degree they're right. But there's also something to be said for just putting a good idea on the web that people might find useful. We think we've done that. Now the important part: how are people going to respond? We have no idea. We're anxious to find out.

Friday, July 20, 2007

"D.C. Madam" phone record lookup

If you or someone you love, respect or want to bring down was in Washington, D.C., anytime between August 1994 and August 2006, you may find it hard to resist plugging their phone number into dcphonelist.com. This is a searchable database made from the client phone lists released by Deborah Jeane Palfrey, the alleged "D.C. Madam" who maintains she ran a legitimate escort service, not a prostitution ring. CNN is among many news organizations poring over the records, which have already generated a public apology from Louisiana Senator David Vitter. CNN reports:

What have CNN's researchers found so far, apart from five instances of the apologetic Vitter's number? Quite a lot of doctors, actually, and people in the tech industry. Armchair sociologists will make of that what they will. Lots of lawyers, too, but of course in Washington it sometimes feels like everyone is a lawyer. Others run the gamut from the sports world to college professors.

Tuesday, July 10, 2007

Lots of data breaches, little theft

The GAO reports (PDF) that "Data Breaches Are Frequent, but Evidence of Resulting Identity Theft is Limited":

The extent to which data breaches have resulted in identity theft is not well known, largely because of the difficulty of determining the source of the data used to commit identity theft. However, available data and interviews with researchers, law enforcement officials, and industry representatives indicated that most breaches have not resulted in detected incidents of identity theft, particularly the unauthorized creation of new accounts. For example, in reviewing the 24 largest breaches reported in the media from January 2000 through June 2005, GAO found that 3 included evidence of resulting fraud on existing accounts and 1 included evidence of unauthorized creation of new accounts. For 18 of the breaches, no clear evidence had been uncovered linking them to identity theft; and for the remaining 2, there was not sufficient information to make a determination.

Thursday, June 14, 2007

Database names 1.5 million farm subsidy recipients

The Environmental Working Group's Farm Bill 2007 Policy Analysis Database already claims more than 340,000 searches since its release two days ago:

For decades, American taxpayers have provided tens of billions of dollars in federal farm subsidies to some of the largest and wealthiest farm businesses in the nation. But thousands of people who benefited from the subsidy flow were shielded from public view behind layers of partnerships, joint ventures, limited liability corporations, cooperatives, and other business structures that obscured their personal subsidy claims.

Not anymore.

A new Web site, developed by the Environmental Working Group (EWG) from millions of previously unpublished USDA subsidy records and released today, provides nearly full disclosure of federal farm subsidy beneficiaries for the first time. The disclosures include individuals, sometimes numbering in the dozens, whose subsidy benefits pass through one or more plantation-scale farm businesses that produce vast quantities of subsidized cotton, rice and other crops. Many of those businesses receive millions in USDA crop subsidies each year, and according to the new USDA data, pass six-figure benefits through to many people. In many cases, these individuals have not previously had subsidy benefits attributed to them by name.

Here's Kentucky's page and here's Indiana's.

Thursday, May 3, 2007

Make Web applications easily with Zoho Creator

Zoho Creator is a free tool for creating Web applications. You use it to create Web forms that allow people to submit data, process the data, then make it searchable in an online database. You can share your applications with the world, or keep them private and make them available to only a few. You can also embed these applications in your own Web site or blog. Typically, this is the sort of thing you'd do with a programming language like PHP and a database like MySQL. Zoho Creator, however, makes it ridiculously easy to do without specialized skills. Zoho Creator does have its own simplified programming language called Deluge ("Data Enriched Language for the Universal Grid Environment") that you can use to customize your creations, but it's not required. As a demonstration, I've created a simple application that allows people to submit interesting Web sites to Depth Reporting, emails the submissions to me and makes them viewable and searchable on Zoho. If you have something interesting to share, please submit it in this form:

Given that only about 120 to 130 people a day get Depth Reporting's feed, I'm not expecting much, but we'll see.

UPDATE: If you are reading this in an RSS reader, you won't be able to see the form, so you'll have to visit the blog page itself. The code doesn't work in feed readers, which I should have checked before posting.
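For comparison, here is roughly what the hand-rolled version of that submit, store and search loop might look like. This is only a minimal sketch: it uses Python and its built-in sqlite3 module in place of the PHP and MySQL mentioned above, the table and function names are hypothetical, and it says nothing about how Zoho Creator works internally.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) submissions table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS submissions ("
        "id INTEGER PRIMARY KEY, url TEXT NOT NULL, note TEXT NOT NULL)"
    )
    return conn


def submit(conn, url, note=""):
    """Store one form submission and return its row id; blank URLs are rejected."""
    url = url.strip()
    if not url:
        raise ValueError("url is required")
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO submissions (url, note) VALUES (?, ?)",
            (url, note.strip()),
        )
    return cur.lastrowid


def search(conn, term):
    """Return (url, note) pairs whose URL or note contains the search term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT url, note FROM submissions "
        "WHERE url LIKE ? OR note LIKE ? ORDER BY id",
        (pattern, pattern),
    )
    return rows.fetchall()
```

Even this leaves out the Web form itself and the email notification, which is the part a tool like Zoho Creator saves you from writing.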

Tuesday, May 1, 2007

Tracking federal dollars on the Web

LLRX.com writes about the ins and outs of tracking federal dollars using Web databases:

Many of us in a position to be asked do not look forward to the inevitable questions that run along the lines of "did Organization X ever get any federal money?" or "how much do the feds contract out in industry Z?" The problem is that there will seldom be a simple, client-pleasing answer like "oh yes, $52,453,000.75 in fiscal year 2006, according to this single, comprehensive, and authoritative government database." We have to be familiar with a variety of sources, their fundamental strengths and weaknesses, and the ways in which we will have to qualify our answers.

Sunday, April 15, 2007

Dermatology Image Atlas

Ewwwww. You gotta admire the devotion to craft evident in the Dermatology Image Atlas, which offers 9,517 pictures like this one. "Tender plaque with central crust and surrounding erythema," indeed.

Wednesday, April 11, 2007

Executive PayWatch Database

The AFL-CIO offers an online database where you can look up the total compensation of top corporate executives and compare it to what you're making.

Thursday, April 5, 2007

White House earmarks database

The White House Office of Management and Budget has put a database of Congressional earmarks online. You can download the data or browse by agency or state, with more features to come. A disclaimer, though, makes me doubt its worth:

This database is not designed, and cannot accurately be used, to identify the individual congressional sponsors of earmarks. In addition, the recipient listed in the database may not in all cases represent the ultimate beneficiary of the earmark. For example, if the Federal Government provides funds to a specific recipient (e.g., a City), that recipient may then provide the funds or benefits to another entity and may not be required to identify the ultimate beneficiary to the Federal Government.

Tuesday, January 2, 2007

Free ZIP code database with latitudes and longitudes

... offered by About.com in Microsoft Access format.

Monday, December 4, 2006

Shipwreck database

The Office of Coast Survey's Automated Wreck and Obstruction Information System (AWOIS) "contains information on over 10,000 submerged wrecks and obstructions in the coastal waters of the United States. Information includes latitude and longitude of each feature along with brief historic and descriptive details." You can download the data as an Access database or Adobe Acrobat PDF file.

Friday, December 1, 2006

Secret CIA flight database

The author of "Ghost Plane: The True Story of the CIA Torture Program," a book about extraordinary rendition, has put a database of the spy agency's once-secret flights online.

Thursday, November 16, 2006

Database quality

Dr. Dobb's Portal writes about how few organizations check their databases, and why they should.

Tuesday, November 14, 2006

Online Education Database

The Online Education Database "contains reviews of 444 programs from 41 accredited schools":

Unlike other leading online education directories, our database only lists accredited online schools so that you can be sure that these degrees will be respected by potential employers. Our database allows you to sort reviews by programs, school, or degree level. Our library section will educate you on the basics of online universities.

Tuesday, October 31, 2006

Jihad Database

The RAND Voices of Jihad Database "is a compilation of speeches, interviews, statements, and publications of jihadist leaders, foot soldiers, and sympathizers. Nearly all content is in English translation, and has been collected from publicly-accessible websites. Original links are provided, along with excerpts and full-text content when available."