Monday, January 31, 2005

JPROF.com is "a web site for those interested in journalism, especially teachers and students. This site contains many resources that will help those who want to learn about journalism." We won't mention who here needs a remedial course...

LLRX.com offers a wealth of information on and sites for searching the "deep Web" - i.e. the mass of information residing in database-driven sites difficult if not impossible to reach with your typical search engine.

Thursday, January 27, 2005

Data mining is how government, business and academia extract useful information from large databases but it's mostly virgin territory in the computer-assisted reporting world. Here's a Congressional Research Service report that gives an overview of the subject.

Now this is sweet: The Center for Public Integrity has put online the financial disclosure statements of state legislators for every state that requires them. They go back to 2002. You should check these every time a legislator introduces a bill or opines about an issue.

Given that the CJ just joined many other papers in charging for access to its archives, you should read Dan Gillmor's recent post on why newspapers should keep them open. This blogger points out that it's odd newspapers charge for the fishwrap, the old newspapers almost nobody wants, but give away today's news, the heart and soul of what we do, for free.

The San Francisco Chronicle writes about how blogs have come of age as a news source.

Wednesday, January 26, 2005

The University of Kentucky's PR office is offering a new service, "Hot Topics!", where it "will provide reporters with one or more UK sources on important issues and breaking news stories," according to a press release. "Items will be posted daily, or as needed. Each item will include the source, his/her credentials, and a means of contact." Find them at http://www.uky.edu/PR/For_Journalists/hottopics.html.

Google is experimenting with video search. It searches only a few TV stations so far, and only returns screenshots and a brief excerpt, not the video itself. It hopes to expand over time. It works by searching the closed caption text for the broadcasts. A search on Mitch McConnell turns up appearances or mentions on PBS, CSPAN and FOX.

Reporters on NICAR-L are talking about their favorite online phone directories. A summary of sites mentioned so far:

Tuesday, January 25, 2005

I like this idea from Scott Rosenberg of Salon.com: newspapers should track reporting errors with the same bug-tracking software programmers use to track software defects.

"Let people file 'bug reports' if they believe your publication has published something in need of correcting. The publication can respond however it seems appropriate: If the complaint is frivolous, you point that out; if it's a minor error of spelling or detail, you fix it; if it's a major error, you deal with it however you traditionally deal with major errors -- but you've left a trail that shows what happened. However you respond, you've opened a channel of communication, so that people who feel you've goofed don't just go off to their corners (or their blogs!) feeling that you're unresponsive and irresponsible."

He said the idea was prompted by a Sacramento Bee column about how more newspapers are using databases to track errors.
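For the curious, here is a rough sketch of what one such "bug report" might look like as a record. It is only an illustration under my own assumptions: the names `ErrorReport`, `Severity` and `respond`, the status labels and the sample complaint are all made up, and none of them come from Rosenberg's post or from any real bug tracker.

```python
from dataclasses import dataclass, field
from enum import Enum


class Severity(Enum):
    """Hypothetical triage levels, following the three cases in the quote."""
    FRIVOLOUS = "frivolous"
    MINOR = "minor"
    MAJOR = "major"


# Assumed mapping from triage level to the report's resulting status.
STATUS_FOR = {
    Severity.FRIVOLOUS: "dismissed",
    Severity.MINOR: "fixed",
    Severity.MAJOR: "escalated",
}


@dataclass
class ErrorReport:
    """One reader complaint plus the trail of responses to it."""
    story: str
    complaint: str
    status: str = "open"
    trail: list = field(default_factory=list)

    def respond(self, severity, note):
        # Every response is appended, never overwritten, so the trail
        # shows what happened to the complaint.
        self.trail.append((severity.value, note))
        self.status = STATUS_FOR[severity]


# Made-up example: a reader flags a misspelled name.
report = ErrorReport(story="Council vote", complaint="Name is misspelled")
report.respond(Severity.MINOR, "Spelling corrected online")
print(report.status, report.trail)
```

The point of keeping `trail` as a list is the one Rosenberg makes: however the paper responds, the response is on the record.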

Professional crime analysts on a crime mapping discussion list I follow are enthusiastic about a new CBS TV show, Numb3rs, about their kind. They say it hews close to reality.

Monday, January 24, 2005

The New York Times' public editor took his newspaper to task yesterday for the ways it mishandles numbers. His jumping-off point was the NYT's recent story on newspaper circulation, mentioned here earlier, which he said "was largely fair and entirely accurate (if somewhat overstated)." His points about innumeracy apply not just to the NYT, but to reporters everywhere who fail to question numbers given them or to put them in proper context.

"Like a bad cough that spreads its germs indiscriminately, numbers misapplied and ill-explained irritate the sensibilities of the right and the left, the drug company official and the animal rights activist, the art collector and the Jets fan."

It should also give us pause to read about this Pew Internet and American Life Project study that found that "Only 1 in 6 users of Internet search engines can tell the difference between unbiased search results and paid advertisements." I think the same dynamic explains why so many readers (and some reporters) can't distinguish between the editorial and the news pages, or competently weigh the credibility of opposing arguments.

Friday, January 21, 2005

Public Citizen is offering a new online database, at worstpills.org, that it says offers "Reliable, no-nonsense information on more than 600 top selling drugs including 181 we recommend you not use under any circumstances." The database is "searchable by drug name, by family of drugs, by disease or condition, by drug induced disease or condition, and more."

CARblog ... spotting the media is a Belgian blog on computer-assisted reporting. At least the cartoons are in English...

fexIT is a business search engine.

Thursday, January 20, 2005

Nulls are the bane of anyone who works with a database. This article from About.com tells you about them.
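Here is a small sketch of why nulls trip people up, using Python's built-in sqlite3 module. The `donors` table and its three rows are invented for illustration and are not taken from the About.com article. Other database engines follow the same basic rules, though details can vary.

```python
import sqlite3

# Invented sample data: Baker's amount is unknown (NULL), Clark's is zero.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE donors (name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO donors VALUES (?, ?)",
    [("Adams", 100), ("Baker", None), ("Clark", 0)],
)


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# COUNT(*) counts rows; COUNT(column) silently skips NULLs.
all_rows = scalar("SELECT COUNT(*) FROM donors")
with_amount = scalar("SELECT COUNT(amount) FROM donors")

# "= NULL" never matches anything; you have to ask "IS NULL".
equals_null = scalar("SELECT COUNT(*) FROM donors WHERE amount = NULL")
is_null = scalar("SELECT COUNT(*) FROM donors WHERE amount IS NULL")

# A NULL row also drops out of "not equal" filters and out of averages.
not_100 = scalar("SELECT COUNT(*) FROM donors WHERE amount <> 100")
average = scalar("SELECT AVG(amount) FROM donors")

print(all_rows, with_amount, equals_null, is_null, not_100, average)
```

The takeaway for reporters: a null is not a zero, and a count or average on a column with nulls may cover fewer records than you think.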

Google has an experimental search feature that allows you to "personalize" your searches to the kind you typically do. You select from categories, such as arts/cinema, business/industries, health, sports or computers, and then have the option whenever you search how much you want to personalize it to your interests. You do that by moving a slider from "minimum" to "maximum" personalization. This is available now only through the special Google Personalized Web page.

This has been around almost a year but I learned about it only just now from Mary Ellen Bates' Search Tip of the Month. This month she also offers other Google search tips, including how to search for synonyms (use a ~ before a word to search for similar words) and use shortcuts (such as adding define: before a word to get its definition).

ChoicePoint, the owner of Autotrack, KnowX and other public record search services, is the subject of an A1 story in today's Washington Post. The Post points out that "the little-known information industry giant is transforming itself into a private intelligence service for national security and law enforcement tasks."

Wednesday, January 19, 2005

GovTrack.us ("Knowledge about government is power") strives to make existing information about federal government easier to find and use. It recently won a software developer's award for the way it culls information from blogs. "On this site you'll find the status of legislation, the speeches of representatives on the House and Senate floors, voting records, campaign contribution summaries, and more, plus the opinions of other users through their blog entries," the site says. "And you can follow only the issues that interest you through email updates and RSS feeds."

Police-Scanner.info offers live police scanner audio feeds from across the U.S.

SideStep is "the traveler's search engine."

Tuesday, January 18, 2005

"Regret The Error reports on corrections, retractions, clarifications, and trends regarding accuracy and honesty in North American media."

It seems you can find and even control putatively private surveillance cameras on the Internet with the help of Google. "Video surfers are using this knowledge to peek in on office and restaurant interiors, a Japanese barnyard, women doing laundry, the interior of an Internet collocation facility, and a cage full of rodents, among other things, in locales scattered around the world," The Register reported.

We have four cats and we haven't yet contributed their pictures or bios to Catster, but more than 19,000 others have. Why anyone else would be interested in your cat's nicknames, biography, likes, pet peeves, favorite toys, favorite nap spot, food, skills, dwelling or how you came to own him or her is beyond me, but they're there.

Monday, January 17, 2005

Who's A Rat calls itself the "largest online database of informants and agents!"

This slick, professional-looking site lists two alleged snitches from Indiana and five from Kentucky. It includes each one's name, aliases, age, city, state, race, occupation, "illegal drug use," "illegal activity committed by this informant," "facts that would question this informant's credibility," and the law enforcement agencies for whom they allegedly inform.

The site says it's for lawyers and individuals who have few resources to investigate informants and their arresting officers. You have to register to add names to the database, but amazingly, people do, even leaving their email addresses so they can be contacted. A feature that had allowed pictures to be uploaded has been disabled.

A 31-year-old Boston man charged with marijuana dealing started the site last year after being fingered by an informer. The man told the Boston Herald he has "a deep, deep hate for the system for the way they handle informants. It's sick. They take these big fish to catch minnows."

The site includes a disclaimer that it's for non-violent crimes only and doesn't condone violence or other illegal acts against informants or police officers, but the site looks more than a little ominous given that the NYT just wrote about how more and more gangs are executing witnesses to intimidate people from testifying against them.

Saturday, January 15, 2005

Donga.com used the "Computer Assisted Reporting technique" to show that high-ranking Korean officials rarely served out their full sentences when compared to more run-of-the-mill criminals. I especially love this translation from the Korean original:
After all, the big wigs of absurdity tend to receive a verdict of innocence far higher than the average individual.

Friday, January 14, 2005

Could the Document Archive be a useful way for news organizations to organize documents on large projects? There's also a Firefox extension to make using it easier.

Dan Blake passed on news from Kentucky Secretary of State Trey Grayson that he has launched a new service allowing you to get corporate certificates online that were previously available only via phone. These include such things as the "Certificate of Existence" and the "Certificate of Authorization," and they could prove useful when researching companies because the original documents offer more information than what's been offered online up to now. The SOS also says that all corporate documents filed with the office on or after September 15, 2004, will be available as scanned images. "Past documents will be made available online as the images are scanned," the SOS says.

Other recently added SOS offerings include:

  • Portions of the Governor's Executive Journal
  • Revolutionary War patents
  • Final copies of legislation

This is all well and good, but my objection is that the SOS is charging $10 a pop for these documents, which are available under the records law for copying costs, which would typically be much less than $10. (The way I read the statute that addresses this, and I could be wrong, is they are authorized to charge 50 cents a page for copies of the certificates and $5 to certify them.) Of course there are no meaningful copying costs for electronic records, and it bothers me that so many agencies are now selling public information for profit.

In effect they are imposing a new tax for online records. Undoubtedly they will argue the fees pay for the new computer systems, but does that mean they'll drop the fees once the systems are paid for? I doubt it.

Thursday, January 13, 2005

You may recall that I complained last year when the Kentucky legislature rolled out its online bill tracking service, Bill Watch, because it offered two levels of service: a limited service for the general public, and a much more extensive service for lobbyists and other insiders who were willing to pay $450 or more.

Well, give them credit for doing the right thing. It now appears Bill Watch offers unlimited bill tracking for all users without a fee. You do have to register by choosing a user name and password, but you don't have to otherwise identify yourself unless you want to.

All in all, it looks like an impressive service. You can track bills by subject, keywords, sponsor and more, but I've only given it a cursory look so far. If you've used it or are going to use it, let me know about your experience. A tutorial is here.

Tuesday, January 11, 2005

The New York Times, with the help of its CAR guru, Tom Torok, shows how a change in ABC rules three years ago has masked a much steeper newspaper circulation decline than is generally known. So-called "third-party sales" are now counted as paid circulation even though the copies are bought by advertisers, not readers.

Brant Houston of IRE explains how "Computer-Assisted Reporting Levels the Information Playing Field" for business reporters.

"The PLANTS Database provides standardized information about the vascular plants, mosses, liverworts, hornworts, and lichens of the U.S. and its territories. It includes names, plant symbols, checklists, distributional data, species abstracts, characteristics, images, plant links, references, crop information, and automated tools."

Monday, January 10, 2005

The Government Innovators Network at Harvard offers "Timely examples of government innovation."

A Pennsylvania writer/historian has borrowed from a number of reference works to create the "Online Etymology Dictionary." Whether his borrowing falls under "fair use," I don't know.

An Auburn professor offers "A Glossary of Political Economy Terms."

Friday, January 7, 2005

The White Collar Crime Prof Blog is devoted to commentary on white collar crime, as opposed to white collar crime professors, as the name may imply.

The Legal Database on Child Abduction Cases collects court decisions from around the world on child abduction.

Every week "the Incredible Internet Guy" selects a topic and gives you a list of high-quality Internet sites devoted to it.

Thursday, January 6, 2005

"CensusScope is an easy-to-use tool for investigating U.S. demographic trends, brought to you by the Social Science Data Analysis Network (SSDAN) at the University of Michigan. With eye-catching graphics and exportable trend data, CensusScope is designed for both generalists and specialists." It includes state, metro area and county data going back to 1980 or even earlier for things such as population growth, race, age structure, family structure and income.

Also check out Dataplace, which creates interesting maps and charts of states, counties and cities showing, by Census tract, the homeownership rate, poverty rate, unemployment rate, vacancy rate and the percent of housing units that are overcrowded. It's slick but still "beta," meaning it's still being developed.

You don't have to admit to anyone you need help using computers just like your pre-TV era grandma, but AARP offers 7 "Basic Web Lessons" to get you up to speed on what any 6-year-old already knows.

Electronic Discovery Law is a blog for lawyers on "legal issues, news and best practices related to the discovery of electronically stored information."

The Memory Hole has posted lists, in Excel format, of more than 500,000 Air Force historical documents that "cover almost every aspect of US military history from the 1920s to the early 1980s." The list was generated as a result of a FOIA request from researcher Michael Ravnitzky.

Wednesday, January 5, 2005

Blinkx is a new video and audio search engine that uses speech recognition software to search for words actually spoken during broadcasts. It searches CNN, BBC, Bloomberg, MSNBC, CBS.com, HBO, ESPN, Biography, The History Channel, NPR, Voice of America and more.

A search for "Louisville" turned up 10 hits from the BBC and SkySports. But the excerpts included in the results were so garbled -- a result of highly imperfect speech recognition no doubt -- that it was impossible to tell what the videos were about.

An example from a BBC video: "on BBC1 Louisville the union the union when the enemy the Sunnis the World." (That does seem to somehow sum it all up, though, doesn't it?)

You'll recall that Yahoo recently unveiled its own video search engine, which doesn't use speech recognition.

SearchEngineWatch, which reported on the above, offers a list of free and fee-based video search sites. It also mentions SpeechBot, which uses speech recognition like Blinkx.

Tuesday, January 4, 2005

You may remember Atomica/GuruNet/whatever its name was last week. For a while it was free and a lot of people around here installed it and loved it, but then they began charging to use it. Now it's free again and you can find it at Answers.com. If you don't remember, the cool thing about what is now called "1-Click Answers" is that from within any program, you just alt-click on any word and it will look it up in a dictionary, thesaurus, encyclopedia and other reference works and display the results. It also comes with an Internet Explorer toolbar, an "Answer Bar" that hides in the lower right-hand corner of the screen, or you can go to the Web site itself.