Tuesday, October 31, 2006

Jihad Database

The RAND Voices of Jihad Database "is a compilation of speeches, interviews, statements, and publications of jihadist leaders, foot soldiers, and sympathizers. Nearly all content is in English translation, and has been collected from publicly-accessible websites. Original links are provided, along with excerpts and full-text content when available."

State and Metropolitan Area Data Book

"The State and Metropolitan Area Data Book features more than 1,500 data items for the United States and individual states, counties and metropolitan areas from a variety of sources." This latest edition of the book includes data on population, housing, cost of living, personal income, new businesses, bankruptcies, agriculture, natural resources, construction, finance, transportation, government employment and more. And you can download Excel files of the data.

Monday, October 30, 2006

Scholars and the Wikipedia

In a well-done article, The Chronicle of Higher Education explores whether academics should embrace or shun Wikipedia.

State-by-state list of voting systems

Electionline.org gives a rundown of the voting systems used in each state, including the manufacturer and links to more information. Given recent history, from the 2000 presidential election to electronic voting machine foulups, this list could prove even more important after election day. Another page outlines which states have statewide voter registration databases, a requirement of the 2002 Help America Vote Act.

Are complaints about journalism job cuts and pride in investigative reporting just a bunch of hooey?

Slate's Jack Shafer took journalists' self-regard and moaning about job cuts to task last week:

The idea that a newsroom should employ X hundred staffers because it has traditionally employed X hundred staffers ignores the changes technology has made in the news market. For instance, Tribune critics denounce it for cutting the foreign bureaus at the Baltimore Sun and Newsday, which it owns. But should every metropolitan newspaper keep its Moscow or Jerusalem bureaus when readers can click to Web coverage from the New York Times and the international press, especially when many of those papers are losing circulation? Something's got to give.

Likewise, journalists don't want you to know this, but thanks to technology, it's never been easier to hunt down a story, capture it, and bring it back to the presses for printing. A middle-school student sitting at a Web terminal has more raw reportorial power at his fingertips than the best reporter working at the New York Times had in, say, 1975. The teenager can't command an undersecretary of defense to return his phone call as the Times guy can, but thanks to Google he can harvest news stories and background information that would take the 1975 model journalist days to collect.

The young amateur can also tap hundreds of free databases serving up scientific, legislative, regulatory, and business information in an afternoon that a team of 1975 reporters couldn't assemble in a week. Give him access to JSTOR, PubMed, Edgar, Nexis, Factiva, and other important sites and he'll write three stories in the time the '70s veteran reports one. Naturally, the kid might not have as good an idea of what to do with the information he's collected, but you get my point: Technology has made today's reporter more productive and more accurate than his forebears. So, if the Los Angeles Times peaked at 1,200 reporters and it's down to about 940 now and Tribune wants to cut it further, it's hardly proof that the corporate meanies are defunding the newsroom.

He also wrote a followup:

However appalling newsroom downsizing may be for journalists, it will ultimately reveal what the people who run and own newspapers really think their publications are for. Scratch a serious reporter, and he'll offer volumes about the "public service" his newspaper performs in the form of investigations: It watchdogs government. It keeps corporations honest. It uncovers the dastardly deeds of foreign dictators and prevents genocide. It exposes quacks and charlatans. (It turns the common man into a Socrates if he reads the editorials!)

Newspaper people have enormous egos, if you get my drift, and don't mind massaging the big hairy things in public. Yet the press is hardly the sentry and bulwark of society that reporters imagine it to be. I don't mean to disparage reporters who put their lives on the line to file from Iraq, nor the sleuths who sift through databases to uncover wrongdoing by pharmaceutical companies, or any other enterprising reporter. But too many journalists who wave the investigative banner merely act as the conduit for other people's probing ...

It's interesting to me that his articles didn't generate a peep of comment on the primary discussion groups for investigative reporters: IRE-L or IREPLUS-L.

Government information is there but not there

Google is meeting with federal agencies so they will make their information more easily available to its search engine, GovExec.com reported. The article quotes J.L. Needham, a strategic partner development manager at Google: "As much as 40 percent of the content on agency Web sites is invisible to Google's crawlers, Needham said. This means that for a majority of Internet users who do not know how to look beyond a search engine site, that information is effectively invisible."

Saturday, October 28, 2006

Ms. Dewey

A post on a journalism mailing list called Ms. Dewey the "future of search engines." You'll need sound to appreciate it, and when you get there, try typing "mainstream media" without the quotes. Or don't type anything at all, and see what happens.

Thursday, October 26, 2006

Google's secret YouTube ethnographic research revealed

Newspaper reporting is often like this:

Academic Blog Portal

The Academic Blog Portal is a directory of academic blogs. It's a wiki, so anyone can contribute a listing.

CNET's top 10 Web research tools

CNET gives its take on the Internet's top 10 research tools. Actually they name 11, five of them from Google.

Neighboroo

Neighboroo uses color-coded Google maps and data to describe U.S. neighborhoods. That includes demographics, crime, education, home prices, cost of living, air quality and more. The site is also seeking "guroos." It defines a guroo as "someone who is an expert in national or local trends who wants to share their insights with our users. Neighboroo will soon offer a service for guroos around the nation to share their wonderful and fascinating knowledge."

Wednesday, October 25, 2006

Zotero, the "personal research assistant"

A while ago I wanted to make a bibliography and explored various inexpensive software and Web sites to help me. I didn't want to pay the $240 it costs for Endnote, the standard in this category. Everything I tried was unsatisfactory, if only because each required the tedious typing of each citation. Now there's the free and open source Zotero, a Firefox extension that only works with the just-released version 2.0 of the browser. The sterling part of Zotero is that it grabs citations directly from online library catalogs and from Web sites such as Google Scholar and Amazon. Let's say you do a search on Google Scholar for writings on computer-assisted reporting: A small icon will appear on the Firefox address bar with your results. Click on the icon and you have the option of making a bibliographic citation out of one or more of the books or articles you found. Or you can look up a book in Amazon or your local university's online catalog, click on the icon and add a citation to Zotero. Zotero does more than that, though. You can also save copies of Web pages, make notes from selected text, store attachments such as PDFs and images, tag your entries, link related items, save searches and export your bibliographies in multiple formats. This is beta software - the first time I installed it, it didn't work until I deleted my Firefox profile and started fresh. Once I did that, though, it worked as advertised. Zotero is from the Center for History and New Media at George Mason University. Its sponsors include the Andrew W. Mellon Foundation, the Alfred P. Sloan Foundation and the Institute of Museum and Library Services, and the center recently began recruiting a new developer, so presumably it will only get better.

$7 million for ... nothing?

The federal Veterans Affairs Department spent $7 million to notify veterans of a data breach earlier this year, GovExec.com reports. This, you may recall, was an incident in which nothing ultimately was lost. Is that not evidence that identity theft hysteria has reached absurd proportions? That $7 million was enough to replace the life savings of potential victims many times over. Note, too, that in most highly publicized data breaches, there's no evidence anyone was harmed.

OpenLayers free, interactive Web maps

OpenLayers "makes it easy to put a dynamic map in any web page." The site emphasizes that unlike Google Maps and Microsoft's Virtual Earth, it's an open source project that is free and open to everyone. "As a framework, OpenLayers is intended to separate map tools from map data so that all the tools can operate on all the data sources," the site says. "This separation breaks the proprietary silos that earlier GIS revolutions have taught civilization to avoid. The mapping revolution on the public Web should benefit from the experience of history." I have no idea how well it works, and the site is upfront that its map viewer "is not yet stable."

Business how-to guides

... from Work.com. Topics include hiring, the law, accounting, sales, marketing and more.

Tuesday, October 24, 2006

Family Caregiving 101

Family Caregiving 101 supplies help, advice, ideas and other resources for people who are caring for a loved one who is ill or disabled.

Your own customized Google search engine

Google Co-op, unveiled today, lets you create your own, customized Google search engine, which you can embed in your own Web site or have hosted by Google. You tell Google what sites you want it to search, and it will search only those. To try it out, I created the Depth Reporting Search Engine, which searches journalism-oriented sites. You can let anyone contribute to the making of your search engine, such as suggesting sites, or make it invitation only, as I have. You can also make money from the resulting traffic via Google's AdSense program.

Monday, October 23, 2006

Firefox search extensions

Pandia Search Engine News summarizes "5 Firefox extensions that will change the way you search."

Screen scraping sex offenders

A Wired reporter used Perl and MySQL to identify 755 sex offenders with MySpace profiles, including one who was eventually arrested after allegedly chatting up teenage boys. The reporter, Kevin Poulsen, did it by scraping the Department of Justice's sex offender Web site and matching the names there against MySpace profiles. He's also released his code in the public domain.
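The matching step can be illustrated with a minimal sketch. This is not Poulsen's code, which was written in Perl with MySQL; it is a Python illustration that assumes both sets of names have already been collected into plain lists. The function names and the sample names are hypothetical, invented for this example.

```python
import re


def normalize(name):
    """Lowercase a name, drop punctuation, and sort its parts so that
    'Doe, John' and 'john doe' compare equal."""
    parts = re.sub(r"[^a-z\s]", " ", name.lower()).split()
    return " ".join(sorted(parts))


def match_names(registry_names, profile_names):
    """Return (registry_name, profile_name) pairs whose normalized forms agree."""
    # Index the registry by normalized name so each profile is a single lookup.
    index = {}
    for name in registry_names:
        index.setdefault(normalize(name), []).append(name)

    matches = []
    for profile in profile_names:
        for registry in index.get(normalize(profile), []):
            matches.append((registry, profile))
    return matches


# Hypothetical sample data, invented for illustration only.
registry = ["Doe, John", "Roe, Richard Q."]
profiles = ["john doe", "Jane Smith", "Richard Q Roe"]

for registry_name, profile_name in match_names(registry, profiles):
    print(registry_name, "<->", profile_name)
```

A name match on its own is only a lead, since many people share a name. Each hit would still need to be confirmed by other means before anyone is identified.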

Art-O-Meter

The "Art-O-Meter is a device that measures the quality of an art piece. It bases its evaluation on the amount of time that people spend in front of an artwork compared to the total time of exhibition." That's quite a leap, to equate the time spent in front of art with its quality, but it's an interesting tool nonetheless. Via information aesthetics.
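Read literally, that description amounts to a simple ratio. The sketch below is one guess at the arithmetic, not the device's actual method, which the quote does not spell out. The function name and the sample figures are hypothetical.

```python
def art_o_meter_score(seconds_viewed, seconds_exhibited):
    """Fraction of the exhibition time during which someone stood in front
    of the piece (an assumed reading of the Art-O-Meter description)."""
    if seconds_exhibited <= 0:
        raise ValueError("exhibition time must be positive")
    return seconds_viewed / seconds_exhibited


# Hypothetical figures: 90 minutes of viewing during an 8-hour exhibition day.
score = art_o_meter_score(seconds_viewed=5400, seconds_exhibited=28800)
print(f"Share of exhibition time with a viewer present: {score:.1%}")
```

The ratio also shows why the leap is a big one: a piece hung beside the entrance or the coat check would score well regardless of its merit.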

Medical podcasts from Johns Hopkins

Johns Hopkins broadcasts a free, weekly health and medicine podcast, five to seven minutes long, advertised as "a lively discussion of the week’s medical news and how it may affect you."

Herblock's Gift

... is an online exhibition of political cartoons from the late, great cartoonist.

Friday, October 20, 2006

The future of news filters

Cyberjournalist.net quotes Tech Crunch editor Michael Arrington saying at a conference that "People are more and more realizing that editors aren't necessarily the best ones to be filtering what's the most important news":

He predicts there will be a fundamental split between the news gatherers and the news filterers, and that people will turn to one type of site, like Digg, to filter the news, and just use to places like The New York Times and Reuters for the actual information. As a result, he thinks the jobs of journalists will fundamentally change to focus primarily on the news gathering.

Urban Conservation Glossary

The Urban Conservation Glossary calls itself "...an easy reference for anyone involved in or simply interested in the built environment..."  It looks quite complete, but not knowing for sure what is meant by "the built environment," I looked for a definition and couldn't find it.

Thursday, October 19, 2006

Finding ancient ruins in France with Google Earth

The Raleigh News & Observer reports that a University of North Carolina professor, Scott Madry, is using Google Earth to spot archaeological ruins in France:

... Madry got out his laptop, fired up Google Earth and looked over lands in Burgundy near his research area. Google Earth displays that area in particularly good resolution. Immediately he spotted features that, to his trained eye, resembled outlines of Iron Age, Bronze Age, ancient Roman and medieval residences, forts, roads and monuments.

"I've spent 25 years in this region of France," Madry said. "In the whole time, I've found a handful of archaeological sites. I found more in the first five, six, seven hours than I've found in years of traditional field surveys and aerial archaeology."

Media ownership by ZIP code

The Center for Public Integrity has relaunched its media tracker database, which will tell you, by ZIP code, who owns the local media, including television stations, radio stations, cable franchises, broadband providers and newspapers. Here's the report for the CJ's ZIP code.

Report card on state doctor discipline Web sites

Public Citizen has released its 2006 ranking of state medical and osteopathic board web sites. It gives a score and details on what information is made public. The Kentucky Board of Medical Licensure ranked 24 out of 65 sites, while the Medical Licensing Board of Indiana ranked 58. Public Citizen also issues recommendations for each site.

Lawsuit over FBI's "Investigative Data Warehouse"

The Electronic Frontier Foundation is suing to learn more about an FBI database it says contains "hundreds of millions of entries of personal information":

According to the FBI, the IDW was developed to collect a wide swath of personal information -- like "photographs, biographical information, physical location information, and financial data" -- for use in anti-terrorism investigations. The FBI said earlier this year that there were over 560 million items in the IDW, and that nearly 12,000 law enforcement agents had access to the information. EFF filed its suit after the FBI failed to respond to two Freedom of Information Act (FOIA) requests for records disclosing the criteria for inclusion in the database and the current privacy policy protecting this sensitive information, among other critical issues.

The FBI has failed to file a public notice describing the database and the criteria for including personal information, as required by the Privacy Act of 1974.

Those Privacy Act notices are published in the Federal Register, by the way, and are a great way to learn about federal government databases and what's in them. Here, for example, is a notice published yesterday about the Comptroller of the Currency's "Consumer Complaint and Inquiry Information System."

Wednesday, October 18, 2006

Tuesday, October 17, 2006

Judging polls

With the election three weeks away, polls are coming fast and furious. The American Association for Public Opinion Research's Press Room offers a collection of resources on understanding surveys and judging their quality. These include:

Their page also links to resources elsewhere. I've updated a few links here because they had gone stale:

Blogs for following the law on the national level

The Indiana Law Blog calls the following "two of the most important new informational tools lawyers now have ... for keeping totally current with law at the national level":

Monday, October 16, 2006

Sunday, October 15, 2006

Open Street Map

OpenStreetMap "is a free editable map of the whole world" that "allows you to view, edit and use geographical data in a collaborative way from anywhere on Earth":

OpenStreetMap is a project aimed squarely at creating and providing free geographic data such as street maps to anyone who wants them. The project was started because most maps you think of as free actually have legal or technical restrictions on their use, holding back people from using them in creative, productive or unexpected ways.

IT Conversations has a recording of Steve Coast, a "freelance hacker" from the UK and one of the site's honchos, talking about it at the Where 2.0 conference. One point Coast makes is that the project has taken off more in Europe than in the U.S. because government map data is so much less freely available there than here. Another is that map companies sometimes seed their data with bad information so they can spot people reusing it without permission.

Thursday, October 12, 2006

Debating how to count Iraqi deaths

A just-released Lancet article estimating the number of Iraqi deaths as a result of the U.S. invasion (PDF), an update from a controversial study first published two years ago, has not surprisingly renewed debate about the legitimacy of its methods. The Social Science Statistics Blog gives a rundown of some of the places where it's being discussed.

Logging instant messaging

In the wake of the Mark Foley and Hewlett-Packard scandals, The Wall Street Journal writes (via Yahoo! Finance) about how "Those IMs Aren't as Private as You Think":

There are several ways users can save IM sessions. Google Inc.'s Google Talk instant-messaging service automatically saves the chat sessions of users that have signed in with Gmail email accounts. Users of Google Talk can disable the setting or choose to go "off the record" during a particular session if they want to avoid having it saved. Other instant-messaging services, such as AOL's AIM, Yahoo Inc.'s Yahoo Messenger, and Microsoft Corp.'s Windows Live Messenger, don't store conversations on their servers automatically. But they do offer various tools for companies and individuals to log conversations. Users can save an IM session by using a built-in save feature or by copying it into another file.

Wednesday, October 11, 2006

Same-day U.S. Supreme Court transcripts

The U.S. Supreme Court has begun posting transcripts of oral arguments on the same day the cases are heard. Older transcripts are also available on the site.

Newgie news consolidator and ranker

Newgie is yet another news aggregator:

Newgie.com delivers the news that matters to you most. Newgie's news database is coninuously (sic) updated with news stories from thousands of the most respected news providers. You then have the ability to sort through these news articles using a variety of Newgie's proprietary organizational tools. And Newgie's IntelliRank (tm) technology will help you easily locate the most relevant and important articles so that no time is wasted.

ResearchBuzz took a look.

Tuesday, October 10, 2006

Does a private purchase negate a public record?

Any reporter who makes use of the public record laws knows that some government officials will fight like the devil to avoid turning over inconvenient documents. Here's an example from Bob Woodward's new book, State of Denial: Former White House Chief of Staff Andrew Card sought to get around the Presidential Records Act by keeping a list of potential replacements for top level jobs in a blue spiral notebook he bought himself, reports GovExec.com. Card told Woodward he did that "so it wouldn't be considered a government document or presidential record that might someday be opened to history," the story says, quoting from the book. But GovExec.com goes on to quote an attorney who calls Card's claim the notebook is exempt from the Presidential Records Act "ridiculous."

FedSpending.org

... from OMB Watch is "a free, searchable database of federal government spending." You can search for grants and contracts in a variety of ways, including name, city, state, congressional district and ZIP code. " ... we hope you will use the data to hold our elected leaders and government agencies accountable for their actions," the site says. GovExec.com says the site "mostly provides easier access to data that already is available through the government's central contract and grant databases."

Increase your value by tracking your time

So says the Unleash Your Potential blog:

Activity logs help you to analyze somewhat objectively where you spend your time. If you are like me, what you discover will surprise you. Since we too easily forget things like reading junk mail, personal phone calls, internet, daydreaming, etc., the activity log will bring these issues to the surface so you can deal with them.

Rich As Dirt

Rich As Dirt looks to be an interesting experiment in Web collaboration:

RichAsDirt is a website to help small farmers improve their yields and efficiency by taking advantage of the collective knowledge and experience of farmers across the country. RichAsDirt offers a set of simple, web-based tools to help farmers plan, track, analyze, and improve their farm operations. ...

RichAsDirt takes advantage of the diversity in farming strategies and experiences of farmers across the country. It's based in part on the economic principle of the "wisdom of crowds," which says that the average of the strategies chosen by many is often better than a single strategy chosen by one. ...

By using data directly from farmers, combining it, and displaying it in a clear and concise way, RichAsDirt can help farmers harness this wisdom and insight, and visualize how their farming methods compare to those of other farmers in their area, and nationwide.

Since RichAsDirt relies on data submitted by its users, the more people who use it, the better it becomes!

The free site "was created by an engineering student with an interest in agriculture and economics. (and too much time on his hands!)"

Free OCR software

I can't vouch for how good it is because I haven't tried it, but SimpleOCR proclaims itself "the only OCR application that is completely free."

Smart Answers

SearchEngineWatch takes "A Closer Look at Ask's Smart Answers." "Ask.com has truly differentiated itself with the 'Smart Answers' it returns for thousands of queries, offering facts, images and targeted links that respond directly to a searcher's information need," writes Brian Smith.

Monday, October 9, 2006

UC Berkeley videos

The University of California at Berkeley has put more than 200 videos from campus courses, seminars and events online at Google Video. These include "General Human Anatomy," "Chemical Structure and Reactivity" and "Structural Aspects of Biomaterials." This is just a hunch, but I'd bet they'd get more hits if they did a video on campus drinking games.

Electronic Evidence Case Digest

Journalists who must extract electronic records from government agencies share something in common with lawyers who have to extract electronic evidence from their adversaries. The Electronic Evidence Case Digest is a searchable database of legal cases involving electronic evidence. It returns the case name, jurisdiction, date, summary and a digest of the case.

Newspaperindex.com blog

The creator of Newspaperindex.com writes a "Daily blog on newspapers and free speech." His blog is far more worldly than the typical American newspaper blog. I found it through an entry in which he calls OneNote 2007, which I've been experimenting with, "The perfect reporters tool."

Saturday, October 7, 2006

Congressional Spouse Project

The Sunlight Foundation has launched the "Congressional Spouse Project" to enlist the help of citizen journalists to identify members of Congress who, "by hiring their spouses, in effect use their campaign treasury to supplement their own bank accounts. The practice is legal, disclosed in obscure corners of campaign finance reports, and rarely mentioned by those who cover campaigns." Bill Allison of the Foundation writes:

Rep. Richard Pombo did it with his wife and his brother. In his 2004 presidential campaign, Sen. Joseph Lieberman did it with his children. Former Majority Leader Tom DeLay did it with his wife and daughter. All have hired relatives to work on their campaigns, paying them salaries out of special interest contributions. Our system of campaign finance is often called "legalized bribery," in which special interests donate tens of thousands of dollars to a member's campaign committee in the hopes of advancing their own issues.

Unfortunately, I tried it and the site failed with Firefox 2.0 RC2 and Internet Explorer 7 RC1. Apparently it works with other versions of these browsers because the site says 170 representatives have been investigated so far.

Allison says this is the start of an effort to develop tools "that allow online citizen journalists to research and record information in a way that allows third parties--say, readers--to string together that research and see patterns." He recently blogged on this topic in a post called "Do-it-yourself Data."

Thursday, October 5, 2006

Darfur resources

The legal Web site Jurist maintains an in-depth collection of information on the crisis in Darfur, Sudan, including news articles, Web sites, commentary and documents.

The Fantastic in Art and Fiction

Fans of freaks, monsters, the grotesque and the macabre will appreciate Cornell University's "The Fantastic in Art and Fiction," where you can view images from its collections online or search for related books.

Metasearch overview

"Metasearch engines - which query multiple major search engine sites simultaneously and present you with the results - have become better than ever," reports Raw Story. The article gives a brief rundown of the major ones.

Google Code Search

Google Code Search helps you find publicly available programming source code. That can't be good news for another such service, Krugle, which also debuted recently.

Indeed salary search

Indeed lets you "Search salaries from over 50 million jobs in the past year." It could be a handy companion for the government salary databases the CJ began posting online recently.

Tuesday, October 3, 2006

Monday, October 2, 2006

LawMemo: Employment Law

If you've been shafted by your boss, you'll want to check out LawMemo, a retired law professor's Web site devoted to employment law. There's a blog, legal primers, help on finding a lawyer and links to current cases before the U.S. Supreme Court.

Statistics: A Guide to the Unknown

The Home for Wayward Statisticians points out that the 3rd edition of one of his favorite undergraduate texts, Statistics: A Guide to the Unknown, is available online for free. "This book is a collection of essays about applications of statistics; all the essays are written for a layman, with minimal mathematical exposition," he writes.