Thursday, January 24, 2008

EveryBlock, Heath Ledger and the Pothole Paradox

EveryBlock launched yesterday.

Only eight months after winning a $1.1 million grant in the Knight News Challenge, the non-profit site has gone live with detailed data for San Francisco, Chicago and New York. It promises to expand to more places in the future.

EveryBlock gathers freely available data -- building permits, crime reports, blog posts, restaurant inspections, news articles, Flickr photos, Yelp business reviews, missed connections from Craigslist -- and makes it easy to search and browse by neighborhood. As with everything associated with founder Adrian Holovaty, it is artfully done. It sets a standard all data-driven sites should aspire to.

"Sigh, if only newspaper sites were as well organized as this…" Journalistopia said.

Many, including Al's Morning Meeting, hail it as "the beginning of something big."

I'm not so sure.

For one, there are others plowing similar ground. There's outside.in, YourStreet, and Yahoo!'s Our City (which exists only in India, for the moment at least). Everyone wants to be local these days, including Google, where a search for pizza 40205 will get you a map, address, phone numbers, reviews, a menu and more.

None of those sites, which have their own strengths, offer the rich public record data being mined by EveryBlock. But EveryBlock also leaves me a little … cold.

It is data without context, perspective or meaning. A comment on MetaFilter put it this way:

This is a great idea but it certainly won't replace local news coverage because there's no way to figure out what the politics of anything are.

For example, lots of politics around building permits, liquor licenses, development, etc.: no way to know what any of the granular stuff really means. it's nice to know that a restaurant on my block applied for a liquor license: maybe I can go and stop them because I'm afraid of noise.

but how do I know what the backstory is? how do I know who is who? how do I find anything interesting? how do I know if I'm having an impact?

it's a great tool for an actual local reporter to find info needed for stories-- but it doesn't do the valuable thing that reporters do when they are working well, which is boil down all the boring shit and give you what you need to know when you need to know it.

Here's another MetaFilter comment:

Much of the other data has too much noise. That a local restaurant has just received a scheduled inspection is too low-level. I dont care. I do want to know maybe if a local restaurant has received an extremely bad report. So maybe the data needs to be filtered.

And while Journalistopia found it praiseworthy, it added:

It’s tough to put all of that data into context and provide more historical information such as a community’s history, landmarks and evolving story. For instance, having a highly detailed view of crimes in a neighborhood is really cool, but how does my neighborhood compare to another? How is crime in the neighborhood trending?

How indeed?

Holovaty has described conventional news as a "blob of text." "Newspapers need to stop the story-centric worldview," he wrote in 2006.

Stop? I don't think so. Go beyond, maybe. I'm ostensibly a data person, but browsing raw data, while it can be worthwhile, isn't nearly as compelling to me as the sudden, unexplained death of a 28-year-old movie star.

Holovaty's absolutely right that news organizations -- including my own -- haven't even begun to exploit the potential of structured data on the Web. But there's no evidence in the past, no evidence now, nor will there be any evidence in the future, that the way ahead for the news industry is to feed the world more raw data, however skillfully deployed.

We've got too much of the stuff already. What we need is for it to be boiled down. Distilled. Made interesting.

EveryBlock says that's its goal. It says it exists to answer the question: "What's happening in my neighborhood?"

For a long time, that's been a tough question to answer. In dense, bustling cities like Chicago, New York and San Francisco, the number of daily media reports, government proceedings and local Internet conversations is staggering. Every day, a wealth of local information is created -- officials inspect restaurants, journalists cover fires and Web users post photographs -- but who has time to sort through all of that?

Our mission at EveryBlock is to solve that problem. We aim to collect all of the news and civic goings-on that have happened recently in your city, and make it simple for you to keep track of news in particular areas. We're a geographic filter -- a "news feed" for your neighborhood, or, yes, even your block.

Just how compelling will its offerings be to most readers?

Steven Johnson, a writer and one of the principals behind outside.in, called it the Pothole Paradox.

The Pothole Paradox goes like this:

1. Say you've got a particularly nasty pothole on your street that you've been scraping the undercarriage of your car against for a year. When the town or city finally decides to fix the pothole, that event is genuinely news in your world. And it is news that you'll never get from your local paper, or TV affiliate, or radio station.

Obviously this is a great opportunity for a site like outside.in, where news of pothole repairs might easily trickle up from neighborhood bloggers. But it's not that simple, alas -- there's a flip side to the pothole paradox:

2. News about a pothole repair just five blocks from your street is the least interesting thing you could possibly imagine.

Johnson added:

The other complication here is that the correct scale of hyperlocal news varies depending on the nature of the news itself. Pothole repair may die out beyond a few blocks, but many happenings -- crimes or political rallies or controversial real estate development -- reverberate more widely. Going local sometimes requires that you zoom in all the way to the block level, even all the way to the individual address. But sometimes you need to zoom out too.
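Johnson's point about scale can be put in a few lines of code. What follows is a rough Python sketch, not anything outside.in or EveryBlock is known to run: the event types, the radii (in city blocks) and the name `is_relevant` are all invented for illustration.

```python
from math import hypot

# Hypothetical guesses at how far each kind of news "reverberates",
# measured in city blocks. None of these numbers come from Johnson.
RADIUS_BLOCKS = {
    "pothole_repair": 1,
    "crime": 10,
    "real_estate_development": 20,
}


def is_relevant(kind, event_xy, reader_xy, default_radius=5):
    """Return True if an event falls within the radius assumed for its type.

    event_xy and reader_xy are (x, y) positions on a simple block grid.
    """
    dx = event_xy[0] - reader_xy[0]
    dy = event_xy[1] - reader_xy[1]
    return hypot(dx, dy) <= RADIUS_BLOCKS.get(kind, default_radius)


if __name__ == "__main__":
    home = (0, 0)
    five_blocks_away = (5, 0)
    # The pothole on your own street matters; the one five blocks away does not.
    print(is_relevant("pothole_repair", home, home))
    print(is_relevant("pothole_repair", five_blocks_away, home))
    # A crime five blocks away still falls inside its wider radius.
    print(is_relevant("crime", five_blocks_away, home))
```

The hard part, of course, is not the arithmetic but choosing the radii, which is exactly the judgment call Johnson describes.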

EveryBlock promises to keep adding new features, so we don't know what it will eventually become. But I don't see it appealing to the masses the way it is now. Knowing that there was a construction violation ("34627269N") issued for 35 East 32 Street on December 27, 2007 isn't likely to be interesting even to the people living next door at 37 East 32 Street.

FAILURE TO POST DOT PERMIT FOR PLACING MATERIAL ON STREET.AT TIME OF INSPECTION SKIDS OF CMU "CONCRETE MASONRY UNITS" ARE STORED AT ROAD INFRONT OF 33 E 32 STREET.THE GC HAS STORED THIS MATERIAL ON THE STREET

Gotcha. But just between you and me, did Heath Ledger live nearby?

Wednesday, January 23, 2008

Newsroom101.com

Newsroom101.com offers "exercises in grammar, usage and Associated Press style":

These free, self-instructional exercises are based on issues of grammar, usage and AP style that arose at a daily newspaper and in a course in journalism. They are offered here for journalists, professional writers, college students, high school students, and others who are learning or reviewing journalistic language.

On my first visit to the site I clicked on the first exercise and was greeted with a confusing pop-up that asked for an ID. ID? You mean I have to pay?

I then went back to the home page and it took me a while to find -- after scrolling down past the Google ads -- the introduction that explains how the site works.

It is indeed free, but I was too annoyed to go on. My grammer and english don't need no work, anyways.

Politweets

Politweets mines Twitter for tweets on the presidential race in real time.

U.S. Congress Money Race Widget

MAPLight.org now offers widgets that summarize fundraising for more than 1,500 congressional candidates across the U.S. You can easily embed the widgets in Web sites and blogs, picking and choosing the candidates to show.

Here's a widget showing Kentucky candidates:

There are also presidential widgets the non-profit released last summer.

Monday, January 21, 2008

Hidden data in digital photos

Out-Of-The-Box Lawyering notes that "there’s a lot of hidden information in digital photos":

You’ve probably learned about all the metadata that can be found in word processing files. The metadata may show when a document was created, what editing changes were made, and all sorts of other potentially valuable information.

I recently learned that there is also some extremely valuable information hidden away in the digital version of digital photographs. And Microsoft has a free – that’s free – program that allows you to discover from the digital version such information as the date and time when the photo was taken.

Making money by data mining voters

James Verini of Vanity Fair writes about "Big Brother Inc.":

Knowing your business is big business for Aristotle Inc., whose Orwellian database of voter records has been an essential campaign tool for every president since Ronald Reagan. As the 2008 race heats up, the company’s shadowy founder, John Aristotle Phillips, unveils his most powerful personal-space invader yet.

I remember reading about Phillips during my days as an undergraduate at Boston University studying political science. Phillips won fame back then for drawing up plans to make a nuclear bomb while at Princeton and for nearly winning a race for Congress while in his early 20s.

The article notes that in 2003 he sued Kentucky for access to its voter list. The writer's point of view (or at least the editor's) is obvious from the title, "Big Brother Inc." and the use of words like "Orwellian" and "shadowy," but my thought throughout was: What's so wrong with making it easier for candidates to find like-minded voters? Isn't that the essence of democracy? For me, the article didn't make the case that this is a bad thing.

Friday, January 18, 2008

The Federal Reserve Board Beige Book on current economic conditions

This report is published eight times a year and summarizes "anecdotal information" on current economic conditions from Federal Reserve Banks gathered "through reports from Bank and Branch directors and interviews with key business contacts, economists, market experts, and other sources." There are links for reports going back to 1970. The latest is here. I did a quick word search on the summary page of the latest report and found no mention of the one word everyone wants to know about: recession.
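That kind of quick word search is easy to script for anyone who wants to repeat it on later reports. Here is a minimal Python sketch, not the method used for this post: the function name `count_mentions` and the sample text are made up, and the text of a real report would have to be pasted or loaded in separately.

```python
import re


def count_mentions(text, word):
    """Count whole-word, case-insensitive occurrences of word in text."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


if __name__ == "__main__":
    # Invented sample text, NOT a quotation from any Beige Book.
    sample = (
        "Recession fears grew. No recession yet, "
        "though recessionary talk continues."
    )
    # "recessionary" is deliberately not counted as a whole-word match.
    print(count_mentions(sample, "recession"))
```

Matching on whole words keeps "recessionary" out of the count; dropping the `\b` markers from the pattern would include it.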

Free neighborhood boundary map files from Zillow (with some strings attached)

You can download them here:

The Zillow data team has created a database of nearly 7,000 neighborhood boundaries in the largest cities in the U.S. And we'd like to share them with you! We're sharing these neighborhoods under a Creative Commons license to allow people to use and contribute to our growing database.

Now comes the fine print: You are free to use the files in this database in applications as long as you attribute Zillow when you use it. You may also make your own changes to the database files and distribute them, as long as you provide them under the same kind of license and give Zillow attribution. The neighborhood shapes are available below, zipped up in the Arc Shapefile format.

Free Geography Tools notes that coverage is still limited, but Zillow is encouraging contributions and will incorporate them in their files if they prove accurate.

Official Statistics on the Web

Official Statistics on the Web, or OFFSTATS, from the University of Auckland Library, points you to free statistics from official sources online. Here's the section for the United States and here's Wallis & Futuna. You can search by country, region or topic. The site notes that it points to current data that is often downloadable as text or spreadsheet files.

D.C. Librarians' Society Legislative Source Book

The Law Librarians' Society of Washington D.C. offers a detailed online Legislative Source Book.

Some of it is for members only, but free content includes:

  1. Internet and Online Sources of U.S. Legislative and Regulatory Information (PDF)
  2. Quick Links to House and Senate Committee Hearings and Other Publications
  3. A Research Guide to the Federal Register and the Code of Federal Regulations
  4. Selected Telephone Numbers and Web Sites With Useful Legislative Information
  5. State Legislatures, State Laws and State Regulations: Web Site Links and Telephone Numbers

Thursday, January 17, 2008

STATS: Which is Better at Covering Drug Addiction, HBO’s "The Wire" or The Baltimore Sun?

STATS answers the question:

As “the Wire” brings a fictional version of the Baltimore Sun to life, the real paper recently “exposed” abuse of the new addiction medication, buprenorphine. But as it turns out, HBO’s dramatic series does a far better job of examining the complexities of addiction than what appeared to have the factual power of a real journalistic investigation.

theinfo.org: for people with large data sets

I choked when I read the intro for this new site, which says it's "for large data sets and the people who love them: the scrapers and crawlers who collect them, the academics and geeks who process them, the designers and artists who visualize them."

Love? If the feelings I have when wrestling with a large data set are love, then my life has been misspent valuing all the wrong things. 

The site says it is a place where these lovers "can exchange tips and tricks, develop and share tools together, and begin to integrate their particular projects":

Some of us have spent years scraping news sites. Others have spent them downloading government data. Others have spent them grabbing catalog records for books. And each time, in each community, we reinvent the same things over and over again: scripts for doing crawls and notifying us when things are wrong, parsers for converting the data to RDF and XML, visualizers for plotting it on graphs and charts.

It's time to start sharing our knowledge and our tools. But more than that, it's time for us to start building a bigger picture together. To write robust crawl harnesses that deal gracefully with errors and notify us when a regexp breaks. To start converting things into common formats and making links between data sets. To build visualizers that will plot numbers on graphs or points on maps, no matter what the source of the input.

We've all been helping to build a Web of data for years now. It's time we acknowledge that and start doing it together.
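For readers wondering what a crawl harness that deals "gracefully with errors" and notifies someone "when a regexp breaks" might look like, here is a bare-bones Python sketch. It is an illustration under assumptions, not theinfo.org's code: `crawl`, `TITLE_RE` and the caller-supplied `fetch` and `notify` functions are hypothetical names.

```python
import re

# Hypothetical pattern: pull the page title out of the HTML.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def crawl(urls, fetch, notify=print):
    """Fetch each URL, extract its title and report problems via notify.

    fetch is any function that takes a URL and returns the page text;
    it may raise an exception. A failed fetch or a pattern that no
    longer matches is reported and skipped rather than stopping the crawl.
    """
    results = {}
    for url in urls:
        try:
            html = fetch(url)
        except Exception as exc:
            notify(f"fetch failed for {url}: {exc}")
            continue
        match = TITLE_RE.search(html)
        if match is None:
            notify(f"regexp no longer matches for {url}")
            continue
        results[url] = match.group(1).strip()
    return results
```

Passing `fetch` in as an argument keeps the sketch independent of any particular download library and makes it easy to try out with canned pages.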

Actually, the site, if it takes off, could prove very useful to me. But I'll leave the last words to Jennifer Aniston: "The greater your capacity to love, the greater your capacity to feel the pain."

Tuesday, January 15, 2008

Al's Morning Meeting on the Consumer Price Index

Al explains "Why the CPI Is News (And Why It Isn't)":

Some who report the numbers this week will no doubt refer to the CPI as "the cost of living" index. It isn't. The BLS [Bureau of Labor Statistics] says a real cost-of-living index would include things the CPI does not, for instance, taxes not associated with buying things (like income tax and Social Security tax), the cost of crime on your life and so on.

The CPI is not the only gauge of inflation -- not by a long shot. The CPI measures inflation that consumers feel in their day-to-day living expenses. Other indexes ... measure other types of inflation, such as the Producer Price Index, which measures inflation at earlier stages of production, and the Employment Cost Index, which measures inflation in the labor market.

Text Analytics Wiki

... introduces itself:

This wiki aims to be a one-stop site for everything related to Text Analytics (also known as Text Mining or Information Extraction). It aims to go well beyond the limits of Wikipedia to provide links to people, organisations, the latest research and news.

I have it in my head that text analytics could be useful to a computer-oriented reporter such as myself searching for meaning in documents, but first someone will need to implant a microchip in my brain that will allow me to understand the algorithms involved.

Tinfinger famous people search

Phil Bradley's weblog takes a brief look at Tinfinger, a famous people search engine that just debuted in beta, and finds its computer-generated profiles "odd."

Monday, January 14, 2008

Should government workers be deleting their email?

Governing.com says few governments have a system for managing their e-mail, putting their agencies at legal risk:

Millions of state and local employees in jurisdictions all over the country correspond by e-mail every day without giving much thought to what should happen to the product. They may come to regret that behavior. Not only are records, and history, being lost, but many government lawsuits now turn on what is buried in old e-mail messages. Government policy simply has not kept up with the evolving technology. "At the moment," according to Charles Davis, of the National Freedom of Information Coalition, "everyone is looking up and saying, 'Maybe we ought to be keeping this stuff.'" But few have come up with clear rules governing where and how to keep it.

Kentucky's Department for Libraries and Archives does offer guidelines on retaining state government email. These include "Guidelines for Managing E-mail in Kentucky Government" (PDF) and the "Decision Sequence for Determining E-mail Retention" (PDF). And if you wonder how Kentucky's email systems operate, there's the Commonwealth Office of Technology's explanation of state email systems.

The Indiana Commission on Public Records offers guidelines (PDF) to Indiana agencies.

The Governing.com article describes a few of the ways government officials skirt the public record laws, including "pinning," which "allows two people to send messages back and forth directly to each other's PDAs, without going through the government computer network."

The article says the legal system is beginning to force governments to come to grips with the problem:

... if there was any doubt about the importance of public e-mail management, it should have disappeared in December 2006, with a change in the Federal Rules of Civil Procedure. Under those rules, state and local governments that become litigants in a federal case will have to produce any electronic information considered relevant to the case. If they can't easily retrieve e-mails because they haven't established an efficient way to store them, it's going to cost a lot in staff time. Employees might have to review millions of e-mails to find which ones deal with the plaintiff.

Thursday, January 10, 2008

Using Google's Location Syntax for Election Information

... as explained by WRAL.com:

Google News, at http://news.google.com, looks very much like Google and works very much like Google. You enter a few keywords and get your search results. But Google News is different in that it has a few special search syntax that allow you to really narrow your search. One of the syntax is called location:. In this blog post I'm going to give you some pointers on using location: to get election information.

NetLingo

... is "Your daily source of online business, tech & text lingo." For those of you curious about the meaning of terms like P2U4URAQTP.

Friday, January 4, 2008

Dumpster diving and Huckabee


I'm amused that once Mike Huckabee became the front runner in the Iowa Republican caucuses, he took to using the phrase "dumpster diving" to describe the research his opponents were doing on him. Here's how Huckabee said it last month on Larry King Live:

There's a lot of political dumpster diving that goes on in the campaign. There are people from campaigns going back to my hometown of Hope. They're all over Little Rock. They're looking for any dirt they can find. And usually they'll find it.

Dumpster diving is typically used to describe how the down-and-out and the excessively thrifty sift through others' trash in the hunt for useful and edible things. The former Arkansas governor, however, was invoking its less well-known use as an investigative tool.


Police and private detectives have long seized the trash of the unwitting to gather evidence. Old bank bills, letters, receipts found between the banana peels and snot rags are all manna to the investigator on the hunt for clues. Even better, you don't need a search warrant.

The New York Times reported recently that dumpster diving played a crucial role in exposing Balco's role in the steroid scandal. CBS News has called it a tool for corporate spies. Computer Weekly says it's a way to screen new IT staff. Oracle's CEO once defended using it to gather information on Microsoft. Procter & Gamble has acknowledged its hired hands once picked through the rubbish of Unilever, a hair-care product rival. A magazine for trial lawyers has even recommended it as a strategy in trade disputes.


All draw sanction from a 1988 U.S. Supreme Court case, California v. Greenwood, in which six justices ruled that the police did not violate a defendant's constitutional rights by secretly taking his trash from the curb in front of his house. The police used it to charge him and a friend with drug crimes. The justices opined:

It is common knowledge that plastic garbage bags left along a public street are readily accessible to animals, children, scavengers, snoops, and other members of the public. Moreover, respondents placed their refuse at the curb for the express purpose of conveying it to a third party, the trash collector, who might himself have sorted through it or permitted others, such as the police, to do so. The police cannot reasonably be expected to avert their eyes from evidence of criminal activity that could have been observed by any member of the public.

Two dissenting justices had a different view:

Scrutiny of another's trash is contrary to commonly accepted notions of civilized behavior.

I don't know that any of Huckabee's opponents have actually sorted through his trash. I haven't read of any evidence of that, nor read of any proof offered by Huckabee. Huckabee, however, is certainly playing off the outrage that would ensue if they did. Otherwise, why not call it "in-depth investigation" or "opposition research" or some other more neutral term? Huckabee's implying that there's something unsavory about researching his past, whereas it seems like a perfectly reasonable strategy to me when the presidency of the United States is at stake.


Reporters, naturally, have also yielded to the temptations of the dumpster.


Investigative reporter Jack Anderson's associates once acquired J. Edgar Hoover's trash, so reported Time magazine, and "confirmed that he liked to drink Jack Daniels." Mark Feldstein, a former Anderson intern, wrote in the Washington Monthly that Anderson "rifled through Hoover's trash (including his dog's feces), largely because Anderson thought Hoover had gotten too powerful and needed to be put in his place."


A footnote in California v. Greenwood referred to a 1975 incident in which "a reporter for a weekly tabloid seized five bags of garbage from the sidewalk outside the home of Secretary of State Henry Kissinger." That reporter, Jay Gourley, who once wrote for the Kentucky Post and at the time worked for the National Enquirer, got other journalists clucking. Here's Howard Flieger in the July 28, 1975 issue of U.S. News and World Report:

To go combing through the junk of any household in search of private - and irrelevant - remnants of a family's living habits is just about as far removed from serious investigative pursuits as it is possible to get.


It makes anyone who has devoted a lifetime to journalism, and regards it as a vital and honorable service to public enlightenment, want to get into a hot tub and scrub with a strong soap until it hurts.

My favorite incident, though, involved Portland's Willamette Week. Outraged that the Portland police brought charges against a fellow police officer based on evidence they found in her trash, reporters for the alternative newsweekly went out and grabbed the trash of the district attorney, police chief and mayor. They wrote that they wanted to "make a point about how invasive a 'garbage pull' really is--and to highlight the government's ongoing erosion of people's privacy":

There is something about poking through someone else's garbage that makes you feel dirty, and it's not just the stench and the flies. Scrap by scrap, we are reverse-engineering a grimy portrait of another human being, reconstituting an identity from his discards, probing into stuff that is absolutely, positively none of our damn business.


It's one thing to revel in the hallowed tradition of muckraking. It's another to get down on your hands and knees and nose through wads of someone else's Kleenex. Is this why our parents sent us to college? So we could paw through orange peels and ice-cream tubs and half-eaten loaves of bread?

Maybe so, if it's the difference between winning or losing in New Hampshire.





Thursday, January 3, 2008

Universities with the Best Free Online Courses

... as ranked by the Education Portal. The New York Times recently wrote about how MIT's free online videos have made a 71-year-old physics professor a Web star.

Guide on searching for company information

... from the New York Public Library. There's also a printable version and an online class:

The amount of information available both in print and electronic format has grown exponentially over the last decade. With the widespread popularity of online marketing in this new Internet economy, more and more company information is becoming available. At an unprecedented rate, companies of all flavors and sizes are putting up their own websites and are being included in web-based directories and virtual lists.

Regardless of the abundance of data, there is still no one definitive way for doing business research on a particular company. As ever before, the approach that you take and the resources that you use will depend on the type and amount of information that you have to start with, as well as the type and amount of information that you wish to end up with.