Friday, November 30, 2007

Universal Digital Library

The Universal Digital Library has digitized 1.5 million books and made them available for free on the Web. The site, hosted by Carnegie Mellon University, is high-minded and has lofty ideals, but it's so clunky to use that I gave up after ten minutes of frustration. I had trouble viewing the half dozen or so books I wanted to see using its online viewer. It uses the TIFF and DjVu file formats to present the books. Both are bad choices because most people -- including me, it seems -- don't have the appropriate plug-ins installed or installed correctly.

Here's how it describes its vision:

For the first time in history, all the significant literary, artistic, and scientific works of mankind can be digitally preserved and made freely available, in every corner of the world, for our education, study, and appreciation and that of all our future generations.

Up until now, the transmission of our cultural heritage has depended on limited numbers of copies in fragile media. The fires of Alexandria irrevocably severed our access to any of the works of the ancients. In a thousand years, only a few of the paper documents we have today will survive the ravages of deterioration, loss, and outright destruction. With no more than 10 million unique book and document editions before the year 1900, and perhaps 100 million since the beginning of recorded history, the task of preservation is much larger. With new digital technology, though, this task is within the reach of a single concerted effort for the public good, and this effort can be distributed to libraries, museums, and other groups in all countries.

Existing archives of paper have many shortcomings. Many other works still in existence today are rare, and only accessible to a small population of scholars and collectors at specific geographic locations. A single wanton act of destruction can destroy an entire line of heritage. Furthermore, contrary to the popular beliefs, the libraries, museums, and publishers do not routinely maintain broadly comprehensive archives of the considered works of man. No one can afford to do this, unless the archive is digital.

Digital technology can make the works of man permanently accessible to the billions of people all over the world. Andrew Carnegie and other great philanthropists in past centuries have recognized the great potential of public libraries to improve the quality of life and provide opportunity to the citizenry. A universal digital library, widely available through free access on the Internet, will improve the global society in ways beyond measurement. The Internet can house a Universal Library that is free to the people.

You can read more about it on Physorg.com.

Scitopia.org

scitopia.org is a search engine devoted to science and technology. It was created by "15 leading science and technology societies":

Searching for a better way to help researchers quickly get to the quality content they need, these society publishers developed a gateway to the research most cited in scholarly work and patents. Scitopia.org searches the entire electronic libraries of the leading voices in major science and technology disciplines and provides relevant results, without the noise of other Internet search engines. More than three million documents, including peer-reviewed journal content and technical conference papers, spanning 150 years of science and technology can be searched through the site.

But to get the full text of the articles you find, you'll have to be a member of one of the societies or pay per view.

Wednesday, November 28, 2007

Google Maps now show terrain


The Google Lat Long Blog explains:

These maps focus on physical features such as mountains, valleys, and vegetation. They contain labels for even very small mountains and trails and are enhanced with subtle shading that can often give a better sense of elevation changes than a satellite image alone.

For example, we think Terrain maps may just be the best way to experience the grandeur of the Grand Canyon or to plan your hiking trip on the Appalachian Trail. And of course, big mountains look really cool. Better yet, you can mix them with custom maps from our users, such as a map of highest points in the United States or a guide to the Pyrenees mountains.

To see the new style, simply click on the "Terrain" button in the upper-right corner of the map.

Multiple people can also now collaborate on making a Google map and you can import KML, KMZ and GeoRSS files.

xkcd - A webcomic of romance, sarcasm, math, and language

A lot of the humor in this comic will be lost on anyone not steeped in computer and programming culture, but it's very good, proving once again that drawing talent has nothing to do with being a good cartoonist. Here's how the creator, Randall Munroe, explains its origins:

I was going through old math/sketching graph paper notebooks and didn't want to lose some of the work in them, so I started scanning pages. I took the more comic-y ones and put them up on a server I was testing out, and got a bunch of readers when BoingBoing linked to me. I started drawing more seriously, gained a lot more readers, started selling t-shirts on the site, and am currently shipping t-shirts and drawing this comic full-time.

Tuesday, November 27, 2007

Chart Chooser

Juice Analytics' Chart Chooser helps you figure out what type of chart is appropriate for your data. You can then download a ready-made Excel or PowerPoint template to create it.

17 Ways to Get Free Books

... as told by the Frugal Panda.

Monday, November 26, 2007

Designing Visualizations for Time-Based Data

Max Kiesler shares his list of "some of the best time-based visualizations on the web":

Most interaction designers understand the concept of timelines and other time-based data. Blogs, calendars, and to-do lists are all examples of time-based data. However, if you are trying to fit 400 data points into a 1024 x 726 screen you'll quickly see how challenging time-base data can be. Currently, many interaction designers are turning to visualizations to overcome many of the issues associated with this form of data representation.

Inter alia: "A Great Way to Keep Track of Federal Cases"

Tom Mighell writes that he loves how Justia is making federal court documents more accessible than PACER:

Another thing that makes the Justia page so great, in my opinion, is the ability to bookmark your case and receive updates whenever your search terms come up with new hits. When you find a case you want to keep track of, the site lets you use any of 36 bookmarking sites to bookmark the page for future reference. And when you conduct a search, you are able to save that search to an RSS feed, so that when there are new hits on your keyword, you are automatically notified.
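A saved search delivered as RSS can be polled by any small script. Here is a minimal sketch, assuming a standard RSS 2.0 document; the sample feed, case names, and URLs are invented for illustration and are not Justia's actual output:

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 payload standing in for a saved-search feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Saved search: example keyword</title>
    <item>
      <title>Example v. Sample Corp.</title>
      <link>https://example.com/cases/1</link>
      <guid>case-1</guid>
    </item>
    <item>
      <title>Doe v. Placeholder Inc.</title>
      <link>https://example.com/cases/2</link>
      <guid>case-2</guid>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return (title, link) pairs for items whose guid has not been seen yet."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid not in seen_guids:
            fresh.append((item.findtext("title"), item.findtext("link")))
            seen_guids.add(guid)  # remember it so the next poll skips it
    return fresh


seen = {"case-1"}  # pretend the first case was reported on an earlier poll
hits = new_items(SAMPLE_FEED, seen)
```

Each poll reports only the items that were not there last time, which is roughly what a feed reader does on your behalf.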

Wednesday, November 21, 2007

Free copy of Edward Tufte's Data Analysis for Politics and Policy

You can download a free PDF copy of Edward R. Tufte's 1974 book, Data Analysis for Politics and Policy, from his Web site.

Infodoodads on Realius, The Real Estate Game

Infodoodads writes about Realius - The Real Estate Game:

I check Zillow at least once a month, wondering if my house has gone up or down in value. I also check all the recent sales around my house to see how the market is doing. Now, there is a game for people like me, the real estate obsessed, called Realius. I found out about this new game in the latest issue of Newsweek, which describes Realius as the real estate version of fantasy football. Of course, I asked myself, “How in the world could a real estate game be anything like fantasy football?!?” This question was enough to pique my interest — I had to investigate.

International Data Resource Center

The International Data Resource Center collects data for studying global issues. Some of the data is free for anyone to download. Other data requires you to be a member institution.

As the international community is drawn closer together through the phenomenon of globalization, access to international data has become critical for scholars and researchers around the world. Finding reliable data sources that reflect international dimensions can be difficult. In an effort to meet the growing demands for international data, the Inter-university Consortium for Political and Social Research (ICPSR) has created the International Data Resource Center (IDRC). ...

Data can be accessed through a variety of mechanisms. Scholars and researchers can browse ICPSR holdings using the Subject Terms, Series Data, Geography, or traditional search engines. … There are also a variety of instructional resources available on this site.

It includes data from "pivotal studies" that "transformed the way world politics were studied by using scientific method to study international phenomena."

One intriguing dataset coming soon is the "Correlates of War":

The Correlates of War project was the brainchild of Professor J. David Singer. Singer, a professor in the Department of Political Science at the University of Michigan, set out to identify and study those factors that account for war. Since its inception in 1963, the COW Project has continually conducted systematic and quantitative studies pertaining to warfare. Data pertaining to militarized disputes, interstate and civil wars, national capabilities, and alliances are some of the more popularly used datasets. COW is one of the most often used datasets in international relations and has informed the work of hundreds of scholars across the globe

U.S. Government RSS Library

USA.gov has collected links to federal news and information feeds in the "U.S. Government RSS Library." These are feeds on business and economics, consumer issues, the military, education, the environment, family issues, health, law enforcement, science and more.

Tuesday, November 20, 2007

IncidentNews reports on oil spills

The National Oceanic and Atmospheric Administration's IncidentNews offers news, photos and other information about oil spills where its Office of Response and Restoration got involved. You can also search an archive of incidents for the last 30 years and view recent incidents on a map. Search results link to documents about each incident. It's not just oceans: A search on the Ohio River, for example, turned up four incidents (albeit one was only a drill).

Monday, November 19, 2007

Bulk Access to Congressional Record, Federal Register and more

Tim O'Reilly reports that Carl Malamud, who is credited with shaming the SEC into making its files freely available on the Web, among other things, is now working to give bulk access to Government Printing Office data, which includes the Congressional Record, the Federal Register, presidential papers, Congressional bills, Congressional hearings and other government documents.

The World Bank, Mapped

The bank explains:

"We’ve mashed up Google Maps with World Bank data to give you a visual entry point to browse our projects, news, statistics and public information center by country."

Friday, November 16, 2007

Federal election statistics, 1920-2006

... are available in PDF format from the Clerk of the U.S. House of Representatives:

Since 1920, the Clerk of the House has collected and published the official vote counts for federal elections from the official sources among the various states and territories. These documents, out of print for many years, have been collected and scanned in a format to make them once again available to researchers and students.

geodata.gov

... is "Your One Stop for Federal, State & Local Geographic Data." This looks to be a great resource. Lots of search options, and if the data is downloadable, it gives you a direct link to the files.

geodata.gov will help you:

    Thursday, November 15, 2007

    Whoa, Nellie! Empirical Tests of College Football's Conventional Wisdom

    I'm not enough of a sports fan to want to spend the $5 it costs to read it, but this academic paper sounds intriguing and like the kind of thing that statistics-oriented newspaper reporters could, in theory, do if so inclined. It's by an economist at Ohio State who specializes in economic history, economic demography, and biodemography:

    College football fans, coaches, and observers have adopted a set of beliefs about how college football poll voters behave. I document three pieces of conventional wisdom in college football regarding the timing of wins and losses, the value of playing strong opponents, and the value of winning by wide margins. Using a unique data set with 25 years of AP poll results, I test college football's conventional wisdom. In particular, I test (1) whether it is better to lose early or late in the season, (2) whether teams benefit from playing stronger opponents, and (3) whether teams are rewarded for winning by large margins. Contrary to conventional wisdom, I find that (1) it is better to lose later in the season than earlier, (2) AP voters do not pay attention to the strength of a defeated opponent, and (3) the benefit of winning by a large margin is negligible. I conclude by noting how these results inform debates about a potential playoff in college football.

    Wednesday, November 14, 2007

    10 Places to Find Free Images Online and Make Your Content More Linkable

    ... from the Search Engine Journal.

    docstoc: "find and share any document"

    docstoc is for sharing any kind of business document, including legal, business, financial, technology, educational and creative documents.  You can tag them, search them and download them. Documents on the site now include rental and employment agreements, a blogger's handbook, a disaster recovery plan, budget planning spreadsheets, a startup expense worksheet, a search engine optimization cheat sheet and — the only one I'm interested in at the moment — the "Top 100 Chuck Norris Facts."

    Tuesday, November 13, 2007

    Covering Crime and Justice guide updated

    ... with three new chapters -- on prosecutors, guns and domestic violence. The guide is written and edited by Criminal Justice Journalists, a nonprofit devoted to improving crime coverage.

    Mindy McAdams: Structure as a key to … everything?

    Mindy McAdams at Teaching Online Journalism writes about how incredibly slow news organizations have been to add structure to the information they collect:

    About 13 years ago, when I worked on the online news product of The Washington Post, we struggled with two time-consuming challenges: classified ads and story categories. They were not the only challenges, of course, but they proved particularly difficult to manage.

    The classifieds already had a highly structured system of categories (houses for sale, houses for rent, apartments, etc.), but there was absolutely no structure within the categories. If you wanted a house with a fireplace, you couldn’t search for that (except by using your eyeballs on the text). Why? Because fireplaces were represented by firepl, fpl, fp, frplc … you get the idea. I spent a lot of hours with the czar of classified technology, but the more he taught me about the legacy system, the more impossible it seemed to translate it properly to digital media.

    I remember at least one lunch conversation (maybe several) where I explained the classified system to my online colleagues. There was amazement all around at the idea that the people who “took” the classified ads over the phone could just type anything they wanted (firepl, fpl, fp, frplc …) instead of choosing a single consistent term from a drop-down menu. How were we going to deal with that unstructured mush — in our digital world? How would people find the house with a fireplace?
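The fireplace problem comes down to mapping free-typed variants onto one consistent term. Here is a minimal sketch of that idea, assuming a hand-built lookup table; the table, the function names, and the sample ads are all hypothetical and are not drawn from the Post's system:

```python
import re

# Hypothetical lookup table: every variant an ad-taker might type maps to one canonical term.
CANONICAL = {
    "firepl": "fireplace",
    "fpl": "fireplace",
    "fp": "fireplace",
    "frplc": "fireplace",
}


def normalize(ad_text):
    """Lowercase the ad, split it into word tokens, and replace known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", ad_text.lower())
    return [CANONICAL.get(tok, tok) for tok in tokens]


def has_feature(ad_text, feature):
    """True if the normalized ad mentions the canonical feature name."""
    return feature in normalize(ad_text)


# Made-up classified ads for illustration.
ads = ["3BR house, frplc, garage", "2br apt w/ FPL", "Studio, no pets"]
matches = [ad for ad in ads if has_feature(ad, "fireplace")]
```

A table like this only cleans up after the fact. The drop-down menu McAdams describes avoids the mess at entry time, which is the cheaper fix.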

    Monday, November 12, 2007

    Biases and Restrictions for Google Search

    The Google Operating System, an unofficial blog offering news and tips about Google, writes about "Biases and Restrictions for Google Search." Google filters and reorders its results in multiple ways, and the blog explains how you can change that by editing the URL.
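Editing a search URL by hand amounts to adding or replacing query parameters. Here is a minimal sketch using Python's standard library; the helper name is hypothetical, and the two parameters shown (`filter` and `hl`) are assumptions based on what commonly appears in Google result URLs, so check the blog post for the ones it actually covers:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def set_query_params(url, **overrides):
    """Return url with the given query parameters added or replaced."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params.update(overrides)
    return urlunsplit(parts._replace(query=urlencode(params)))


base = "https://www.google.com/search?q=public+records"
# Assumed meanings: filter=0 turns off duplicate filtering, hl=en sets the interface language.
edited = set_query_params(base, filter="0", hl="en")
```

The same helper works for any site that exposes its search options in the query string.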

    ResearchBuzz on STATS Indiana relaunch

    ResearchBuzz reports that STATS Indiana has "relaunched with a new design and new data." And it points out that despite its name, STATS Indiana "provides information on other states besides Indiana":

    So what’s new on the site? More data is available for states, counties, and metro areas. Additional data includes IRS migration and income tax data as well as health data from the Centers for Disease Control and Prevention.

    Friday, November 9, 2007

    Poynter eyetrack study Web site and book

    The Poynter Institute has a Web site and a book devoted to its 2007 eyetrack study documenting differences in how people consume news in print and online. The study tracked the eyeball movements of more than 600 subjects ages 18 to 60:

    "We wanted to take a scientific look at how people navigate through news in various story forms — and how these forms differ in broadsheet, tabloid and online. Editors and publishers have asked for specific information about this as they make decisions about where to put their resources and how to tell compelling stories most effectively."

    Here are the key findings:

    • "A larger percentage of story text was read, on average, online than in print"
    • "About 75 percent of print readers were methodical. Online readers were different: half were methodical while the other half were scanners. But whether online readers were methodical or scanners, they read about the same volume of story text."
    • "Alternative story forms - like Q&As, timelines, short sidebars and lists -- helped readers understand."
    • "Large headlines and photos in print were looked at first and got dramatically more attention than smaller ones. But online, readers went for navigation bars and teasers."
    • "Documentary news photos - photos of real people doing things in real time - got more attention than staged or studio photographs. Color photos received more attention than black and white in broadsheet. Mugshots got relatively little attention."

    Note to Poynter: Using images to present your key findings, which makes it impossible to cut and paste the text, shows a lack of understanding of how people actually use the Web. I had to retype the findings to share them here.

    Thursday, November 8, 2007

    GovernmentDocs.org: A FOIA'd document database

    As of this writing this site hasn't even officially launched, but I love the concept:

    The goal of the database is to create a central repository of government documents, promoting greater transparency into the inner-workings of our government.

    Traditionally, government watchdog groups have either posted FOIA documents on their websites as unsearchable PDFs, or statically highlighted several pages within a document to bolster their findings. This has historically limited the public's access to FOIA documents, and minimizes the opportunities for use by researchers, journalists and citizen reviewers for further research and disclosures. Governmentdocs.org changes that:

    • Each and every document goes through an optical character recognition (OCR) process, so that the text of each document is entirely searchable.
    • A powerful search engine provides full-text searches and hit highlighting.
    • Citizen reviewers can add information to each document page and highlight important findings, allowing for more robust and targeted searches.
    • Every page of every document has its own unique URL so that documents can be linked, shared, or posted onto websites.
    • The database is a coalition effort, so all of the organizations’ documents will be housed on governmentdocs.org and searches will work across collections.

    The participating organizations are Citizens for Responsibility and Ethics in Washington, the Electronic Frontier Foundation, the Project on Government Oversight, Public Citizen and the Sunlight Foundation.

    Wednesday, November 7, 2007

    Disposable Web pages

    ... offered by disposableWebPage.com:

    Each disposable webpage has a count down clock. You can set this clock to count down anywhere from 90 days to 0 days from the time the page is created. When the remaining time reaches 00:00:00:00, the page is automatically set for disposal and will exist for 2 more weeks before it gets incinerated.

    Finding Old Web Pages

    ... Search Engine Showdown lists the many places online where old Web pages can be found.

    Monday, November 5, 2007

    LibriVox: Free audiobooks

    LibriVox promotes the "acoustical liberation of books in the public domain":

    LibriVox volunteers record chapters of books in the public domain and release the audio files back onto the net. Our goal is to make all public domain books available as free audio books.

    They recently celebrated the release of their 1,000th audio book. Recent releases include Karl Marx's "Wage-Labour and Capital" and Edgar Allan Poe's "Murders in the Rue Morgue."

    Are expunged court records destroyed?

    The BRB Public Records Blog says no:

    With limited exceptions, the general rule is that the government does not destroy records.  In the typical scenario, even if the judge orders a set aside, the consumer’s name can still be found by searching the court indexes and the case can still be viewed as a public record. 

    Corporate Fraud Data Base

    Law.com has put online a corporate fraud database compiled to look back at the 5-year anniversary of President Bush's Corporate Fraud Task Force. It offers information about 124 corporate fraud investigations that resulted in 440 indicted defendants. The presentation leaves a lot to be desired -- 5 giant, 15-field tables grafted onto Web pages -- and it took me a minute to figure out that the links below the introductory blurb are for subsequent data pages -- but hey, at least the information is there for all to see.

    Friday, November 2, 2007

    An internal server error on YourStreet

    Ouch. Say I'm going to start a new Web site and it's going to feature the whiz-bang technology I've developed, and say it's only a few days after its official debut and someone goes to the site and they're greeted with this:

    I imagine that makes for a very bad day. I had visited the site briefly last night and was going to write about it. Instead, I'll quote what TechCrunch has to say:

    What do you get when you combine Google Maps with hyper-local news and comments? You get a map-based news site called YourStreet ... The startup has developed an algorithm that extracts geographical information from stories, such as street names, neighborhoods, and cities. It then geo-codes the articles against a longitude and latitude database so that it can place them on a map. The site will start off with regular Google AdSense ads, but that same algorithm will allow it to place local ads with extremely fine granularity. "The thing that distinguishes us," explains CEO James Nicholson, "is that we can get down to a specific street level on the ads." If he can attract enough local visitors to YourStreet, the local dry cleaner may also want to show up to advertise there.
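The two steps TechCrunch describes, pulling place names out of a story and looking them up against a coordinate table, can be sketched in a few lines. This is a toy illustration only, not YourStreet's algorithm: the regular expression, the gazetteer, and its coordinates are all invented for the example.

```python
import re

# Hypothetical gazetteer: place name -> (latitude, longitude). Values are illustrative only.
GAZETTEER = {
    "main street": (40.0000, -83.0000),
    "elm avenue": (40.0100, -83.0200),
}

# Matches one or two capitalized words followed by a street-type word.
STREET_PATTERN = re.compile(r"\b((?:[A-Z][a-z]+ ){1,2}(?:Street|Avenue|Road))\b")


def geocode_story(text):
    """Return [(place, (lat, lon))] for every recognized street name in the text."""
    found = []
    for match in STREET_PATTERN.finditer(text):
        place = match.group(1).lower()
        if place in GAZETTEER:  # skip names the table does not know
            found.append((place, GAZETTEER[place]))
    return found


story = "A water main broke on Main Street, closing lanes near Elm Avenue."
pins = geocode_story(story)
```

A real system would need a far larger table and a way to tell one city's Main Street from another's, which is presumably where the hard work lies.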

    Mainstream media, which has nothing to brag about in the reliability game, either, can only hope its new competition always executes like this.

    Update: YourStreet is back online. Judge for yourself its value.

    Cattle call for social networking beat reporters

    New York University journalism professor Jay Rosen, author of the PressThink blog and the impresario behind NewAssignment.net, is soliciting journalists to explore "Beat reporting with a social network." The idea is this:

    Maybe a beat reporter could do a way better job if there was a "live" social network connected to the beat, made up of people who know the territory the beat covers, and want the reporting on that beat to be better.

    Thursday, November 1, 2007

    Dilbert and the psychodynamics of science in the media


    The Language Log used yesterday's Dilbert cartoon as a prompt for commentary on "The psychodynamics of science in the media":

    It's true that things would be better if individual scientists were less willing to over-interpret or mis-interpret in order to make a splash; and if PR people were less eager to encourage and help them; and if individual journalists had the time and the ability to do some critical reading in the primary literature, instead of just decorating press releases with a few quick quotes from experts; and if media executives were not too focused on bean-counting to care one way or another about any of this. But focusing on individual failings ignores the fact that all of the people involved -- scientists and journalists and executives and rent-a-weasels -- are responding to the normal economic and psychological forces within their diverse subcultures, which interact badly in their areas of overlap.

    On the whole, the whole system of science and engineering does a pretty good job of creating knowledge and technology. On the whole, the media do a pretty good job making information available to the public. Put them together, and the whole is noticeably less responsible than the parts.

    This was the latest in a series of posts by Mark Liberman on ways the media can improve its reporting on science.

    (And thanks to the Social Science Statistics Blog, which first brought my attention to this.)