Thursday, 4. October 2007

Now Queryable and open linked data: U.S. Census/Congress datasets: 1 billion triples

and it's fast!

Since you can't blog about this enough, I am copying the story from the announcement email:

(The following text is by Josh Tauberer.)

Hi, everyone. (This is a revised/combined reannouncement for what was
originally posted on the Linking Open Data list.)

Last November, Chris Bizer wrote, "[T]he DBLP server increases the size
of the Semantic Web by around 10 percent ;-)" [1] Based on the same
logic, I have recently increased the size of the semantic web by 200%!
(in terms of the number of triples; and of course I'm also just joking
here w.r.t. size of the semantic web)

I'm announcing here a new U.S. 2000 Census dataset of 1 billion triples,
accessible over SPARQL and browsable by linked data [2] principles, and
re-announcing my U.S. Congress dataset which is newly browsable with
linked data principles. These two datasets are interconnected, and the
Census dataset is linked up via owl:sameAs to Geonames [3].

I like the Census data set a lot for three reasons: first, if you live
in the U.S. it has something for you, since it has detailed statistics
on geographic entities down to the level of small towns/villages, and
everyone lives somewhere; second, it meshes up with two other data sets;
and third, it's rich enough on its own to support a wide array of
interesting and real-world useful queries (if, say, you were doing

The OpenLink guys were kind enough to host the data set previously, but
I wanted to push the limits of my own semweb C# library [4] and I wanted
to be able to revise the data set as needed, so I've wanted to host it
myself, which I was only recently able to do (even though I've had the
triples lying around for nearly a year).

A complete description of the data set and how it was constructed and
exposed is here:

Some features of the data set:

Data on 3,200 U.S. counties, 36,000 "towns", 16,000 "villages", 33,000
ZCTAs (something like zip-codes), and 435 congressional districts.

Each of those locations contains around 10 thousand population
statistics, as well as a dc:title, a basic hierarchical structure
between regions, and latitude/longitude.

Very basic geographic/name/lat-lng data (1 million triples) can be
downloaded in N3.

All of the 1 billion triples are accessible via SPARQL. See: which has a few sample
queries. An example query is "List the states in the United States that
have more students in dorms than prisoners."

The URIs for the geographic regions are dereferencable http: URIs. (The
URIs for the predicates in the data set will be updated to be
dereferencable in the future.) For example, you can visit the URI for
New York State:

(Some URIs return very large pages that take Firefox quite a while to
render. That one's OK.)

The dereferencable URIs return 303's to SPARQL DESCRIBE pages describing
those URIs.

There is a sitemap.xml file based on the latest draft circulated [5],
referenced from robots.txt:

And the source code to generate the triples from the Census download
files is posted. The full data set is too large for me to provide as an
RDF dump myself, for now at least.
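
As a rough illustration of the SPARQL access and the 303-to-DESCRIBE
behaviour described above (this sketch is not from the announcement; the
endpoint and resource URIs are placeholders under example.org, not the
real ones), a request could be built in Python along these lines. It
assumes only the standard SPARQL Protocol convention of passing the
query text in a "query" parameter on a GET request.

```python
from urllib.parse import urlencode

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://example.org/sparql"


def describe_query(resource_uri: str) -> str:
    """Return a SPARQL DESCRIBE query for a single resource URI."""
    # Reject characters that cannot appear inside a <...> IRI reference.
    if any(ch in resource_uri for ch in '<>" \n'):
        raise ValueError("URI contains characters not allowed inside <...>")
    return f"DESCRIBE <{resource_uri}>"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL with the query URL-encoded."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Placeholder resource URI standing in for a geographic region.
    print(sparql_get_url(ENDPOINT, describe_query("http://example.org/us/ny")))
```

A server that answers a region URI with a 303 could point its Location
header at a URL of this shape, so that a browser or crawler ends up on
the DESCRIBE result for that region.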

The U.S. Congress data set, which I originally made SPARQL-accessible in
December 2005 and have now revised to follow the new linked data
principles, has 12 million triples containing brief biographical data
for all members of Congress, and mainly data for federal legislation and
voting records going back a number of years. Here are two example
dereferencable URIs:
(= Senator John McCain)
(= a bill in Congress)

Some example Congress-related queries are posted here:
And dump files are here:

An example I like to use is that one could fairly easily create a table
using SPARQL aligning votes on a particular bill by congressmen with,
for instance, the median commuting time to work of their constituents,
as reported by the Census.
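
To make the idea of such a table concrete, here is a minimal sketch of
the alignment step in Python, assuming the votes, the member-to-district
mapping, and the Census statistic have already been fetched (for example
from SPARQL results). The function name, the identifiers, and all values
are made-up placeholders, not real data from either data set.

```python
def align_votes_with_statistic(votes, district_of, statistic):
    """Join votes with a per-district statistic.

    votes:       member id -> vote cast on one bill
    district_of: member id -> district id
    statistic:   district id -> numeric value (e.g. median commute time)
    Returns rows (member, vote, district, value) sorted by member id;
    value is None when no statistic is known for the district.
    """
    rows = []
    for member, vote in votes.items():
        district = district_of.get(member)
        rows.append((member, vote, district, statistic.get(district)))
    return sorted(rows, key=lambda row: row[0])


if __name__ == "__main__":
    # Placeholder inputs; none of these values come from the real data.
    votes = {"member-b": "Nay", "member-a": "Aye"}
    district_of = {"member-a": "district-1", "member-b": "district-2"}
    commute_minutes = {"district-1": 25.0, "district-2": 31.5}
    for row in align_votes_with_statistic(votes, district_of, commute_minutes):
        print(row)
```

In a single SPARQL query the same join would be expressed by sharing the
district variable between the vote patterns and the Census patterns.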

Thanks to those who gave feedback on the LOD list --- I haven't been
able to address all of it yet (like how to deal with backlinks on the
dereferenced pages).


- Josh Tauberer