-
SIMILE Data Collection
About Data exposed: various data sets including CIA's World Factbook, Library of Congress' Thesaurus of Graphic Materials, National Cancer Institute's cancer thesaurus, Web... -
Neurocommons text mining pilot
About The complete dataset is composed of a set of smaller datasets. Each download is in one of two formats: (1) WARC or (2) tar.gz. You can read about the WARC format by... -
DMOZ RDF Dump
Data exposed: DMOZ Size of dump and data set: size? Openness: OPEN (?) Use Open Directory License which is, in essence, open (may be some wrinkles about updates).