SIMILE Data Collection
About: Data exposed includes various datasets, such as the CIA's World Factbook, the Library of Congress' Thesaurus of Graphic Materials, the National Cancer Institute's cancer thesaurus, Web...
Neurocommons text mining pilot
About: The complete dataset is composed of a set of smaller datasets. Each download is in one of two formats: (1) WARC or (2) tar.gz. You can read about the WARC format by...
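For the tar.gz downloads, a first step is usually to see which files a given part contains before unpacking it. The sketch below uses only Python's standard `tarfile` module. The function name `list_tar_gz_members` and the archive path you pass in are hypothetical, and nothing here assumes a particular internal layout for these archives. WARC files need a dedicated reader and are not covered.

```python
import tarfile


def list_tar_gz_members(archive_path):
    """Return the names of the regular files inside a gzip-compressed tar archive.

    archive_path is a hypothetical local path to one downloaded .tar.gz part;
    the archive's internal layout is not assumed.
    """
    names = []
    # "r:gz" opens the archive for reading with gzip decompression.
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive.getmembers():
            # Skip directories, links and other non-regular entries.
            if member.isfile():
                names.append(member.name)
    return names
```

Usage: `list_tar_gz_members("part-0001.tar.gz")`, where the file name is only an example, returns a list of member paths. It reads only the archive index, so nothing is written to disk.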
DMOZ RDF Dump
Data exposed: DMOZ
Size of dump and dataset: unknown
Openness: OPEN (unconfirmed). Uses the Open Directory License, which is, in essence, open (there may be some wrinkles regarding updates).
Open Directory Project (ODP)
From the About page: The Open Directory Project is the largest, most comprehensive human-edited directory of the Web. It is constructed and maintained by a vast, global community...
British Crime Survey
This dataset has no description