Le Temps dataset for the Swiss Open Cultural Data Hackathon-2016

This repository contains data to be used in the context of the Swiss Open Cultural Data Hackathon of 2016.

This material is available thanks to the [DHLAB](http://dhlab.epfl.ch/), the newspaper Le Temps and the Swiss National Library.

It is released under a CC BY 2.0 license ([details](https://creativecommons.org/licenses/by/2.0/legalcode)).

The data consist of OCRed articles from the newspaper Le Temps for the year 1914:

  1. Text of the articles in XML format (text sub-folder). There is one folder per month, one folder per day, and one file per article. An article can contain several article sub-entities, which correspond to separate article blocks on the original newspaper page.

  2. Text annotated with named entities (entities sub-folder). Named entity recognition and disambiguation were performed by querying several web services: Open Calais, Dandelion and Alchemy. Named entities (and their attributes) are available as in-line annotations within the XML.
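The folder layout described above can be traversed with a few lines of Python. The following is only a minimal sketch: it assumes the article files carry a `.xml` extension and that the extracted root folder is passed in by the caller, and it makes no assumption about the XML schema (element and attribute names are not documented here), so it simply collects all text nodes. The function name `iter_article_texts` is hypothetical.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def iter_article_texts(root_dir):
    """Yield (path, plain_text) for every XML file found under root_dir."""
    # rglob descends through the month and day folders, whatever their names.
    for xml_path in sorted(Path(root_dir).rglob("*.xml")):
        try:
            tree = ET.parse(xml_path)
        except ET.ParseError:
            # Assumption: some files may be malformed; skip them instead of aborting.
            continue
        # itertext() walks all nested elements, so text from every article
        # sub-entity (and from inside in-line annotations) is included.
        raw = " ".join(tree.getroot().itertext())
        # Collapse runs of whitespace left over from the XML layout.
        yield xml_path, " ".join(raw.split())
```

Calling `iter_article_texts("text")` on the extracted text sub-folder would then yield one `(path, text)` pair per article file. The same function should also work on the entities sub-folder, although it discards the annotation attributes; reading those would require knowing the actual element names.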

To open the archive: `tar -jxvf filename.tar.bz2`
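Where the `tar` command is not available, the same extraction can be sketched with Python's standard `tarfile` module. The helper name `extract_archive` and the default destination are illustrative assumptions; `filename.tar.bz2` stands for whichever archive was downloaded.

```python
import tarfile


def extract_archive(archive_path, dest_dir="."):
    """Extract a .tar.bz2 archive into dest_dir and return the member names."""
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: "data" rejects
            # members that would be written outside dest_dir.
            archive.extractall(dest_dir, filter="data")
        else:
            # Older Python versions without filter support.
            archive.extractall(dest_dir)
    return names
```

For example, `extract_archive("filename.tar.bz2", "le_temps")` would unpack the archive into a `le_temps` folder and return the list of extracted paths.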

Additional Info

| Field | Value |
| --- | --- |
| Author | DHLAB |
| Version | 1.0 |
| Last Updated | June 8, 2016, 14:20 (UTC) |
| Created | June 8, 2016, 13:40 (UTC) |