-
DBpedia abstract corpus
This corpus contains a conversion of Wikipedia abstracts in six languages (dutch, english, french, german, italian and spanish) into the I used the NLP Interchange Format (NIF).... -
Ontos News Portal
The Ontos News Portal extracts facts (objects as e. g. persons or organizations as well as relations between them, e. g. a person is working for an organization or living at a... -
JRC-Names-MLODE
From their web site: JRC-Names is a highly multilingual named entity resource for person and organisation names (called 'entities'). It consists of large lists of names and... -
CopyrightTermBank
Terminology on copyright and related concepts -
KORE 50 NIF NER Corpus
KORE 50[1] (AIDA) is a subset of the larger AIDA corpus, which is based on the dataset of the CoNLL 2003 NER task. The dataset aims to capture hard to disambiguate mentions of... -
LemonWiktionary
Lemon data extracted from Wiktionary -
Brown Corpus in RDF/NIF
RDF version of the Brown Corpus (W. N. Francis, H. Kucera; Brown University; 1979). 1,014,312 words in 500 documents, taken from newspapers texts on diverse topics, non-fiction... -
Multext-East
From the web site: Version 4 of the MULTEXT-East resources, a multilingual dataset for language engineering research and development. This dataset contains, for Bulgarian,... -
SentimentWortschatz
SentimentWortschatz, or SentiWS for short, is a publicly available German-language resource for sentiment analysis, opinion mining etc. It lists positive and negative polarity... -
News-100 NIF NER Corpus
This corpus comprises 100 German news articles from the online news platform news.de. All of the articles were published in the year of 2010 and contain the word Golf. This word... -
RSS-500 NIF NER CORPUS
This corpus has been created using a dataset comprising a list of 1,457 RSS feeds as compiled in (Goldhahn et al. 2012). The list includes all major worldwide newspapers and a... -
DBpedia Spotlight NIF NER Corpus
Based on P. N. Mendes, M. Jakob, A. García-Silva, and C. Bizer. DBpedia Spotlight: shedding light on the web of documents. In Proc. of the 7th Int. Conf. on Semantic Systems,... -
Reuters-128 NIF NER Corpus
This English corpus is based on the well known Reuters-21578 corpus which contains economic news articles. In particular, we chose 128 articles containing at least one NE.... -
PanLex
A lexical database documenting translations among lexemes of language varieties. -
Chat Game corpus
A corpus resulting from an object arrangement game using a computer-mediated setting. -
MExiCo
MExiCo (short for "Multimodal Experiment Corpora") is a data model for data collections containing multimodal linguistic and interaction annotations. -
FiESTA
FiESTA (short for "Format for extensive spatiotemporal annotations") is a generic format for linguistic and behavioral annotations.