Search for a Dataset - the Datahub


Open Multilingual Wordnet

Documentation of and links to data for wordnets in 20 languages (Albanian, Arabic, Danish, English, Persian, Finnish, French, Hebrew, Italian, Japanese, Basque, Catalan,...
- HTML
- ZIP
KAIST silver standard corpus

KAIST silver standard corpus. Availability: Freely Available. Usage: Named Entity Recognition. Status: Newly created-finished. Description: We propose a novel method to...
- HTML
- TXT
PanLex

A lexical database documenting translations among lexemes of language varieties.
xLiD-Lexica

Our xLiD-Lexica dataset in RDF (http://km.aifb.kit.edu/resources/xLiD-lexica.nt) contains about 300 million triples of cross-lingual groundings. It is extracted from Wikipedia...
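The xLiD-Lexica file is distributed as N-Triples, which puts one `subject predicate object .` statement on each line. A dump of this size can therefore be inspected by streaming it line by line instead of loading it into memory. The following is a minimal Python sketch. It assumes well-formed input in which subjects and predicates contain no whitespace. It is not a full N-Triples parser: it does not unescape literals or validate IRIs, and the helper names `iter_triples` and `count_triples` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def iter_triples(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (subject, predicate, object) tuples from N-Triples lines.

    Simplified sketch: assumes subject and predicate contain no whitespace
    (true for IRIs and blank nodes) and that each statement ends with '.'.
    """
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if not line.endswith("."):
            raise ValueError(f"not an N-Triples statement: {line!r}")
        body = line[:-1].rstrip()  # drop the terminating '.'
        # Split at most twice so that literal objects may contain spaces.
        subj, pred, obj = body.split(None, 2)
        yield subj, pred, obj


def count_triples(lines: Iterable[str]) -> int:
    """Count statements without keeping them in memory."""
    return sum(1 for _ in iter_triples(lines))
```

For example, if the file has been downloaded locally as `xLiD-lexica.nt`, running `with open("xLiD-lexica.nt", encoding="utf-8") as f: print(count_triples(f))` streams through it once. Because a file object yields one line at a time, memory use stays flat regardless of the number of triples.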
Syntactic Reference Corpus of Medieval French (SRCMF)

The SRCMF contains 15 Old French texts totalling about 280,000 words. It has high-quality manual annotation based on a linguistically adequate dependency grammar. Annotation...
- HTML
- example/rdf+xml
OLiA Discourse

OLiA Discourse Extensions
- HTML
- meta/owl
linked hypernyms

This Linked Hypernym dataset assigns each entity article in the English, German and Dutch Wikipedia a DBpedia resource or a DBpedia ontology concept as its type. The types are...
- HTML
- application/x-ntriples
ISOcat-metadata

The linguistics community is building a metadata-based infrastructure for the description of its research data and tools. At its core is the ISOcat registry ISOcat.org, a...
- OWL
- HTML
Phonetics Information Base and Lexicon (PHOIBLE)

Phonetics Information Base and Lexicon (PHOIBLE) is a data set of phonological inventories with additional linguistic and non-linguistic information.
Linked Old Germanic Dictionaries

Lexical resources (word lists, etymological dictionaries) for Germanic languages in different historical stages: pre-1100 (incl. Gothic, Old High German, Old English),...
- HTML
- zip:ttl

You can also access this registry using the API (see API Docs).

10 datasets found
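The note on API access does not say which API the registry exposes. The following Python sketch assumes the registry runs CKAN, as the Datahub historically did, and uses the CKAN Action API (version 3) `package_search` action. The base URL is a placeholder assumption rather than a confirmed live endpoint, and the helper names `package_search_url` and `search_datasets` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: placeholder host; substitute the registry's actual API host.
BASE_URL = "https://datahub.io"


def package_search_url(query: str, rows: int = 10, start: int = 0,
                       base_url: str = BASE_URL) -> str:
    """Build a CKAN v3 package_search URL (assumes a CKAN-based registry)."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def search_datasets(query: str, rows: int = 10) -> list:
    """Fetch matching datasets and return their short names.

    CKAN wraps search results as
    {"success": ..., "result": {"count": ..., "results": [...]}}.
    """
    with urlopen(package_search_url(query, rows)) as resp:
        payload = json.load(resp)
    return [pkg["name"] for pkg in payload["result"]["results"]]
```

Because `package_search_url` does not contact the server, it can be checked offline. Only `search_datasets` needs network access and a live CKAN endpoint. Paging through results beyond the first page means increasing `start` by `rows` on each request.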