Search for a Dataset - the Datahub

SSF

Syntactic and semantic framework of Croatian language
- api/sparql
- HTML
- N3
- XML
associations

A collection of associations and mappings to DBpedia entities. It currently consists of 780,000 human associations from the Edinburgh Associative Thesaurus (as RDF) and a verified...
- HTML
- application/x-ntriples
- meta/rdf-schema
- RDF
- meta/void
- example/rdf+xml
- CSV
GeoWordNet

GeoWordNet is a semantic resource built from the full integration of WordNet, GeoNames and the Italian part of MultiWordNet. GeoWordNet Public Dataset contains 3,698,238...
- meta/void
- RDF
- meta/sitemap
- CSV
- example/rdf+xml
- HTML
- WordNet
DBpedia in Spanish

These data correspond to the DBpedia ontology, version 2014.
- api/sparql
- example/ntriples
- meta/rdf+schema
- HTML
PDEV-Lemon

PDEV is a dictionary which provides insight into how verbs collocate with nouns and other words using an empirically well-founded apparatus of syntactic and semantic categories....
- HTML
- application/x-ntriples
lexinfo

Ontology of lexical categories
- RDF
KORE 50 NIF NER Corpus

KORE 50[1] (AIDA) is a subset of the larger AIDA corpus, which is based on the dataset of the CoNLL 2003 NER task. The dataset aims to capture hard-to-disambiguate mentions of...
- text/turtle
- PDF
Linguistic Metadata (LIME) vocabulary

LIME (LInguistic MEtadata) is a vocabulary for expressing linguistic metadata about linguistic resources and linguistically grounded datasets. The metadata vocabulary has been...
- HTML
- RDF
IWN

This is the dataset corresponding to ItalWordNet as created at the Institute of Computational Linguistics "A. Zampolli" in Pisa. The resource contains single instances such...
- RDF
- tar.gz
EuroSentiment

Gabriela Vulcu, Raul Lario Monje, Mario Munoz, Paul Buitelaar and Carlos A. Iglesias (2014), Linked-Data based Domain-Specific Sentiment Lexicons, In: Proceedings of the 3rd...
- api/sparql
- n-quads
SIMPLE

This dataset contains the conversion of the Italian SIMPLE lexicon in different formats including RDF, TTL and a Lemon version of lexical entries with their pointers to senses.
- RDF
- JSON
- TXT
- text/turtle
Brown Corpus in RDF/NIF

RDF version of the Brown Corpus (W. N. Francis, H. Kucera; Brown University; 1979). 1,014,312 words in 500 documents, taken from newspaper texts on diverse topics, non-fiction...
- text/turtle
- example/turtle
Wikilinks RDF/NIF

The Wikilinks corpus is a coreference resolution corpus of very large scale. It contains over 40 million mentions of over 3 million entities. Mentions are manually labeled links...
- example/turtle
- GZ
- CSV
News-100 NIF NER Corpus

This corpus comprises 100 German news articles from the online news platform news.de. All of the articles were published in 2010 and contain the word Golf. This word...
- text/turtle
- PDF
RSS-500 NIF NER CORPUS

This corpus has been created using a dataset comprising a list of 1,457 RSS feeds as compiled in (Goldhahn et al. 2012). The list includes all major worldwide newspapers and a...
- text/turtle
- PDF
DBpedia Spotlight NIF NER Corpus

Based on P. N. Mendes, M. Jakob, A. García-Silva, and C. Bizer. DBpedia Spotlight: shedding light on the web of documents. In Proc. of the 7th Int. Conf. on Semantic Systems,...
- text/turtle
- PDF
Reuters-128 NIF NER Corpus

This English corpus is based on the well-known Reuters-21578 corpus, which contains economic news articles. In particular, we chose 128 articles containing at least one NE....
- text/turtle
- PDF
SweFN-RDF

Swedish FrameNet (SweFN), a lexical-semantic resource in RDF.
- gzip:ntriples
- api/sparql
- url
SALDOM-RDF

SALDO morphology, a morphological Swedish lexicon in RDF.
- gzip:ntriples
- api/sparql
- url
Wordnet

From the website: WordNet® is a large lexical database of English, developed under the direction of George A. Miller. Nouns, verbs, adjectives and adverbs are grouped into...
TalkBank

About TalkBank: The goal of TalkBank is to foster fundamental research in the study of human and animal communication. It will construct sample databases within each of...
OLiA Discourse

OLiA Discourse Extensions
- HTML
- meta/owl
ISOcat-metadata

The linguistics community is building a metadata-based infrastructure for the description of its research data and tools. At its core is the ISOcat registry (ISOcat.org), a...
- OWL
- HTML
French TimeBank

The French TimeBank consists of a set of 109 journalistic articles from 7 different sub-genres annotated according to the ISO-TimeML standard, adapted for the French language....
- OL
- ISO-TimeML
FrameNet

From the website: The Berkeley FrameNet project is creating an on-line lexical resource for English, based on frame semantics and supported by corpus evidence. The aim is to...

You can also access this registry using the API (see API Docs).
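
As a rough illustration of API access, the sketch below builds a search URL and reads dataset names out of a decoded JSON response. It assumes the registry exposes a CKAN-style action API (`package_search` with `q`, `rows`, and `start` parameters); this page does not confirm that. The base URL and both helper function names are hypothetical, so check the API Docs before relying on any of it.

```python
from urllib.parse import urlencode

# Assumption: a CKAN-style action API lives under this base URL (not confirmed by this page).
BASE_URL = "https://datahub.io/api/3/action"


def build_search_url(query, rows=30, start=0, base_url=BASE_URL):
    """Return a package_search URL for a free-text query (hypothetical helper)."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url}/package_search?{params}"


def dataset_names(response):
    """Extract dataset names from a decoded CKAN-style search response."""
    if not response.get("success"):
        return []
    return [pkg["name"] for pkg in response["result"]["results"]]
```

The URL can then be fetched with any HTTP client and the decoded JSON body passed to `dataset_names`.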

30 datasets found