10 datasets found

Licenses: Creative Commons Attribution

  • USPTO Patent data

    Linked Data version of the US Patent and Trademark Office (USPTO) data. Number of triples: 212,234,735. Number of resources: 3,215,768. Links to other datasets: DBpedia,...
  • DBpedia abstract corpus

    This corpus contains a conversion of Wikipedia abstracts in six languages (Dutch, English, French, German, Italian and Spanish) into the NLP Interchange Format (NIF)....
  • GWPP Glossary

    The GWPP glossary is a set of scientific terms and their definitions that are used inside the Global Water Pathogen Project online book. This dataset is crowdsourced by a large...
  • KORE 50 NIF NER Corpus

    KORE 50[1] (AIDA) is a subset of the larger AIDA corpus, which is based on the dataset of the CoNLL 2003 NER task. The dataset aims to capture hard-to-disambiguate mentions of...
  • Wikilinks RDF/NIF

    The Wikilinks corpus is a coreference resolution corpus of very large scale. It contains over 40 million mentions of over 3 million entities. Mentions are manually labeled links...
  • News-100 NIF NER Corpus

    This corpus comprises 100 German news articles from the online news platform news.de. All of the articles were published in 2010 and contain the word Golf. This word...
  • RSS-500 NIF NER CORPUS

    This corpus has been created using a dataset comprising a list of 1,457 RSS feeds as compiled in (Goldhahn et al. 2012). The list includes all major worldwide newspapers and a...
  • DBpedia Spotlight NIF NER Corpus

    Based on P. N. Mendes, M. Jakob, A. García-Silva, and C. Bizer. DBpedia Spotlight: shedding light on the web of documents. In Proc. of the 7th Int. Conf. on Semantic Systems,...
  • Reuters-128 NIF NER Corpus

    This English corpus is based on the well-known Reuters-21578 corpus, which contains economic news articles. In particular, we chose 128 articles containing at least one NE....
  • World Loanword Database

    The World Loanword Database, edited by Martin Haspelmath and Uri Tadmor, is a scientific publication by the Max Planck Digital Library, Munich (2009). It provides vocabularies...