CORE - Semantic Similarity of Open Access publications

The CORE dataset describes similarities between scientific papers stored across Open Access repositories. The similarities are calculated from the full text using Natural Language Processing techniques, and are provided only for research articles whose full text is accessible and machine-readable. More information about the data structure can be found at:

RDF Statistics

At the moment we expose more than 92 million RDF triples describing similarities calculated on a set of more than 400k full-text articles harvested from over 230 Open Access repositories.


The data about the similarities are represented using the MuSIM ontology and BIBO ontologies, with links to the OAI (RKBExplorer) repository available in the Linked Data cloud.
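
As an illustration of the kind of data described here, the following is a minimal sketch of computing pairwise similarity scores from full text and collecting them as simple records. It assumes TF-IDF weighting with cosine similarity, which is a common NLP approach but is not confirmed to be the method CORE uses. The function names and the `paper:` identifiers are hypothetical, and the output records are plain tuples rather than actual MuSIM RDF triples.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF keeps weights positive even for terms found everywhere.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_records(papers):
    """Return (id_a, id_b, score) for every unordered pair of papers.

    `papers` maps a (hypothetical) paper identifier to its full text.
    """
    ids = sorted(papers)
    vectors = dict(zip(ids, tfidf_vectors([papers[i] for i in ids])))
    return [(a, b, cosine(vectors[a], vectors[b])) for a, b in combinations(ids, 2)]


if __name__ == "__main__":
    # Made-up texts standing in for harvested full-text articles.
    example = {
        "paper:1": "Open Access repositories store research papers.",
        "paper:2": "Research papers are harvested from Open Access repositories.",
        "paper:3": "Lattice methods in quantum chromodynamics.",
    }
    for id_a, id_b, score in similarity_records(example):
        print(id_a, id_b, round(score, 3))
```

Each resulting record pairs two article identifiers with a score between 0 and 1, which is the shape of information a similarity triple would carry.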

Additional Info

Field                    Value
Author                   Petr Knoth
Version                  1.0
Last Updated             July 30, 2016, 07:28 (UTC)
Created                  July 13, 2011, 21:44 (UTC)
links:rkb-explorer-oai   200000
shortname                CORE
triples                  101526714