Scholarly article citations in Wikipedia


This dataset includes a list of citations to scholarly articles from the most recent version of Wikipedia.


All files included in this dataset are released under CC0.


  • English Wikipedia


  • PubMed IDs (pmid) and PubMed Central IDs (pmcid)
  • Digital Object Identifiers (doi)
  • ...more to come...


Each row in the dataset represents a citation as a (Wikipedia article, scholarly article) pair. Metadata about when the citation was first added is included.

  • page_id: The identifier of the Wikipedia article (int), e.g. 1325125
  • page_title: The title of the Wikipedia article (utf-8), e.g. Club cell
  • rev_id: The Wikipedia revision where the citation was first added (int), e.g. 282470030
  • timestamp: The timestamp of the revision where the citation was first added (ISO 8601 datetime), e.g. 2009-04-08T01:52:20Z
  • type: The type of identifier, e.g. pmid
  • id: The id of the cited scholarly article (utf-8), e.g. 18179694
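
The field list above can be read into typed records along the following lines. This is only a sketch: it assumes the data is stored as tab-separated text with a header row naming the six fields, which this description does not state, and the Citation class and parse_citations function are hypothetical names chosen for illustration. The sample row is assembled from the example values given above.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Citation:
    """One (Wikipedia article, scholarly article) pair. Hypothetical helper type."""
    page_id: int
    page_title: str
    rev_id: int
    timestamp: datetime
    type: str
    id: str


def parse_citations(handle):
    """Yield Citation records from a file-like object.

    Assumption: tab-separated values with a header row; adjust the
    delimiter if the actual files are laid out differently.
    """
    for row in csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        # Timestamps look like 2009-04-08T01:52:20Z, i.e. ISO 8601 in UTC.
        when = datetime.strptime(row["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
        yield Citation(
            page_id=int(row["page_id"]),
            page_title=row["page_title"],
            rev_id=int(row["rev_id"]),
            timestamp=when.replace(tzinfo=timezone.utc),
            type=row["type"],
            id=row["id"],  # kept as text: a DOI is not numeric
        )


# Sample row built from the example values in the field list.
sample = (
    "page_id\tpage_title\trev_id\ttimestamp\ttype\tid\n"
    "1325125\tClub cell\t282470030\t2009-04-08T01:52:20Z\tpmid\t18179694\n"
)
citations = list(parse_citations(io.StringIO(sample)))
print(citations[0])
```

The id field is left as a string on purpose, since the same column holds both numeric PubMed identifiers and DOIs.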

How to cite this dataset

The canonical citation and most up-to-date version of this dataset can be found at:

Aaron Halfaker, Dario Taraborelli (2015). Wikipedia Scholarly Article Citations. figshare. doi:10.6084/m9.figshare.1299540

Source code (MIT License)


Citation identifiers are extracted as-is from Wikipedia article content. Our spot-checking suggests that 98% of identifiers resolve.
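
For readers who want to spot-check identifiers themselves, one possible approach is to turn each (type, id) pair into a URL at the relevant public resolver and then request it. The sketch below only builds the URL and makes no network calls. It is not the procedure the authors used, which is not described here; the resolver_url function is a hypothetical helper, and the URL patterns are assumptions based on the public doi.org, PubMed and PubMed Central sites.

```python
from urllib.parse import quote


def resolver_url(id_type, identifier):
    """Return a URL at which the identifier could be checked.

    Hypothetical helper; the URL patterns are assumptions, not part of
    the dataset.
    """
    identifier = identifier.strip()
    if id_type == "doi":
        # Keep "/" unescaped: it separates the DOI prefix from the suffix.
        return "https://doi.org/" + quote(identifier, safe="/")
    if id_type == "pmid":
        return "https://pubmed.ncbi.nlm.nih.gov/" + quote(identifier, safe="") + "/"
    if id_type == "pmcid":
        # Accept the id with or without a leading "PMC".
        if identifier.upper().startswith("PMC"):
            identifier = identifier[3:]
        return (
            "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC"
            + quote(identifier, safe="")
            + "/"
        )
    raise ValueError("unknown identifier type: " + repr(id_type))


print(resolver_url("pmid", "18179694"))
print(resolver_url("doi", "10.6084/m9.figshare.1299540"))
```

Because identifiers are extracted as-is, some values may contain stray characters; percent-encoding them keeps the URL well formed even when the identifier itself is unlikely to resolve.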

Data and Resources

Additional Info

Field         Value
Author        Aaron Halfaker
Maintainer    Aaron Halfaker
Last Updated  February 9, 2015, 21:54 (UTC)
Created       January 30, 2015, 23:17 (UTC)