About
This dataset includes a list of citations to scholarly articles from the most recent version of Wikipedia.
License
All files included in this dataset are released under CC0: https://creativecommons.org/publicdomain/zero/1.0/
Projects
Identifiers
- PubMed IDs (pmid) and PubMedCentral IDs (pmcid)
- Digital Object Identifiers (doi)
- ...more to come...
Format
Each row in the dataset represents a citation as a (Wikipedia article, scholarly article) pair. Metadata about when the citation was first added is included.
- page_id: The identifier of the Wikipedia article (int), e.g. 1325125
- page_title: The title of the Wikipedia article (utf-8), e.g. Club cell
- rev_id: The Wikipedia revision where the citation was first added (int), e.g. 282470030
- timestamp: The timestamp of the revision where the citation was first added (ISO 8601 datetime), e.g. 2009-04-08T01:52:20Z
- type: The type of identifier, e.g. pmid
- id: The id of the cited scholarly article (utf-8), e.g. 18179694
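As an illustration of the row layout above, the following Python sketch parses rows into typed records. It assumes the files are tab-separated with a header line naming the six fields; this description does not state the delimiter, so adjust it if the actual files differ. The names Citation and read_citations are hypothetical and are not part of the dataset's source code.

```python
import csv
import io
from datetime import datetime, timezone
from typing import Iterator, NamedTuple, TextIO


class Citation(NamedTuple):
    """One (Wikipedia article, scholarly article) pair."""
    page_id: int
    page_title: str
    rev_id: int
    timestamp: datetime
    type: str
    id: str


def read_citations(handle: TextIO) -> Iterator[Citation]:
    """Yield Citation records from an open text file.

    Assumption: tab-separated values with a header row. Quoting is
    disabled so that quote characters in page titles are kept verbatim.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield Citation(
            page_id=int(row["page_id"]),
            page_title=row["page_title"],
            rev_id=int(row["rev_id"]),
            # Timestamps look like 2009-04-08T01:52:20Z (UTC).
            timestamp=datetime.strptime(
                row["timestamp"], "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc),
            type=row["type"],
            # Kept as text: DOIs are not numeric.
            id=row["id"],
        )


# In-memory sample built from the example values in the field list above.
sample = (
    "page_id\tpage_title\trev_id\ttimestamp\ttype\tid\n"
    "1325125\tClub cell\t282470030\t2009-04-08T01:52:20Z\tpmid\t18179694\n"
)
citations = list(read_citations(io.StringIO(sample)))
```

To read a downloaded file, pass an object opened with open(path, encoding="utf-8", newline="") instead of the in-memory sample.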
How to cite this dataset
The canonical citation and most up-to-date version of this dataset can be found at:
Aaron Halfaker, Dario Taraborelli (2015). Wikipedia Scholarly Article Citations. figshare. doi:10.6084/m9.figshare.1299540
Source code
https://github.com/halfak/Extract-scholarly-article-citations-from-Wikipedia (MIT License)
Notes
Citation identifiers are extracted as-is from Wikipedia article content. Our spot-checking suggests that 98% of identifiers resolve.
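Because identifiers are extracted as-is, anyone who wants to check whether a given row resolves needs to turn it into a URL first. The sketch below shows one way to do that in Python. It is not the procedure used for the spot check, the function name resolver_url is hypothetical, and the URL patterns are the public doi.org, PubMed, and PubMed Central landing pages as commonly used, which may change over time. It also assumes that a pmcid may appear with or without the "PMC" prefix.

```python
from typing import Optional
from urllib.parse import quote


def resolver_url(id_type: str, identifier: str) -> Optional[str]:
    """Build a landing-page URL for one (type, id) pair.

    Returns None for identifier types this sketch does not know about.
    No network request is made; this only constructs the URL.
    """
    value = identifier.strip()
    if id_type == "doi":
        # Keep the slash between DOI prefix and suffix; escape the rest.
        return "https://doi.org/" + quote(value, safe="/")
    if id_type == "pmid":
        return "https://pubmed.ncbi.nlm.nih.gov/" + quote(value, safe="") + "/"
    if id_type == "pmcid":
        # Assumption: the value may lack the "PMC" prefix.
        if not value.upper().startswith("PMC"):
            value = "PMC" + value
        return "https://www.ncbi.nlm.nih.gov/pmc/articles/" + quote(value, safe="") + "/"
    return None


# The pmid is the example from the field list; the DOI is this dataset's own
# DOI, used here only as a syntactically valid sample.
pmid_url = resolver_url("pmid", "18179694")
doi_url = resolver_url("doi", "10.6084/m9.figshare.1299540")
```

Requesting the resulting URLs, for example with urllib.request, and counting successful responses would give a rough resolution rate, though rate limits and redirects need handling in practice.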