Yesterday I shared a spreadsheet containing references to 497 papers on Ebola including the access and license type of each paper. The access and license types of each paper were crowdsourced. Fourteen volunteers participated in completing the dataset.
One of the intentions of sharing the dataset, apart from sharing a file containing links to 497 scientific articles on Ebola mentioned online, was to crowdsource the access and license type of each paper. I promoted the file and the task amongst my followers on Twitter.
The task was to manually click on each link and personally verify which papers were open access, which were paywalled, which were ‘free to read’, etc., and to verify under which licenses they were published. We also added another column for ‘Publisher’. Contributors were asked to add their names and Twitter usernames in a column next to the Access, License and Publisher rows they had completed.
By Wednesday 13 August 2014, the whole dataset was complete (only a few Publisher rows remained, which I filled in myself). I closed the shared Google spreadsheet for editing, did a little manual data refining, and verified some of the access and license types. I then downloaded it and did a bit more refining in Excel. I also edited the spreadsheet so that it contained a documentation ReadMe sheet and two extra sheets: one with only the Open Access entries (in this case we included SA, ND and NC Creative Commons licenses, though as we know fully-fledged Open Access requires CC-BY licenses) and another with only the CC-BY entries, for easier location of the open papers. I shared it last night on figshare, including everyone who helped crowdsource as co-authors of the spreadsheet:
Last night I did a quick chart of the number of papers per type of access. It was late, so it may contain errors. One of the reasons the spreadsheet has been shared openly is so that others can do their own analyses and check the information in it against their own findings.
|Access type|Number of papers in dataset per access type|
|---|---|
|All Open Access (includes NC; 95 CC-BY)|133|
|Free to Read but not OA (All Rights Reserved research papers)|211|
|“Advance Access” (Free to read but not OA)|1|
|News Items (Free to Read but not OA)|6|
|DOIs not found or unresolved|4|
[Please note that the total in the charts above is not 497, as some license/access types were either not present or unclear; for example, there are cases of papers labeled as “Open Access” where the license for the article was absent or hard to find. In any case, this chart needs to be revised, and editorial decisions need to be taken about what will count as what. The charts are shared in the knowledge that errors can still remain.]
Depending on your interests, there is a series of different analyses that could be done from the data. I’ll be working on that; but since we have shared the dataset openly, why not see what you can do with it? (Don’t forget to cite the dataset!)
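As a starting point, a tally of papers per access type could be sketched in a few lines of Python. This is only an illustration under assumptions: it supposes the spreadsheet has been exported to CSV and that the access type sits in a column named `Access`. The column name, the `count_by_access` helper and the sample rows below are hypothetical placeholders, not the real dataset.

```python
import csv
import io
from collections import Counter


def count_by_access(csv_file, column="Access"):
    """Count rows per access type; blank or missing values are grouped together.

    `column` is an assumed header name; adjust it to match the actual export.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        value = (row.get(column) or "").strip()
        counts[value if value else "Unclear / missing"] += 1
    return counts


# Made-up placeholder rows, only to show the expected shape of the input.
SAMPLE = """Title,Access,License
Paper A,Open Access,CC-BY
Paper B,Free to Read,All Rights Reserved
Paper C,Open Access,CC-BY-NC
Paper D,,
"""

if __name__ == "__main__":
    counts = count_by_access(io.StringIO(SAMPLE))
    for access_type, n in counts.most_common():
        print(f"{access_type}: {n}")
```

To run it on a real export, replace `io.StringIO(SAMPLE)` with an opened file, for example `open("ebola_papers.csv", newline="", encoding="utf-8")` (the file name is also an assumption). Grouping blanks under their own label makes it easy to see how many entries still need an editorial decision.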