[On 8 August 2017, this post was selected as Editor’s Choice in Digital Humanities Now at http://digitalhumanitiesnow.org/2017/08/questions-of-access-in-the-digital-humanities-data-from-jdsh/]

[N.B. As usual, typos might still be present when you read this; this blog post is likely to be revised post-publication… thanks for understanding. This blog is a sandbox of sorts].

For Domenico, forever in your debt

tl;dr, scroll down to the charts

I used the Altmetric Explorer to locate any articles from the Journal of Digital Scholarship in the Humanities that had received any ‘mentions’ online at any time. This yielded an original dataset of 82 bibliographic entries. With the help of Joe McArthur, the Open Access Button API was then used to detect whether any of the journal articles in the dataset had open access surrogates (for example, self-archived versions in institutional repositories) and, if so, which content they actually provided access to. The API located URLs for 24 of the 82 DOIs in the dataset.

I then edited and refined the original dataset to include only the top 60 results. Each result was manually refined and cross-checked to verify that the resulting links matched the correct outputs, to establish what kind of content they provided access to, and to identify the type of license and type of access of each article’s version of record.

A breakdown of the findings below:

Visualisation of numeralia from the JDSH 60 Articles Altmetric-OA Button Dataset

(Note that the numbers for the OA Button results will not add up, as there are overlaps and some results belong to categories not listed.)

It must be highlighted that only one of the links located via the Open Access Button API provided access to an article’s full version.

This disciplinarily-circumscribed example from a leading journal in the field of the digital humanities provides evidence for further investigations into the effects of publishers’ embargoes on the ability of institutional open access repositories to fulfil their mission effectively.

The dataset was openly shared on figshare as

Priego, Ernesto (2017): A Dataset Listing the Top 60 Articles Published in the Journal of Digital Scholarship in the Humanities According to the Altmetric Explorer (search from 11 April 2017), Annotated with Corresponding License and Access Type and Results, when Available, from the Open Access Button API (search from 15 May 2017). figshare. https://doi.org/10.6084/m9.figshare.5278177.v3


The Wordy Thing

Back in 2014, we suggested that “altmetrics services like the Altmetric Explorer can be an efficient method to obtain bibliographic datasets and track scholarly outputs being mentioned online in the sources curated by these services” (Priego et al. 2014). That time we used the Explorer to analyse a report obtained by searching for the term ‘digital humanities’ in the titles of outputs that had been mentioned online at any point up to the time of our query.

It’s been three years since I personally presented that poster at DH2014 in Lausanne, but the topic of publishing practices within the digital humanities remains of great interest to me. It could be thought of as extreme academic navel-gazing, this business of deciding to look into bibliometric indicators and metadata of scholarly publications. For the digital humanities, however, questions of scholarly communications are questions of methodology, as the technologies and practices required for conducting research and teaching are closely related to the technologies and practices required to make the ‘results’ of teaching and research available. For DH insiders, this is closely connected to the good ol’ less-yacking-more-hacking, or rather, no yacking without hacking. Today, scholarly publishing is all about technological infrastructure, or at least about an ever-growing awareness of the challenges and opportunities of ‘hacking’ the modes of scholarly production.

Moreover, the digital humanities have also long been preoccupied with the challenges of getting digital scholarship recognised and rewarded and, just as importantly, with the difficulty of ensuring the human, technical and financial preconditions of sustainability. Scholarly publishing, or more precisely ‘scholarly communications’ as we prefer to say today, is also very much focused on those same concerns. If form and content are unavoidably interlinked and codependent in digital humanities practice, surely issues regarding the so-called ‘dissemination’ of said practice through publications remain vital to its development.

Anyway, I have now finally been able to share a dataset based on a report from the Altmetric Explorer looking into the articles published in the Journal of Digital Scholarship in the Humanities (from now on JDSH), one of the leading journals (if not the leading journal) in the field of digital humanities (it was previously titled Literary and Linguistic Computing). I first started looking into which JDSH articles were being tracked by Altmetric as mentioned online for the event organised by Domenico Fiormonte at the University Roma Tre in April this year (the slides from my participation are here).

My motivation was not only to identify which JDSH outputs (and therefore authors, affiliations, topics, methodologies) were receiving online attention according to Altmetric. I wanted, as we had done previously in 2014, to use an initial report to look into what kind of licensing said articles had, and whether they were ‘free to read’, paywalled or labeled with the orange open lock that identifies Open Access outputs.

Back in 2014 we did not have the Open Access Button nor its plugin and API. With it I had the possibility to try to check if any of the articles in my dataset had any openly/freely available versions through the Button. I contacted Joe McArthur from the Button to enquire whether it would be possible to run a list of DOIs through their API in bulk. It was, and we obtained some results.
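For readers curious about what running a list of DOIs through a lookup service in bulk might look like, here is a minimal sketch in Python. It is not the code that was actually used with the Open Access Button, and it makes no claims about that API's endpoints or responses: the `lookup` argument is a hypothetical stand-in for whatever function queries the service, and the function names are my own.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Common ways a DOI is written; stripped so duplicates can be detected.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw: str) -> str:
    """Return a bare, lowercased DOI (DOIs are case-insensitive)."""
    doi = raw.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def bulk_lookup(
    dois: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> Dict[str, Optional[str]]:
    """Run every distinct DOI through `lookup`.

    `lookup` is an assumption: any callable that takes a DOI and returns
    the URL of an open surrogate, or None when nothing was found.
    """
    results: Dict[str, Optional[str]] = {}
    for raw in dois:
        doi = normalise_doi(raw)
        if not doi or doi in results:
            continue  # skip blanks and repeated DOIs
        results[doi] = lookup(doi)
    return results


def coverage(results: Dict[str, Optional[str]]) -> Tuple[int, int]:
    """Return (number of DOIs with a URL, number of DOIs checked)."""
    found = sum(1 for url in results.values() if url)
    return found, len(results)
```

In practice `lookup` would wrap an HTTP request to the service being queried. Keeping it as a parameter means the same loop can be tested offline with a plain dictionary's `get` method, and `coverage` gives the kind of "URLs located out of DOIs checked" figure reported above.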

Here’s a couple of very quick charts visualising some insights from the data.

It should also be highlighted that of the 6 links to institutional repository deposits found via the Open Access Button API, only one gave open access to the full version of the article. The rest were either metadata-only deposits or deposits whose full versions were embargoed.

As indicated above, the 60 ‘total articles’ refers to the number of entries in the dataset we are sharing. There are many more articles published in JDSH. The numbers presented represent only the data in question, which is in turn the result of particular methods of collection and analysis.

In 2014 we detected that “the 3 most-mentioned outputs in the dataset were available without a paywall”, and we thought that could indicate “the potential of Open Access for greater public impact.” In this dataset, the three articles with the most mentions are also available without a paywall. The most mentioned article is the only one in the set that is licensed with a CC-BY license. The two that follow are ‘free’ articles that require permission for reuse.

The data presented is the result of the specific methods employed to obtain it. In this sense, this data represents as much a test of the technologies employed as of the actual articles’ licensing and open availability. This means that the data in columns L-P reflects what was available through the Open Access Button API at the moment of collection. It is perfectly possible that ‘open surrogates’ of the articles listed are available elsewhere through other methods. Likewise, it is perfectly possible that a different corpus of JDSH articles collected through other methods (for example, of articles without any mentions as tracked by Altmetric) would show a different proportion of license and access types etc.

As indicated above, the licensing and access type of each article were identified and added manually and individually. Article DOIs were accessed one by one with a computer browser outside/without access to university library networks, as the intention was to verify if any of the articles were available to the general public without university library network/subscription credentials.
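Once license and access type have been recorded by hand for each entry, counting them up is the easy part. The sketch below shows one way such a tally could be produced in Python. The column name `access_type` and the sample rows are placeholders for illustration only; they are not the headers or the values of the shared dataset.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def tally(rows: Iterable[Mapping[str, str]], column: str) -> Counter:
    """Count how many rows fall under each value of `column`.

    Missing or blank cells are grouped under "unknown" rather than dropped,
    so the counts always add up to the number of rows.
    """
    return Counter(
        (row.get(column) or "").strip() or "unknown" for row in rows
    )


def share(counts: Counter) -> Dict[str, float]:
    """Express each count as a percentage of the total, to one decimal."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    # Placeholder rows, NOT taken from the dataset.
    sample = [
        {"access_type": "paywalled"},
        {"access_type": "free"},
        {"access_type": "paywalled"},
        {"access_type": ""},
    ]
    counts = tally(sample, "access_type")
    for label, pct in share(counts).items():
        print(f"{label}: {counts[label]} ({pct}%)")
```

With a real spreadsheet export, the rows could come from `csv.DictReader`, and `column` would be whichever header holds the manually added annotation.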

This blog post and the deposit of the data are part of a work in progress and are shared openly to document ongoing work and to encourage further discussion and analyses. It is hoped that quantitative data on the limited level of adoption of Creative Commons licenses and Institutional Repositories within a clearly-circumscribed corpus can motivate reflection and debate.


I am indebted to Joe McArthur for his kind and essential help cross-checking the original dataset with the OA Button API, and to Euan Adie and all the Altmetric team for enabling me to use the Altmetric Explorer to conduct research at no cost.

Previous Work Mentioned

Priego, Ernesto; Havemann, Leo; Atenas, Javiera (2014): Online Attention to Digital Humanities Publications (#DH2014 poster). figshare. https://doi.org/10.6084/m9.figshare.1094345.v1 Retrieved: 18:46, Aug 04, 2017 (GMT).

Priego, Ernesto; Havemann, Leo; Atenas, Javiera (2014): Source Dataset for Online Attention to Digital Humanities Publications (#DH2014 poster). figshare. https://doi.org/10.6084/m9.figshare.1094359.v5 Retrieved: 17:52, Aug 04, 2017 (GMT).

Priego, Ernesto (2017): Aprire l’Informatica umanistica / Abriendo las humanidades digitales / Opening the Digital Humanities. figshare. https://doi.org/10.6084/m9.figshare.4902995.v1 Retrieved: 18:00, Aug 04, 2017 (GMT).