In two days the United Kingdom will be voting in a Referendum that is very likely to change its destiny. More importantly, it is likely to change the destiny of everyone else who has a relationship with the UK.

This is a political event of more than national, internal or local interest; it is likely to have direct and immediate repercussions well beyond the UK’s borders. Anyone who has recently lived in an EU member country does not need to be a political scientist to sense that these repercussions will not be merely economic. Already, even before the vote is cast, the UK’s social tissue has been undoubtedly transformed and deeply, even tragically, affected.

Needless to say, one of the arenas where political activity is taking place is the media (TV, radio) and social media. As the date to vote in person approaches, I have collected and shared a dataset of tweets published by the official Leave campaign Twitter account, @vote_leave, between 12/06/2016 09:06:22 and 21/06/2016 09:29:29 BST. The dataset contains 1,100 tweets.

I ran a quick text analysis of the Tweets themselves to get an insight into the most frequent terms and collocates in the corpus, and also looked at the tweets’ sources (the services used to publish the Tweets, e.g. the Twitter Web Client, Buffer, the Twitter iPhone app).

Some quick insights from the data:

Archive Summary (from:vote_leave)

Number of links 500
Number of RTs 592 (estimate based on occurrence of RT)
Number of Tweets 1100
Unique tweets 1099 (used to monitor quality of archive)
First Tweet in Archive 12/06/2016 09:06:22 BST
Last Tweet in Archive 21/06/2016 09:29:29 BST
Tweet rate (tw/min) 0.1 Tweets/min (from last archive 10mins)
In Reply Ids 3
In Reply @s 2
@s 90
RTs 54%

It is interesting that the account mostly broadcasts and RTs Tweets, with minimal interaction with other users via @ replies, at least according to this sample dataset. (A larger dataset could show whether or not this is a trend indicating a media/content strategy.)
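The summary figures above can be approximated from a plain list of tweet texts. The following is a minimal sketch in Python, not the way TAGS itself computes its summary: the function name `summarise_archive`, the sample texts and the matching rules (an RT is any tweet containing the token "RT", a link is any tweet containing "http://" or "https://") are assumptions made for illustration.

```python
import re

def summarise_archive(texts):
    """Return simple summary counts for a list of tweet texts."""
    total = len(texts)
    # Estimate retweets by the occurrence of the token "RT" (assumed rule).
    rts = sum(1 for t in texts if re.search(r"\bRT\b", t))
    # Count tweets that contain at least one link (assumed rule).
    links = sum(1 for t in texts if "http://" in t or "https://" in t)
    # Count @-mentions across all tweets.
    mentions = sum(len(re.findall(r"@\w+", t)) for t in texts)
    return {
        "tweets": total,
        # Fewer unique texts than tweets hints at duplicates in the archive.
        "unique_tweets": len(set(texts)),
        "rts": rts,
        "rt_share": round(100 * rts / total) if total else 0,
        "links": links,
        "mentions": mentions,
    }

# Hypothetical sample texts, not taken from the dataset.
sample = [
    "RT @example_user: We should take back control #VoteLeave",
    "Watch the debate tonight https://t.co/xxxx #bbcqt",
    "Watch the debate tonight https://t.co/xxxx #bbcqt",
    "Polls open on Thursday #VoteLeave",
]
summary = summarise_archive(sample)
print(summary)

# The RT share in the table follows from the reported counts:
# 592 RTs out of 1,100 tweets.
reported_rt_share = round(100 * 592 / 1100)
print(reported_rt_share)
```

With the reported counts, 592 out of 1,100 is 53.8%, which rounds to the 54% shown in the summary.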

Sources

The data indicates that most Tweets are published from the Twitter Web Client (496!), which I would have thought any marketing professional would find clunky if not really unfit for purpose.

Not surprisingly, however, Buffer is used (411 buffered Tweets), which indicates these Tweets are likely to have been scheduled in advance. Surprisingly for me, very few of the Tweets in the dataset had TweetDeck as a source (only 4 according to the collected data in the given period), but it is possible that TweetDeck was used to ‘buffer’ the Tweets, as TweetDeck allows for Buffer integration.

Twitter for iPhone emerges as a significant source, well above TweetDeck. Personally, I find the picture of such an important political campaign being run from a mobile phone kind of scary. Influencing a nation’s destiny from the train home after the pub!

Source Count
TweetDeck 4
Buffer 411
Twitter for iPhone 189
Twitter Web Client 496
Total 1100
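To put the source counts in proportion, they can be turned into percentage shares of the 1,100 tweets. This is a small sketch using only the counts reported in the table; the variable names are illustrative.

```python
from collections import Counter

# Source counts as reported in the table above.
source_counts = Counter({
    "Twitter Web Client": 496,
    "Buffer": 411,
    "Twitter for iPhone": 189,
    "TweetDeck": 4,
})

total = sum(source_counts.values())

# Percentage share per source, most common first, rounded to one decimal.
shares = {
    source: round(100 * count / total, 1)
    for source, count in source_counts.most_common()
}

for source, share in shares.items():
    print(f"{source}: {share}%")
```

On these figures the Twitter Web Client accounts for roughly 45% of the sample, Buffer for roughly 37%, Twitter for iPhone for roughly 17%, and TweetDeck for well under 1%.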

Most Frequent Words

I was not surprised to see that ‘immigration’ was one of the most frequent words appearing in the corpus. However, it was interesting to see the centrality of the hashtag ‘bbcqt’ (BBC Question Time). Even if we take into account the specific context of the data’s time period, the prevalence of bbcqt as a term in the corpus could be interpreted as an indication of the importance that television, and specifically the BBC, has had in defining voting trends and public discourse regarding the Referendum.

After removing Twitter-specific stopwords from the raw data (e.g. t.co, amp, rt), the 10 most frequent words in the corpus are:

Term Count Trend
voteleave 558 0.026160337
eu 402 0.018846694
bbcqt 398 0.018659165
gove 165 0.0077355835
takecontrol 146 0.0068448195
immigration 133 0.0062353495
control 95 0.004453821
cameron 89 0.0041725268
turkey 84 0.003938115
uk 72 0.0033755274

(voteleave, bbcqt, takecontrol were hashtags).
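The counting step behind a table like this can be sketched in a few lines of Python. This is not how Voyant Tools works internally; the tokenising rule, the function name `top_terms` and the sample texts are assumptions for illustration, and only the three Twitter-specific stopwords named above are removed. The last lines check an inference from the table itself: the Trend values appear consistent with each count divided by a total of about 21,330 tokens, a figure that is derived here and not reported in the dataset.

```python
import re
from collections import Counter

# Twitter-specific stopwords named in the post; a fuller run would
# normally add a general English stopword list as well.
STOPWORDS = {"t.co", "amp", "rt"}

def top_terms(texts, n=10):
    """Return (term, count, relative_frequency) for the n most frequent terms."""
    tokens = []
    for text in texts:
        # Lowercase, then keep runs of letters, digits and underscores,
        # allowing inner dots so that "t.co" stays a single token.
        tokens.extend(re.findall(r"[a-z0-9_]+(?:\.[a-z0-9_]+)*", text.lower()))
    total = len(tokens)  # assumption: frequency is relative to all tokens
    counts = Counter(tok for tok in tokens if tok not in STOPWORDS)
    return [(term, count, count / total) for term, count in counts.most_common(n)]

# Hypothetical sample texts, not taken from the dataset.
sample = [
    "RT @a: Take control #VoteLeave",
    "#VoteLeave &amp; #bbcqt tonight",
]
for term, count, rel in top_terms(sample, n=3):
    print(term, count, round(rel, 4))

# Consistency check on the reported table: if Trend = count / total tokens,
# the first row implies the corpus size, and other rows should agree with it.
implied_total = round(558 / 0.026160337)
print(implied_total)
print(402 / implied_total)  # compare with the reported 0.018846694 for 'eu'
```

If that reading of the Trend column is right, ‘voteleave’ alone makes up about 2.6% of all tokens in the corpus, and ‘bbcqt’ about 1.9%.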

It is not clear how much of a social media/content strategy might be behind a Twitter account like @vote_leave, nor how many account managers are behind the tweetage. Apart from the obvious prevalence of ‘immigration’ as a term, it is nevertheless interesting to see that in about nine days of Tweets in the final countdown to the Referendum there was a clear interest in tapping into televised debate and influence (bbcqt), to the point that the term ranks so highly. Bear in mind that ‘voteleave’ is the campaign’s standard hashtag, and that ‘eu’ would be expected to be a very frequent word, to the point that it could be considered a stop word in the specific context of this corpus. Perhaps, for all the emphasis on social media as an autonomous medium, it is still traditional mainstream media, in this case the BBC, which has the greatest influence on public opinion?

Notes on Methodology

The Tweets contained in the Archive sheet were collected using Martin Hawksey’s TAGS 6.0.

The text analysis was performed using Stéfan Sinclair’s & Geoffrey Rockwell’s Voyant Tools.

The collection and analysis of the dataset complies with Twitter’s Developer Rules of the Road.

The data was collected as an Excel spreadsheet file containing an archive of 1,100 @vote_leave Tweets publicly published by the queried account between 12/06/2016 09:06:22 – 21/06/2016 09:29:29 BST.

I prepared a spreadsheet and added four more sheets: a data summary from the archive, a table of the tweets’ sources, a table of corpus term and trend counts, and a table of collocate counts.

It must be taken into account that this is just a sample dataset containing the tweets published during the indicated period, not a large-scale collection of the account’s whole output. The data is presented as is, as a research sample and as the result of an archival task. The sample’s significance is subject to interpretation.

Please note that both research and experience show that the Twitter search API is not 100% reliable. Large Tweet volumes affect the search collection process. The API might “over-represent the more central users”, not offering “an accurate picture of peripheral activity” (González-Bailón et al. 2012). Therefore it cannot be guaranteed that the dataset contains each and every Tweet actually published by the queried Twitter account during the indicated period. [González-Bailón et al. have done very interesting work regarding political discussions online and their work remains an inspiration.]

Only content from public accounts was included and analysed. The data was obtained from the public Twitter Search API. The analysed data is also publicly available to all Twitter users via the Twitter Search API and available to anyone with an Internet connection via the Twitter and Twitter Search web client and mobile apps without the need of a Twitter account.

Each Tweet and its contents were published openly on the Web; they were explicitly meant for public consumption and distribution and are the responsibility of the original authors. Any copyright belongs to the original authors.

No personally identifiable information (PII) or sensitive personal information (SPI) was collected or contained in the dataset.

I have shared the dataset including the extra tables as a sample and as an act of citizen scholarship in order to archive, document and encourage open educational and historical research and analysis. It is hoped that by sharing the data someone else might be able to run different analyses and ideally discover different or more significant insights.


References
[vote_leave]. (2016) [Twitter account]. Retrieved from https://twitter.com/vote_leave. [Accessed 21 June 2016].

González-Bailón, S., Banchs, R.E. and Kaltenbrunner, A. (2012) Emotions, Public Opinion and U.S. Presidential Approval Rates: A 5 Year Analysis of Online Political Discussions. Human Communication Research 38 (2) 121-143.

González-Bailón, S. et al (2012) Assessing the Bias in Communication Networks Sampled from Twitter (December 4, 2012). DOI: http://dx.doi.org/10.2139/ssrn.2185134

Hawksey, M. (2013) What the little birdy tells me: Twitter in education. Published on November 12, 2013. Presentation given from the LSE NetworkED Seminar Series 2013 on the use of Twitter in Education. Available from http://www.slideshare.net/mhawksey/what-the-little-birdy-tells-me-twitter-in-education [Accessed 21 June 2016].

Priego, E. (2016) “Vote Leave”. A Dataset of 1,100 Tweets by vote_leave with Archive Summary, Sources and Corpus Terms and Collocates Counts and Trends. figshare. DOI: https://dx.doi.org/10.6084/m9.figshare.3452834.v1

Priego, E. (2016) “Stronger In”. A Dataset of 1,005 Tweets by StrongerIn with Archive Summary, Sources and Corpus Terms and Collocates Counts and Trends. figshare. DOI: https://dx.doi.org/10.6084/m9.figshare.3456617.v1

Priego, E. (2016) “Stronger In”: Looking Into a Sample Archive of 1,005 StrongerIn Tweets. 21 June 2016. Available from https://epriego.wordpress.com/2016/06/21/stronger-in-looking-into-a-sample-archive-of-1005-strongerin-tweets/. [Accessed 21 June 2016].