If you haven’t been there already, please start here. An introduction and a detailed methodological note provide context to this post.

I have now shared a spreadsheet containing an archive of 1,005 @StrongerIn Tweets published publicly by the queried account between 12/06/2016 13:34:35 and 21/06/2016 13:11:34 BST.

The spreadsheet includes four further sheets: a data summary of the archive, a table of Tweet sources, a table of corpus term counts and trends, and a table of collocate counts.

This will hopefully allow the comparison of two similar samples from the output of two homologous Twitter accounts, officially representing the ‘Leave’ and ‘Remain’ sides of the UK EU Referendum. The collection period is the same and, if desired, the sets can be edited to contain, for example, 1,000 Tweets each.

Following the structure of my previous post on the ‘Vote Leave’ dataset, here are some quick insights from the @StrongerIn account for comparison.

Archive (from:StrongerIn)

Number of links: 735
Number of RTs: 409 (estimate based on occurrence of “RT”)
Number of Tweets: 1,005
Unique Tweets: 1,004 (used to monitor quality of archive)
First Tweet in Archive: 12/06/2016 13:34:35 BST
Last Tweet in Archive: 21/06/2016 13:11:34 BST
In Reply Ids: 9
In Reply @s: 0
Tweet rate: 0.1 Tweets/min (from last archive 10 mins)
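The summary figures above can be reproduced from the raw archive with a few simple counts. Below is a minimal sketch; the column names (`text`, `created_at`) and the two example rows are assumptions for illustration, not necessarily those used in the shared spreadsheet.

```python
from datetime import datetime

# Hypothetical rows exported from the archive spreadsheet; column names
# are assumptions, not necessarily those in the shared dataset.
tweets = [
    {"text": "RT @StrongerIn: We are stronger in Europe",
     "created_at": "12/06/2016 13:34:35"},
    {"text": "Britain is stronger in Europe https://t.co/abc",
     "created_at": "21/06/2016 13:11:34"},
]

# Estimate RTs by the occurrence of the "RT" marker, as in the summary above.
rt_count = sum(1 for t in tweets if t["text"].startswith("RT "))

# Count link shares by the presence of a t.co URL.
link_count = sum(1 for t in tweets if "https://t.co/" in t["text"])

# Overall Tweet rate (Tweets/minute) between the first and last Tweet.
fmt = "%d/%m/%Y %H:%M:%S"
first = datetime.strptime(tweets[0]["created_at"], fmt)
last = datetime.strptime(tweets[-1]["created_at"], fmt)
minutes = (last - first).total_seconds() / 60
rate = len(tweets) / minutes
```

Note that counting Tweets whose text begins with “RT ” is only an estimate, as the summary itself flags: quoted or manually retweeted Tweets may escape the pattern.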

Like the @vote_leave account, @StrongerIn is used mainly for broadcasting Tweets; no @ replies to users were collected during the period represented in the dataset.

Though this dataset, collected over slightly different timings but covering the same number of days, contains 95 fewer Tweets than the Vote Leave one, it shows that the @StrongerIn account shared 235 more links than its @vote_leave counterpart.

Sources

Unlike the @vote_leave dataset, this one does not indicate that @StrongerIn used either Buffer or Twitter for iPhone; TweetDeck (413) and the Twitter Web Client (591) appear as the main sources. There is even an interestingly strange Tweet, linking to a StrongerIn 404 web page, published from NationBuilder.

Source Count
NationBuilder 1
TweetDeck 413
Twitter Web Client 591
Total 1,005

Most Frequent Words

Removing Twitter data-specific stopwords (e.g. t.co, amp, rt) from the raw data, the 10 most frequent words in the corpus are:

Term Count Trend
eu 287 0.013906387
remain 224 0.010853765
bbcqt 216 0.01046613
europe 209 0.01012695
vote 170 0.008237232
strongerin 167 0.00809187
uk 159 0.0077042347
jobs 148 0.0071712374
leave 148 0.0071712374
eudebate 113 0.0054753367
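The Trend column appears to be each term’s relative frequency, i.e. its count divided by the total number of tokens in the corpus (287 / 0.013906387 ≈ 20,638 tokens). The counting step can be sketched as follows; the tokeniser, the stopword list, and the two sample Tweets are simplified assumptions, not the exact pipeline behind the published counts.

```python
import re
from collections import Counter

# Twitter data-specific stopwords mentioned above, plus a few common
# English stopwords (assumption: the original analysis used a fuller list).
STOPWORDS = {"t.co", "amp", "rt", "https", "the", "to", "in", "of"}

def top_terms(tweets, n=10):
    """Return the n most frequent terms with counts and relative frequencies."""
    tokens = []
    for text in tweets:
        # Keep hashtags and @-mentions, then strip their markers so that
        # '#strongerin' and 'strongerin' count as the same term.
        for tok in re.findall(r"[#@]?\w[\w.]*", text.lower()):
            term = tok.lstrip("#@")
            if term not in STOPWORDS:
                tokens.append(term)
    counts = Counter(tokens)
    total = sum(counts.values())
    # (term, count, relative frequency) -- the last value mirrors 'Trend'.
    return [(term, c, c / total) for term, c in counts.most_common(n)]

sample = [
    "Vote Remain: the UK is stronger in the EU #StrongerIn",
    "RT @StrongerIn: jobs depend on the EU",
]
```

On this toy sample, `top_terms(sample, 3)` surfaces “eu” as the top term with a count of 2, alongside its relative frequency in the 12 retained tokens.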

Compare them with the 10 most frequent words in the vote_leave data. Anything interesting? Let’s put the top 10 terms from each account side by side:

Top 10 terms: 1,100 vote_leave Tweets vs 1,005 StrongerIn Tweets (both over 7 days)

vote_leave term | count | StrongerIn term | count
voteleave | 558 | eu | 287
eu | 402 | remain | 224
bbcqt | 398 | bbcqt | 216
gove | 165 | europe | 209
takecontrol | 146 | vote | 170
immigration | 133 | strongerin | 167
control | 95 | uk | 159
cameron | 89 | jobs | 148
turkey | 84 | leave | 148
uk | 72 | eudebate | 113
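The terms shared by both top 10s can also be found programmatically. A minimal sketch using the two lists above (a set intersection for the overlap, plus the shift in 1-based rank for each shared term):

```python
vote_leave_top10 = ["voteleave", "eu", "bbcqt", "gove", "takecontrol",
                    "immigration", "control", "cameron", "turkey", "uk"]
strongerin_top10 = ["eu", "remain", "bbcqt", "europe", "vote",
                    "strongerin", "uk", "jobs", "leave", "eudebate"]

# Terms appearing in both top 10s (set intersection; order is not kept).
shared = set(vote_leave_top10) & set(strongerin_top10)

# Rank difference for each shared term (positive: ranked lower by vote_leave).
rank_shift = {t: (vote_leave_top10.index(t) + 1) - (strongerin_top10.index(t) + 1)
              for t in shared}
```

Run on these lists, the intersection contains ‘eu’, ‘bbcqt’ and ‘uk’, with ‘bbcqt’ holding exactly the same rank (third) in both, as noted below.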

The terms in red are those appearing in both datasets; the terms in blue correspond to the name of each campaign. It’s interesting that though the StrongerIn account has 182 fewer mentions of ‘bbcqt’ (bear in mind the StrongerIn dataset has 95 fewer Tweets), ‘bbcqt’ remains in third place in both sets.

The differences in the ranking of each campaign’s name are noticeable, as is the fact that the vote_leave top 10 includes the name of the Prime Minister (himself a Remain campaigner) as well as that of Gove (a Leave campaigner), while StrongerIn has no politicians’ names among its 10 most frequent words.

There are other potentially interesting or noticeable differences when we compare these two top 10s. Can you spot them?  Do they tell us anything or not?

Digging into data and creating datasets does not necessarily tell us new things, but it does allow us to pinpoint otherwise moving objects. We don’t need to pin butterflies to recognise they are indeed butterflies, but the intention is to create new settings for observation.

References

González-Bailón, S., Banchs, R.E. and Kaltenbrunner, A. (2012) Emotions, Public Opinion and U.S. Presidential Approval Rates: A 5 Year Analysis of Online Political Discussions. Human Communication Research 38 (2) 121-143.

González-Bailón, S. et al (2012) Assessing the Bias in Communication Networks Sampled from Twitter (December 4, 2012). DOI: http://dx.doi.org/10.2139/ssrn.2185134

Hawksey, M. (2013) What the little birdy tells me: Twitter in education. Published on November 12, 2013. Presentation given at the LSE NetworkED Seminar Series 2013 on the use of Twitter in education. Available from http://www.slideshare.net/mhawksey/what-the-little-birdy-tells-me-twitter-in-education [Accessed 21 June 2016].

Priego, E. (2016) “Vote Leave” Looking Into a Sample Archive of 1,100 vote_leave Tweets. 21 June 2016. Available from https://epriego.wordpress.com/2016/06/21/vote-leave-looking-into-a-sample-archive-of-1100-vote_leave-tweets/. [Accessed 21 June 2016].

Priego, E. (2016) “Vote Leave” A Dataset of 1,100 Tweets by vote_leave with Archive Summary, Sources and Corpus Terms and Collocates Counts and Trends. figshare. DOI: https://dx.doi.org/10.6084/m9.figshare.3452834.v1

Priego, E. (2016) “Stronger In” A Dataset of 1,005 Tweets by StrongerIn with Archive Summary, Sources and Corpus Terms and Collocates Counts and Trends. figshare. DOI: https://dx.doi.org/10.6084/m9.figshare.3456617.v1

[StrongerIn]. (2016). [Twitter account]. Retrieved from https://twitter.com/StrongerIn. [Accessed 21 June 2016].

[vote_leave]. (2016). [Twitter account]. Retrieved from https://twitter.com/vote_leave. [Accessed 21 June 2016].