IFLA World Library and Information Congress 82nd IFLA General Conference and Assembly 13–19 August 2016, Columbus, Ohio, USA
Copyright by IFLA, CC BY 4.0.

 

The first part of this series provides necessary context.

I now have an edited list of the top 50 most frequent terms extracted from a cleaned dataset comprising 10,721 #WLIC2016 Tweets published by 1,760 unique users between Monday 15/08/2016 10:11:08 EDT and Wednesday 17/08/2016 07:16:35 EDT.

The analysed corpus contained the raw text of the Tweets (including RTs), comprising 185,006 total words and 12,418 unique word forms.

Stop words were applied as detailed in the first part of this series, and the resulting list (a raw list of the 300 most frequent terms) was further edited to remove personal names, personal Twitter user names, common hashtags, etc. Some organisational Twitter user names were left in the list, as an indication of their ‘centrality’ in the network based on the frequency with which they appeared in the corpus.
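For anyone who would like to try something similar, here is a minimal Python sketch of the general steps described above: counting total words and unique word forms, applying stop words, removing manually excluded terms, and listing the most frequent terms. It is not necessarily the tool or method used to produce the counts in this post. The stop-word list, the exclusion list, the tokenisation rule and all function names are assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical stop-word list (illustration only; the list actually used
# is described in the first part of this series).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "at", "rt"}

# Hypothetical manual exclusions: personal names, personal user names,
# common hashtags, etc.
EXCLUDED_TERMS = {"wlic2016"}


def tokenize(text):
    """Lowercase a Tweet, drop URLs, and split it into word-like tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"\w+", text)


def corpus_stats(tweets):
    """Return (total words, unique word forms) for a list of Tweet texts."""
    tokens = [token for tweet in tweets for token in tokenize(tweet)]
    return len(tokens), len(set(tokens))


def top_terms(tweets, n=50):
    """Return the n most frequent terms after stop words and exclusions."""
    counts = Counter(
        token
        for tweet in tweets
        for token in tokenize(tweet)
        if token not in STOP_WORDS and token not in EXCLUDED_TERMS
    )
    return counts.most_common(n)


# Tiny made-up sample, not real data from the #WLIC2016 corpus.
sample = [
    "RT @someuser: Libraries need privacy #WLIC2016 https://example.org/x",
    "Privacy and the library #wlic2016",
    "The library session on privacy",
]
print(corpus_stats(sample))    # (16, 11)
print(top_terms(sample, n=2))  # [('privacy', 3), ('library', 2)]
```

Note that this sketch, like the list in this post, does not merge word forms: 'library' and 'libraries' are counted as separate terms. Different tokenisation rules (for example, how hashtags, user names and apostrophes are handled) will produce different totals.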

So here’s an edited list of the top 50 most frequent terms from the dataset described above:

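| Term | Count |
|---|---|
| library | 1379 |
| libraries | 1102 |
| librarians | 811 |
| session | 715 |
| privacy | 555 |
| wikipedia | 523 |
| make | 484 |
| copyright | 465 |
| people | 428 |
| digital | 378 |
| access | 375 |
| use | 362 |
| public | 340 |
| data | 322 |
| need | 319 |
| iflabuild2016 | 308 |
| world | 308 |
| information | 298 |
| internet | 289 |
| new | 272 |
| great | 259 |
| indigenous | 255 |
| iflatrends | 240 |
| report | 202 |
| knowledge | 200 |
| future | 187 |
| work | 187 |
| libraryfreedom | 184 |
| literacy | 184 |
| space | 180 |
| change | 178 |
| thanks | 172 |
| oclc | 171 |
| open | 170 |
| just | 169 |
| books | 168 |
| trend | 165 |
| important | 162 |
| info | 162 |
| know | 162 |
| social | 161 |
| net | 159 |
| neutrality | 159 |
| wikilibrary | 158 |
| collections | 157 |
| working | 157 |
| librarian | 154 |
| online | 154 |
| making | 149 |
| guidelines | 148 |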

Is this interesting? Is it useful? I don’t know, but I’ve enjoyed documenting it. Reflecting on different criteria for applying stop words and for cleaning and refining terms has also been interesting.

I guess that deep down I believe it’s better to document than not to, even if we may think there should be other ways of doing it (otherwise I wouldn’t even try to do it). Making value judgements about the utility or insightfulness of specific data, analysed in specific ways, is an a posteriori process.

I hope to be able to continue collecting data, and once the congress/conference ends I hope to share a dataset with the raw (unedited, unfiltered) most frequent terms in the text of Tweets published with the event’s hashtag. If anyone else is interested, they could clean, curate and analyse the data in different ways (wishful thinking, but hey, it’s hope that guides us).