layouts for network visualisation three – semantic co-occurrence network

The previous two layouts (account-focused and relations-focused) label nodes with the handles of social media accounts. They represent account networks and edge types but, rather critically, omit the semantics of the exchange. Though I’m sceptical of the utility of ‘word-clouds’ by themselves, a graph of the terms most prominent in discussion ought to be a useful complement when the account network and a map of relations are also available.

To map the network semantically, the data is exported in a different configuration again (this time as plain text), stripped of all but the text of the messages. A program called Cowo, made by Clement Levallois (there’s a download link on this page), analyses words and phrases in the set of tweets and exports a .gml file that can be opened directly in Gephi. Cowo searches for co-occurrence of terms in tweets, so that an edge is established between two terms whenever they appear together in the same tweet. The nodes are the individual terms (or short phrases), and a node’s degree is the number of edges it has with other terms.
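To make the idea of co-occurrence concrete, here is a minimal Python sketch. It is not Cowo's actual implementation, only an illustration under the usual reading of co-occurrence: two terms are linked when they appear in the same tweet. The function names and the three sample tweets are invented for the example, and the tokeniser is deliberately crude (no stop-words, single words only).

```python
import re
from collections import Counter
from itertools import combinations

def tokenize(text):
    # Lowercase, then keep runs of letters (Unicode-aware, so accented letters survive).
    return re.findall(r"[^\W\d_]+", text.lower())

def cooccurrence_edges(tweets):
    """Map each unordered pair of terms to the number of tweets containing both."""
    edges = Counter()
    for tweet in tweets:
        terms = sorted(set(tokenize(tweet)))  # each term counted once per tweet
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return edges

def degrees(edges):
    """Degree of a term = number of distinct terms it shares an edge with."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

# Hypothetical sample data, not taken from the real data set.
tweets = [
    "socialism is poisonous",
    "socialism is popular",
    "poisonous rhetoric",
]

edges = cooccurrence_edges(tweets)
deg = degrees(edges)

for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b} (weight {weight})")
print(dict(deg))
```

In this toy example the pair ‘is’ and ‘socialism’ appears in two tweets, so that edge gets a weight of 2, while ‘rhetoric’ ends up with a degree of 1 because it only ever appears alongside ‘poisonous’.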

This map of terms for discussion on socialism is heavily reflective of US political discourse, related in particular to Alexandria Ocasio-Cortez and Bernie Sanders. One reading of the last several representations, taken together, is that many accounts here are working hard to preserve a concept of socialism as poisonous that has long been prominent in that country.

It ought to be noted that this is a markedly more difficult sort of representation to work with (although it’s worked well in this case), because it attempts to sort and display qualitative content in a quantitative way. The problems that arise are mundane but difficult: what list of ‘stop-words’ (words to be excluded, eg ‘and’, ‘the’, ‘I’m’) to use, how many words to allow in a single phrase, what sorts of characters to exclude (eg if you’ve excluded non-English characters, you’ll lose ï in naïve), and so on. Each of these choices could have gone otherwise, and the result would be, in each case, a slightly different formulation of the data. Some nonsensical terms usually show up in the data, but preventing this by being clinical in only allowing the presentation of ‘proper’ English terms is liable to lead to omission of a different sort.
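The effect of these choices is easy to see in a small Python sketch. Again, this is an illustration and not how Cowo works internally: the function `extract_terms`, its parameters and the sample sentence are all assumptions made for the example.

```python
import re

def extract_terms(text, stop_words=frozenset(), max_phrase_len=1, ascii_only=False):
    """Return the terms (words and short phrases) found in one message.

    stop_words     -- words dropped before phrases are formed
    max_phrase_len -- longest phrase, in words, to emit
    ascii_only     -- if True, only the letters a-z count as part of a word
    """
    pattern = r"[a-z]+" if ascii_only else r"[^\W\d_]+"
    words = [w for w in re.findall(pattern, text.lower()) if w not in stop_words]
    terms = []
    for n in range(1, max_phrase_len + 1):
        for i in range(len(words) - n + 1):
            terms.append(" ".join(words[i:i + n]))
    return terms

# "\u00ef" is the letter ï, so this reads "The naïve view of socialism".
text = "The na\u00efve view of socialism"
stops = frozenset({"the", "of"})

unicode_terms = extract_terms(text, stops)
ascii_terms = extract_terms(text, stops, ascii_only=True)
phrase_terms = extract_terms(text, stops, max_phrase_len=2)

print(unicode_terms)
print(ascii_terms)
print(phrase_terms)
```

With `ascii_only=True` the word naïve is split into the fragments ‘na’ and ‘ve’, which is one way nonsensical terms end up in the graph. Note also that this sketch drops stop-words before forming phrases, so ‘view’ and ‘socialism’ are joined into a phrase even though ‘of’ sat between them in the original text; forming phrases first and filtering afterwards would be an equally defensible choice with a different result.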

layout four – tweet network
