Layouts for network visualisation four – tweet network

(prior layouts are the account network, relations network, and semantic co-occurrence network layouts)

The fourth and final layout for visualisation came from looking at the third, a semantic graph of terms and phrases, and wondering why, instead of or in addition to a map of terms and phrases, entire tweets couldn’t be shown in graphs, to combine semantic and network information. They can.

The limitation of showing entire tweets is that the labels take up more space, so the degree filter has to be stricter and fewer nodes can be shown; still, the most prominent tweets in a discussion could be displayed. This wishful thinking prompted another wish – to be able to select ‘exemplar’ tweets from the discussion and make them the prominent tweets displayed, rather than just those with the highest in-degree measure.

A qualitative coding package for the programming language R, called RQDA (R-qualitative-data-analysis), provides for surveying and tagging or coding lines of text in a text file, such as a data-set of tweets. Once I’d found and tried this out, I thought if only I could export the tags for particular tweets in a new column with the original data, I’d have the codes as the basis for selecting certain tweets by attribute for display.
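As a rough sketch of the two selection rules (written in Python rather than the R I used for the project, and with made-up field names `id`, `in_degree` and `codes` standing in for whatever the real data tables contain), picking tweets for display might look something like this:

```python
# Hypothetical tweet records: the field names and values are illustrative
# assumptions, not the columns of the actual data tables.
tweets = [
    {"id": "t1", "in_degree": 12, "codes": ["exemplar"]},
    {"id": "t2", "in_degree": 40, "codes": []},
    {"id": "t3", "in_degree": 3, "codes": ["exemplar", "irony"]},
    {"id": "t4", "in_degree": 25, "codes": ["irony"]},
]


def most_prominent(records, n):
    """Return the n records with the highest in-degree."""
    return sorted(records, key=lambda r: r["in_degree"], reverse=True)[:n]


def with_code(records, code):
    """Return the records tagged with the given qualitative code."""
    return [r for r in records if code in r["codes"]]


print([r["id"] for r in most_prominent(tweets, 2)])
print([r["id"] for r in with_code(tweets, "exemplar")])
```

Either list could then serve as the set of nodes whose labels are shown in the graph.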

I hadn’t had much coding experience in any language before I started processing Twitter data in R, apart from building, in the early days of my PhD candidacy, a website and a content management system in PHP for parsing notes and bibliography. Working in R I’ve taken this approach: if I can think of a way a computer programme ought to be able to do something – a natural language algorithm for processing data in a particular way – then I can reasonably assume that method has already been realised by someone, so I can search for it and then copy it.

As it turned out, there were two problems with processing this layout that were substantially more difficult than those I faced when originally developing the engine for extracting and parsing tweets. The first was reintegrating an HTML file (exported from RQDA) into the data tables.

The second seems more trivial but wasn’t: wrapping text in the graph display, either for preview or export. Gephi doesn’t at the present time offer this feature natively. Fortunately, a Gephi user, Peter A Jansen, had produced a ‘hacky’ version of Gephi that did wrap text on <br> tags in one exporter and in preview, and had posted it to GitHub. Getting this running required opening, altering, and then recompiling Gephi, which is written in Java, using an IDE called NetBeans… anyway it was a foray into a third programming language that produced a version of Gephi that wrapped text. Now an entire post, formatted by design, could be displayed as the label for a node.
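Since the patched Gephi breaks a label wherever it finds a <br> tag, the tags presumably have to be in the label text before it is imported. A minimal sketch of one way to put them there, again in Python rather than R, and assuming a line width of 40 characters (an arbitrary choice for illustration, not a Gephi setting):

```python
import textwrap


def wrap_label(text, width=40):
    """Insert <br> tags so that no line of the label exceeds `width`
    characters, unless a single word (such as a URL) is longer than that."""
    lines = textwrap.wrap(
        text,
        width=width,
        break_long_words=False,  # keep URLs and hashtags in one piece
        break_on_hyphens=False,
    )
    return "<br>".join(lines)


print(wrap_label("A tweet that is long enough to need more than one line as a node label"))
```

Leaving long words unbroken means a URL may overrun the chosen width, which seemed preferable to splitting it in the middle.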

This last layout distinctly broadens the possibilities for displaying account and semantic information at the same time, while constraining the number of nodes that can be visualised at once. As planned, the tweets displayed can be those most prominent (with the highest in-degree or out-degree, for example), or alternatively a set of exemplars selected during qualitative analysis and coding in RQDA.

Again, a variety of sub-layouts are possible, changing parameters in the graphing software, but not altering the data. The simplest of these again are details from the full graph, allowing entire threads of replies to be represented when they exemplify broader debate.

The attention-intensive nature of the process of graphing a data-set of tweets seems a virtue rather than a hindrance. The nuts and bolts of analysis involve sifting through data, in broad terms with the aim of identifying patterns in it. Preparation of a graph entails just this sort of close observation. The finished representation of an argument ought to be informative on brief inspection. But no such single image is anything like a match for the picture that emerges from digging around in the data for an extended time, with the advantages of zooming in and out without loss of resolution, and changing the graph parameters to highlight different phenomena.

Prior layouts:

Layout one – accounts network

Layout two – relations network

Layout three – semantic co-occurrence network
