Jan 15 2019: Gillette, subsidiary of Procter and Gamble, launched an advertising campaign on social media promoting – together with its own brand and products – a rebranded masculinity characterised by responsibility.
Just under 24 hours after this tweet was posted, the campaign was generating responses at a rate of two to three tweets a second (48 hours later this has slowed to about 30 tweets per minute). The search that returned this rate of responses, entered into Twitter’s search engine, was “gillette OR (men AND ad) OR (masculinity AND ad)”.
This is the process I’ve been using to identify online argument to test a method of analysis developed over the Christmas holidays. A high frequency international stoush about toxic masculinity and a cynical corporate exercise – good grist for the mill.
Because this is the first substantive test of a process for network visualisation, some of what follows concerns the visualisation method applied rather than the topic of the argument under analysis. I intend to write the process up as I go, in part for the sake of transparency, so anyone can assess the method’s validity (a more detailed description of the method is in this series of short posts). Future posts will contain less in this vein and more bare-bones analysis. Scroll down for maps of an argument.
I don’t know in advance precisely how many tweets I’ll need to extract from Twitter’s API to cover the period I’m interested in, for the search I had already tested on Twitter itself. There’s a parameter when scraping tweets, ‘Max ID’, that lets you specify a tweet ID, and tweets collected are those posted later than the specified tweet. As Max ID I used Gillette’s account’s tweet above. There’s a gap of 12 seconds between this and the second tweet collected, after which the pace speeds up.
Initially I set the number of tweets to be collected at 50,000, but this only covered the next five hours. Easy enough, though, to set the last tweet collected as a new Max ID and keep going. After four searches (it took two or three hours, because there’s a delay imposed after every 18,000 tweets) the API caught up with the present, so I stopped there.
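The batch-by-batch collection described above can be sketched roughly as follows. This is a minimal illustration that follows the description in this post, not the scraping tool's actual code: `fetch` is a hypothetical stand-in for whatever call returns tweets posted after a given ID, `fake_fetch` simulates it with a small in-memory archive, and the pause values are placeholders.

```python
import time
from typing import Callable, Dict, List

Tweet = Dict[str, int]  # hypothetical minimal record: {"id": ...}


def collect_in_batches(
    fetch: Callable[[int, int], List[Tweet]],
    start_id: int,
    batch_size: int,
    pause_every: int,
    pause_seconds: float = 0.0,
) -> List[Tweet]:
    """Collect tweets after start_id, re-using the last ID seen as the next cursor."""
    collected: List[Tweet] = []
    cursor = start_id
    since_pause = 0
    while True:
        batch = fetch(cursor, batch_size)
        if not batch:
            break  # caught up with the present
        collected.extend(batch)
        cursor = max(t["id"] for t in batch)  # last tweet collected becomes the new cursor
        since_pause += len(batch)
        if since_pause >= pause_every:
            time.sleep(pause_seconds)  # stand-in for the imposed delay
            since_pause = 0
    return collected


def fake_fetch(after_id: int, limit: int) -> List[Tweet]:
    """Simulated source holding tweet IDs 1..100 (assumption, for illustration only)."""
    return [{"id": i} for i in range(after_id + 1, min(after_id + 1 + limit, 101))]


tweets = collect_in_batches(fake_fetch, start_id=10, batch_size=25, pause_every=50)
```

In the real run the batch size was 50,000 and the delay arrived after every 18,000 tweets; the loop shape is the same.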
The total data-set is nearly a quarter of a million tweets (it does include retweets) from a period of about twenty-three hours. It incorporates tweets from 160,000 twitter accounts, and a total of 360,000 connections between accounts (meaning that one account retweeted, replied to, quoted, or mentioned another account).
This research method is recently developed (the specific workflow I’m using, that is, drawing on established approaches); the next stage of testing is to scrape and load up a series of data-sets. This is, first, to hone and hopefully accelerate the time-intensive process of making graphs from tables of data in the visualisation software Gephi (this part doesn’t happen automatically). Second, it’s getting to the seriously interesting phase of generating and looking at graphs, with various content, to see what intel graphs of different kinds yield: what sorts of things can be found out.
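To make the idea of "connections between accounts" and "tables of data" concrete, here is a small sketch of how tweet records might be turned into an edge table. The record fields (`user`, `retweet_of`, `reply_to`, `quote_of`, `mentions`) are hypothetical names chosen for this example, not the export format of any particular scraper. `Source` and `Target` are the column names Gephi's spreadsheet import looks for; `Relation` is a column name assumed here.

```python
import csv
import io
from collections import Counter


def build_edges(tweets):
    """Return (source, target, relation) tuples, one per connection between accounts."""
    edges = []
    for tweet in tweets:
        source = tweet["user"]
        for field in ("retweet_of", "reply_to", "quote_of"):
            target = tweet.get(field)
            if target:
                edges.append((source, target, field.split("_")[0]))
        for target in tweet.get("mentions", []):
            edges.append((source, target, "mention"))
    return edges


def edges_to_csv(edges):
    """Serialise edges as CSV text with a Source/Target/Relation header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Relation"])
    writer.writerows(edges)
    return buffer.getvalue()


# Tiny made-up sample; the real data-set has hundreds of thousands of rows.
sample = [
    {"user": "a", "retweet_of": "gillette"},
    {"user": "b", "reply_to": "gillette", "mentions": ["harrys"]},
    {"user": "c"},  # an isolated tweet: an account with no connection
]
edges = build_edges(sample)
accounts = {t["user"] for t in sample} | {target for _, target, _ in edges}
relation_counts = Counter(relation for _, _, relation in edges)
edge_table = edges_to_csv(edges)
```

Counting `accounts` and `edges` on the full data-set is what yields figures like the account and connection totals quoted above.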
Twitter relations network
Most obvious from a network visualisation of this sort is which twitter accounts are prominent (and perhaps influential) in the network. That Gillette occupies the largest node isn’t news, not least because the name of the account was the first search term. Many of the other most-connected accounts in the discussion are railing against Gillette, with a proposed boycott a major theme, so that this tweet’s borne out empirically:
WOMEN: Treat us like humans.
MEN: You’re being sensitive and irrational.
GILLETTE: Men, we can be better.
MEN: *PTERODACTYL SCREAM*
– Solomon Georgio (@solomongeorgio) January 16, 2019
In seriousness, the first impression I’ve gotten as to the utility of these graphs is that they confirm in a fairly unequivocal way – in a factual way – what’s happening in relation to some topic, on a large social media platform, all at once. It’s not speculation, or the disturbing sense of a vast horde of adversaries that can come from clicking on a hashtag and reading for a few minutes. It’s all the tweets*, visualised.
In the twenty-two hours after one tweet was posted, so were 228,000 responses. Thirty-seven percent of these were replies, nearly matched by retweets. Isolated tweets, quote-tweets and mentions account for the rest.
I thought when I started scraping data that my ‘filter bubble’ would mean there was all sorts of discussion going on that I’d not have seen. Not so. It’s inane tirades on the one hand, grim cynicism on the other. It’s the camps you’d expect to find. But now I know it’s as expected, because this (within the parameters fixed by search terms and a recent time period) is all the tweets.
All sorts of claims are made about what’s said online. Having the evidence as to what has been said, in a form that’s readable, can provide a ready basis for supporting such claims or refuting them.
A lot of what I have in mind for this method is academic, to do with mapping parts of argument, of ideology, to look at the general forms or patterns these take. But having the data to hand that can stand as evidence of what’s gone on on a social media platform leads to the thought of practical purposes. For one, this is potentially a means of making agents accountable for what they do on a network like twitter’s – a check-in-a-minute map of an argument, or a pile-on, or a campaign etc that also shows which accounts directed traffic.
The next thought, of course, down this road, is the double-edged sword this sort of evidence represents – the risk the publication of usernames could pose, if the agents behind the account are vulnerable. If this is useful for identifying agency in a network, it could be useful in doing harm. It calls for informed, extreme caution with respect to accuracy and other ethical issues.
Some other observations:
1. In the visualisation above, I’ve colour-coded edges by type of connection, as the key indicates. So it’s also demonstrated that Gillette’s account received many thousands of retweets, many thousands of replies, and of mentions.
2. Those accounts that represent men’s grooming alternatives dotted around the top right sit in a pool of yellow – this indicates they are being mentioned by other accounts (not tweeting themselves). The accounts @dollarshaveclub and @harrys are proximate to @Gillette because they’re being mentioned in the same tweets; the accounts responsible for these mentions, enthusiastic for a boycott, are for the most part further out to the left.
3. Bands of yellow arcing towards Gillette’s promotional account indicate that account is also being mentioned frequently. The yellow arcs come in from different directions, reflecting talk in and between different communities with starkly opposed attitudes to the ad campaign.
By and large, all of the accounts situated in the top and left-hand part of the graph are united in their condemnation of Gillette, on various points of contention. The edges that connect these accounts with Gillette’s and others that they’re taking to task, stretch right across the map. Accounts responding positively to the campaign are in the lower part of the map on the right, and it’s here that there’s most interchange between different camps.
A third common point of view in the discussion is cynicism as to the motives behind the corporate ad campaign: this line of thought cuts across the divide in sentiment on the issue of toxic masculinity, and so it’s all over the map.
4. This layout uses twitter accounts as nodes, but focuses on types of relation between twitter accounts (retweet, reply, etc.), and therefore serves as a kind of map of The Ratio for many tweets at once. Gillette’s account’s ratio is in balance due to many retweets. At the bottom of the graph farrward3 has provoked a slew of replies and is backing up her case. Over on the bottom left, @realjameswoods has many replies from accounts, but you’ll note this is offset by a fountain of retweets. North of Woods it’s wholly blue skies, any replies or mentions drowned by the volume of retweets. This presents an interesting contrast with the volume of exchanges in the bottom right of the graph. All parts of this graph are populated by agents engaged on a political matter, but behaviours are quite different in different parts of the network.
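The graph shows The Ratio visually, through edge colour, rather than as a number. For anyone who wants a figure to set beside the picture, a rough sketch follows. It assumes an edge list of `(source, target, relation)` tuples and defines the ratio as replies received divided by retweets received; both the data shape and that definition are assumptions made for this example.

```python
from collections import defaultdict


def reply_retweet_counts(edges):
    """Count replies and retweets received by each target account."""
    counts = defaultdict(lambda: {"reply": 0, "retweet": 0})
    for _source, target, relation in edges:
        if relation in ("reply", "retweet"):
            counts[target][relation] += 1
    return dict(counts)


def ratio(count):
    """Replies per retweet; infinite when an account has replies but no retweets."""
    if count["retweet"] == 0:
        return float("inf")
    return count["reply"] / count["retweet"]


# Made-up edges: account "x" draws mostly replies, account "y" only retweets.
sample_edges = [
    ("a", "x", "reply"),
    ("b", "x", "reply"),
    ("c", "x", "reply"),
    ("d", "x", "retweet"),
    ("e", "y", "retweet"),
    ("f", "y", "retweet"),
    ("g", "y", "mention"),  # mentions are ignored by this measure
]
counts = reply_retweet_counts(sample_edges)
ratios = {account: ratio(c) for account, c in counts.items()}
```

A value well above 1 corresponds to an account swamped by replies; a value near 0 corresponds to the "blue skies" of retweets.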
I’ve made a couple of references now to the content of the tweets in this large data-set; I’ve been reading the tweets all day, jumping between the maps, the data table, and Twitter itself. But the content’s not visible at all in the layout above. This is the flaw that countervails the virtue of the connections-focused visualisation: depiction of a large network of posts all at once.
This limitation is the starting point for the separate method that produces the following semantic representation – with the idea that two graphs together represent the best of both worlds.
Prominence for a term in this map is based on its having appeared in many tweets, and terms’ proximity to one another is also determined by the frequency of co-occurrence.
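The two quantities behind the semantic map, how many tweets a term appears in and how often two terms appear together, can be illustrated with a short sketch. The tokeniser and the stopword list here are deliberately crude assumptions, and the sample texts are invented; this is not the tooling used to produce the map.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (assumption); a real run would use a fuller one.
STOPWORDS = {"the", "a", "to", "and", "of", "is", "in", "we", "can", "be"}


def tokenize(text):
    """Return the set of distinct lower-cased terms in one tweet, minus stopwords."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def term_network(texts):
    """Count tweets per term (node size) and tweets per term pair (edge weight)."""
    term_counts = Counter()
    pair_counts = Counter()
    for text in texts:
        terms = sorted(tokenize(text))
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))  # each unordered pair once per tweet
    return term_counts, pair_counts


sample_texts = [
    "Boycott Gillette now",
    "Gillette ad is great",
    "boycott the ad",
]
term_counts, pair_counts = term_network(sample_texts)
```

Fed to a layout algorithm as node sizes and edge weights, counts like these pull frequently co-occurring terms closer together.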
Both this graph and the one below proved to have roughly the same shape as the first graph, which I wasn’t necessarily expecting – this means that the position of terms’ nodes equates in general with the position of account nodes. If this is reproducible, or just tends to occur if the right algorithms are used in a consistent way, then it ought to be possible for someone viewing the graphs to quickly take in both the overall shape of the argument and get some idea of content.
The third representation, below, extends the idea that the qualitative data – the original text – is really a critical addition for an appreciation of the argument mapped. Another significant difference is that this graph takes tweets, not Twitter accounts, as nodes. If one account posts several tweets central to the argument, each is represented individually.
This graph is the reverse of the first in its emphasis: making handles and full tweets legible necessitates taking most of the detail of the network out. The labels can be de-emphasised, though, to generate a graph of relations focused on tweets not accounts.
For this visualisation, to make individual tweets readable, most of the data-set has been excluded. What remains is less than one per cent of the original 228,000 tweets – those with the most connections.
From this set the labels of a small selection of the 0.5% of tweets are scaled so that they’re readable. In this case, it’s just the most prominent (replied to, retweeted, mentioned, and quoted) tweets in the entire data-set.
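The filtering step, keeping only the most-connected sliver of tweets, might look something like the sketch below. It treats "connections" as the number of edges touching a tweet (its degree), which is an assumption about the measure, and the tweet IDs are invented.

```python
import math
from collections import Counter


def top_fraction_by_connections(edges, fraction):
    """Return the most-connected nodes, keeping roughly `fraction` of all nodes."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    keep = max(1, math.ceil(len(degree) * fraction))  # always keep at least one node
    return [node for node, _count in degree.most_common(keep)]


# Made-up tweet-to-tweet edges: "t0" is replied to or retweeted by three others.
sample_edges = [("t1", "t0"), ("t2", "t0"), ("t3", "t0"), ("t3", "t1")]
most_connected = top_fraction_by_connections(sample_edges, fraction=0.25)
```

On the full data-set the same idea, with a much smaller fraction, leaves a graph sparse enough for handles and tweet text to be legible.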
*I don’t know that it’s all the tweets, Twitter may hold some back. But I’ve not been able to identify any tweet at Twitter itself that satisfies the search criteria but is missing from the exported data-set.