Analysis of the hashtag data suggests that a coordinated effort to manipulate the platform may have led the hashtag to trend.
On November 12, 2021, conservative influencers began using the hashtag “RacistJoe.” The hashtag ultimately appeared in trending topics on Twitter. Our data analysis strongly suggests that the trend was at least partially the consequence of platform manipulation. Behavior similar to what we observed in the “RacistJoe” data has been reported in the past as an attempt to influence public opinion and media reporting. The heightened engagement, brought about through coordinated activity, games the algorithm and results in more people seeing the boosted content, sometimes reaching users who do not already follow the tweet authors.
An examination of users retweeting two or more of the RacistJoe-related tweets analyzed by Hoaxlines revealed traits associated with platform manipulation. That the misleading campaign trended on Twitter, drowning out the most relevant voices like that of the President of the Negro Leagues Museum, demonstrates how a small group could effectively silence opposition.
At least two users in the dataset appeared to target the username “POTUS.” Targeting the US President may not achieve the goals ordinarily associated with social media harassment, which are “to silence members of civil society, muddy their reputations, and stifle the reach of their messaging,” according to the Mozilla Fellow Odanga Madung. Still, Hoaxlines documented the findings for potential research value. Twitter platform manipulation policies already address many of the problematic behaviors observed by Hoaxlines, but enforcement is inconsistent, leaving the United States vulnerable to domestic and foreign manipulation alike.
Videos and articles spread alongside the hashtag #RacistJoe in response to an address from U.S. President Joe Biden. The full video clip — which includes a gaffe that initially sounded like Biden intended to call Satchel Paige “a great Negro” — went viral on November 11, 2021. Biden has publicly struggled with a stutter, and that appears to have played a role in his reference to Paige, an iconic player in the Negro Leagues who later played in the majors.
Verbatim, Biden said:
“I’ve adopted the attitude of the great negro at the time–pitcher in the Negro Leagues–who went on to become a great pitcher in the pros-in major league baseball after Jackie Robinson. His name was Satchel Paige.”
The raised hand and pause, which can be seen on the video, followed by a correction, suggest that Biden did not intend to refer to Paige as “a great negro,” but as a great “pitcher in the Negro Leagues.” Verbal stumbles are common among those with a stutter, a neurological disorder that can constitute a significant hardship.
Americans exploiting Biden’s condition are following a mold set by adversaries in 2020. Foreign state-controlled media used stutter-related gaffes to spread disinformation about Biden’s mental health in the months leading up to the 2020 election, a fact that was withheld without just cause from the public for months.
RT, an outlet the US National Intelligence Council classifies as Russian state-controlled media, which has faced criticism for (among other things) recent German election interference, quickly published an article echoing the sentiments of a domestic provocateur:
Associated Press (AP) reported on one of the more glaring mischaracterizations of Biden’s verbal stumble in the article “Fox News edit of Biden comment removes racial context.” AP admonished Fox, saying:
When editing video, journalists have an obligation to keep statements in the context they were delivered or explain to viewers why a change was made, said Al Tompkins, a faculty member at Poynter Institute, a journalism think tank. In this case, the edit is not at all clear, he said.
The President of the Negro Leagues Museum, Bob Kendrick, retweeted in response to #RacistJoe trending:
To determine the significance of a tweet with the hashtag, we looked at two elements:
The number of incoming links, which is reflected in the size of the node.
Which nodes were identified as “influencers.” The graph shows influencers in magenta.
To determine this, Hoaxlines used a process — also known as a heuristic — that considers a user’s authority, how much the user matters in connecting different groups within a network, and to what extent the node would be able to shift the opinion of others.
Put another way, influencers are well-connected users who can get others to change. That is one part of influence. The other is who an influential node could synergize with to maximize the effects they have on others. This is unsurprisingly known as “influencer maximization.”
Think of maximization like one superhero versus the roster that makes up the Avengers. It’s only a couple more people, but their combined ability is greater than the total reached by adding each person’s power alone. For more about identifying influencers, read this.
Figure 1 shows the network on November 12, the day after the gaffe, when the trend was still growing. As the number of nodes (here, “nodes” are Twitter accounts) mentioning or interacting with mentions of “RacistJoe” increases, more Twitter accounts appear in the data and the network grows. When the number of nodes or users increases in a network, we expect the importance of any single user to go down.
We identified the influencers while the hashtag had not yet peaked. In an organic network, the importance of any single user should decline as the network size increases. On November 12, there were a little over 2000 nodes. We identified the key tweets by tallying incoming links from other accounts interacting with the tweet via a retweet, like, or quote. We then sized the nodes based on the incoming links, or “indegree centrality,” as it is known.
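As a minimal sketch of the indegree tally described above, the snippet below counts how many distinct accounts link to each account in a list of directed interactions. The edge list and account names are hypothetical placeholders, not data from the study, and the deduplication step is an assumption about how repeated interactions should be treated.

```python
from collections import Counter

# Hypothetical directed interactions: (source account, target account).
# A source retweeting, liking, or quoting a target creates one incoming link.
edges = [
    ("user_a", "influencer_1"),
    ("user_b", "influencer_1"),
    ("user_c", "influencer_1"),
    ("user_a", "influencer_2"),
    ("influencer_2", "influencer_1"),
    ("user_a", "influencer_1"),  # repeated interaction, counted once
]

def indegree(edge_list):
    """Count the distinct accounts linking to each target account."""
    unique_edges = set(edge_list)
    return Counter(target for _source, target in unique_edges)

scores = indegree(edges)
for account, score in scores.most_common():
    print(account, score)
```

In a visualization, a score like this would drive node size, so the most-linked tweets and accounts stand out.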
Past studies examining the spread of misinformation by social bots have reported this phenomenon. A 2018 study published in Nature Communications, “The spread of low-credibility content by social bots,” reported differences in the structure of networks inauthentically spreading an article:
Other examples of this can be seen in reports on state-affiliated information operations and targeted disinformation campaigns by domestic actors alike. The structure of a network, the relationships between coordinated actors, and the behavior researchers observe can be remarkably similar among networks used for boosting false and misleading claims.
Hoaxlines found that activity mostly centered around a handful of tweets. Below are examples of tweets that other accounts engaged with heavily:
We also found a series of tweets by searching top tweets for “RacistJoe” on Twitter. Although these tweets did not use the hashtag, many contained a video that was shared repeatedly by people who had also tweeted about “racist” and “Joe.” This illustrates how a single hashtag may not cover the scope of a particular idea, network, or operation.
In the case of “RacistJoe”-related posts, many popular tweets conveyed some version of the idea “Biden is racist” or “Biden called Satchel Paige a negro,” but did not contain the hashtag.
To understand whether any potential network was confined to tweets with the hashtag, we compared users retweeting #RacistJoe to users retweeting tweets that did not include the hashtag but still surfaced in Twitter searches for “RacistJoe,” likely because of replies containing the hashtag.
Looking at individual accounts plays a role in assessing authenticity, but network analysis often provides more objective evidence because it relies less on subjective impressions. Additionally, as social bots have grown more convincing in appearance, researchers have moved toward network-level detection, which overcomes the problem of convincing fake accounts and is also better suited for large-scale detection. Ultimately, network-level evidence comes from account behavior. Behavior, not whether an account is run by a person or a bot, determines whether a group of people is manipulating a platform.
Indegree (incoming links) and outdegree (outgoing links) centralities tell who is retweeting, liking, replying, and quoting (outdegree) and who benefited from that activity (indegree). Hoaxlines looked at users retweeting five popular RacistJoe-related tweets and found a group of 646 users had retweeted two or more tweets. A data visualization below shows how many accounts retweeted each tweet. A node that is linked to both ElectionWiz and Lavern_Spicer, for example, indicates that the user retweeted ElectionWiz and Lavern_Spicer.
The color key under “outgoing retweets” tells how many times a user retweeted one of the key tweets. Absent from this visualization are the users that retweeted all five. We will discuss them further in the next section.
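The grouping step described above, finding users who retweeted two or more of the key tweets, can be sketched as follows. The records and names are hypothetical, and the sketch assumes each key tweet is identified by its author.

```python
from collections import defaultdict

# Hypothetical retweet records: (retweeting user, author of the key tweet).
retweets = [
    ("user_1", "author_a"),
    ("user_1", "author_b"),
    ("user_2", "author_a"),
    ("user_3", "author_a"),
    ("user_3", "author_b"),
    ("user_3", "author_c"),
    ("user_3", "author_a"),  # duplicate record, counted once
]

def authors_by_user(records):
    """Map each user to the set of distinct key-tweet authors they retweeted."""
    seen = defaultdict(set)
    for user, author in records:
        seen[user].add(author)
    return seen

def repeat_retweeters(records, minimum=2):
    """Return users who retweeted at least `minimum` distinct key tweets."""
    return {
        user: authors
        for user, authors in authors_by_user(records).items()
        if len(authors) >= minimum
    }

group = repeat_retweeters(retweets)
print(sorted(group))
```

Raising `minimum` to the number of key tweets isolates the accounts that retweeted every one of them.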
Hoaxlines observed a skew toward recent creation dates for accounts retweeting two or more RacistJoe-related tweets. The three accounts in Figure 7 retweeted all five #RacistJoe-related tweets. Creation dates for these users were September 2021, March 2020, and October 2021. Having a recent account, however, is not sufficient reason to say an account is more likely to be inauthentic than genuine.
Users in this 646-person subset were disproportionately likely to have creation dates in 2020 and 2021. Researchers have observed that accounts used for platform manipulation tend to include a higher number of recently created profiles.
One study analyzed creation dates for users involved in known information operations from the United Arab Emirates, Spain, the Russian Internet Research Agency, Venezuela, Iran, and China. In five out of the six data sets, researchers found that account creation dates skewed toward recent years. (Readers may understand “recent” to mean within the last one to two years.) A skew toward recent creation dates has also been observed in Black Lives Matter (BLM) hashtag data, which is regrettably unsurprising because domestic and foreign groups have heavily targeted BLM since its inception.
Platform manipulation efforts frequently contain a higher than expected number of recently created users, and this lean toward recent accounts appears in information operations across time and regardless of what group is behind the efforts.
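One way to quantify a skew toward recent creation dates is to tally accounts by creation year and compute the share created within the last two years. The sketch below uses hypothetical dates and a hypothetical reference date; it illustrates the arithmetic only and does not reproduce the study's account-level data.

```python
from collections import Counter
from datetime import date

# Hypothetical creation dates for a small group of accounts.
created = [
    date(2021, 9, 4),
    date(2020, 3, 18),
    date(2021, 10, 2),
    date(2012, 6, 30),
    date(2016, 1, 12),
]

def share_recent(dates, reference, years=2):
    """Fraction of accounts created within `years` years before `reference`.

    Assumes `reference` is not February 29, since the cutoff is built by
    changing only the year.
    """
    cutoff = reference.replace(year=reference.year - years)
    recent = sum(1 for d in dates if cutoff <= d <= reference)
    return recent / len(dates)

by_year = Counter(d.year for d in created)
print(sorted(by_year.items()))
print(share_recent(created, reference=date(2021, 11, 15)))
```

A high share is only one signal and, as the text notes, is not sufficient on its own to call an account inauthentic.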
Critically, these traits exist together in multiple users that Hoaxlines found in the “RacistJoe” network. One cannot conclude an account is more likely to be inauthentic based on anonymity alone. In authoritarian countries, identifying information might cost an activist his or her life. Where any single trait is not enough to draw a conclusion, finding that an account has trait after trait increases the chances it’s a fake.
Hoaxlines concluded the accounts (Figure 6) that retweeted all five RacistJoe-related tweets were more likely to be inauthentic than genuine after we found other indicators like a narrow set of views and characteristics not found in groups of genuine accounts. The profiles lacked identifying information, as is common among accounts created to manipulate social media platforms and influence people.
One account that retweeted both ElectionWiz and DiscloseTV tweeted more than 1,000 times in a single day, a volume that a human is unlikely to produce. We also found this user tweeted at a rate that broke 150 tweets per hour and almost exclusively retweeted, with a retweet rate over 99%.
First Draft advises that more than 100 tweets per day may indicate an account is at least partially automated, but that is the upper limit of estimates from respected research organizations. DataJournalism.com wrote of account activity:
Researchers at the Oxford Internet Institute’s Computational Propaganda Project classify accounts that post more than 50 times a day as having “heavy automation.” The Atlantic Council’s Digital Forensics Research Lab considers “72 tweets per day (one every ten minutes for twelve hours at a stretch) as suspicious, and over 144 tweets per day as highly suspicious.”
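The thresholds quoted above can be compared side by side with a small helper. This is an illustrative sketch: the function names are hypothetical, and the inclusive or exclusive boundaries follow the wording of each quote ("more than 50", "72 ... as suspicious", "more than 100", "over 144").

```python
def activity_flags(posts_per_day):
    """List which published thresholds a daily post count meets."""
    flags = []
    if posts_per_day > 50:
        flags.append("Oxford Internet Institute: heavy automation")
    if posts_per_day >= 72:
        flags.append("DFRLab: suspicious")
    if posts_per_day > 100:
        flags.append("First Draft: may be at least partially automated")
    if posts_per_day > 144:
        flags.append("DFRLab: highly suspicious")
    return flags

def retweet_rate(retweet_count, total_posts):
    """Share of an account's posts that are retweets."""
    return retweet_count / total_posts if total_posts else 0.0

# Hypothetical daily counts, chosen to sit on either side of the thresholds.
for count in (30, 72, 1000):
    print(count, activity_flags(count))
print(retweet_rate(99, 100))
```

An account posting 1,000 times in a day clears every threshold listed, which is why volumes like those reported here draw attention.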
Another user that retweeted Lavern_Spicer, DiscloseTV, and ForAmerica had an exceptionally high retweet rate and tweeted over 600 times in a single day on November 6. The account also broke 125 tweets per hour recently, or a rate of more than two tweets per minute.
A third user that retweeted ElectionWiz, ForAmerica, and PapiTrumpo regularly broke 100 tweets per hour and had a retweet rate of over 90%. Hoaxlines found over 800 tweets in a single day from this account and the following recently used hashtags: #RacistJoe (7), #IStandWithSteve (4), #KyleRittenhouse (4), #KyleRittenhouseTrial (4), #LetsGoBrandon (4), #COVID19 (3), #FJB (3), #RINO (3), #RacistJoeBiden (3), #Rittenhouse (3).
The singular focus on hashtags promoted by right-wing influencers matches one of four universal traits — a sole tweeting purpose — reported in a large-scale study (2021) examining the bot-like traits over time with multiple datasets from different studies.
Hoaxlines also created a word cloud from the 646-user retweet group using profile bios to see which terms were in profiles and how frequently. The most common phrase was “not set,” the default setting. Although real users sometimes leave their bios blank, blank profiles occur much more frequently with accounts used for platform manipulation.
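A word cloud is driven by term frequencies, which can be sketched with a simple counter. The bios below are hypothetical, and treating an empty bio the same as the default “not set” value is an assumption made for illustration.

```python
from collections import Counter
import re

# Hypothetical profile bios; "not set" stands for the platform default.
bios = [
    "not set",
    "Patriot. Dad. Not a bot.",
    "not set",
    "",
    "Conservative patriot",
]

def bio_term_counts(bio_list, placeholder="not set"):
    """Count terms across bios, keeping the default placeholder as one phrase."""
    counts = Counter()
    for bio in bio_list:
        text = bio.strip().lower()
        if not text or text == placeholder:
            # Assumption: blank bios are grouped with the default value.
            counts[placeholder] += 1
            continue
        counts.update(re.findall(r"[a-z0-9']+", text))
    return counts

counts = bio_term_counts(bios)
print(counts.most_common(3))
```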
Below are examples of highly active users (defined as users ranking among the 100 most active accounts in the #RacistJoe dataset) that we found engaging with RacistJoe-related content.
Inflating the engagement of tweets through retweets or replies creates the illusion of popularity. Bad actors can leverage the illusion to direct public discussion, to influence opinion without disclosing financial backing, to misinform, or to discourage opponents through targeted harassment. Trend Micro, a global cybersecurity firm, wrote of the threat in 2017 (p 74):
Careful and extended use of propaganda can shift the Overton window. Prolonged use of opinion manipulation techniques can make the public receptive to ideas that would have previously been unwelcome at best, and perhaps even offensive at worst. The concept of the slippery slope applies: once an opinion has been changed a bit, it becomes easier to change it even more.
This allows for bigger changes in public opinion than before, making it easier to achieve larger goals of public opinion manipulation.
In a case where coordinated Twitter campaigns targeted civil advocates, the activists reported that they now self-censor on the platform. An in-depth investigation by Mozilla included interviews with the influencers who had accepted payments to partake in the information operation.
“They were told to promote tags — trending on Twitter was the primary target by which most of them were judged. The aim was to trick people into thinking that the opinions trending were popular — the equivalent to ‘paying crowds to show up at political rallies,’ the research says.”
The problem of platform manipulation is not what opinion someone has, but who has the privilege of amplification. Who gets to hold the microphone and speak to a crowded room? Currently, bad actors are exploiting social media algorithms to increase their reach and influence, and many times this involves bot-like accounts. Bots are permissible on Twitter, so the fact that an account is automated is not a problem.
It is when platform manipulation or automated accounts are used in deceptive ways that they are not permissible according to Twitter. Enforcing these policies is critical. The Office of the Director of National Intelligence declassified a report in August of 2021 that conveys deep concern:
“Russia presents one of the most serious intelligence threats to the United States, using its intelligence services and influence tools to try to divide Western alliances, preserve its influence in the post-Soviet area, and increase its sway around the world, while undermining US global standing, sowing discord inside the United States, and influencing US voters and decision making.”
And “Cyber threats from nation-states and their surrogates will remain acute. Foreign states use cyber operations to steal information, influence populations, and damage industry, including physical and digital critical infrastructure. Although many countries and non-state actors have these capabilities, we remain most concerned about Russia, China, Iran, and North Korea. Many skilled foreign cybercriminals targeting the United States maintain mutually beneficial relationships with these and other countries that offer them safe haven or benefit from their activity.”
With no accountability and full anonymity, unfettered platform manipulation poses a grave threat that could be leveraged at a critical time, such as during a natural disaster or a pandemic. In a worst-case scenario, social media could be weaponized against the United States, threatening national security, but that possibility exists only with platform complicity.
Hoaxlines collected the data via Netlytic, a platform designed for academic social media research, using the search query “RacistJoe,” from November 11 to 15. Once the data was collected, we ran a “communication network discovery” and selected the following ties for the data visualization:
User A replied to User B
User A quoted User B
User A retweeted User B
User A mentioned User B (for ‘original’ tweets only)
We then downloaded the GEXF file and uploaded it to Polinode. Hoaxlines obtained Botometer scores for the 100 most active accounts by taking tweet ID numbers from the RacistJoe dataset collected via Netlytic and entering them for collection by Communalytic, a Netlytic-related platform for data-based social media research. Botometer is a machine learning algorithm, trained on confirmed bot datasets, that assesses the probability that an account is bot-like.
Communalytic automatically assesses a dataset based on the outgoing (outdegree) links and ranks, in descending order, the top 100 most active accounts. Using the Botometer API, researchers can run the top 100 accounts to receive scoring from 0-1. Any score over 0.5 indicates Botometer perceives the account as more likely to be inauthentic than genuine.
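The two steps described above, ranking accounts by outgoing links and then applying the 0.5 cutoff, can be sketched without calling any external service. The edges and scores below are hypothetical stand-ins; the sketch does not use the Botometer API or Communalytic.

```python
from collections import Counter

def most_active(edges, top_n=100):
    """Rank accounts by outgoing links (outdegree), most active first."""
    outdegree = Counter(source for source, _target in edges)
    return [account for account, _count in outdegree.most_common(top_n)]

def flag_bot_like(scores, threshold=0.5):
    """Return accounts whose score is strictly above the threshold."""
    return sorted(a for a, s in scores.items() if s > threshold)

# Hypothetical interactions: (source account, target account).
edges = [
    ("acct_a", "x"), ("acct_a", "y"), ("acct_a", "z"),
    ("acct_b", "x"), ("acct_b", "y"),
    ("acct_c", "x"),
]
# Hypothetical scores on a 0-1 scale, standing in for Botometer output.
scores = {"acct_a": 0.91, "acct_b": 0.50, "acct_c": 0.12}

top_accounts = most_active(edges, top_n=2)
flagged = flag_bot_like({a: scores[a] for a in top_accounts})
print(top_accounts, flagged)
```

A score of exactly 0.5 is not flagged here because the text describes the cutoff as "over 0.5".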
Contact and requests
Please write to editor@novel-science[.]com with “data request” in the subject line from an institutionally affiliated email address, and include your name, affiliation, and the reason you’d like to obtain the data.
Media can write to the same address but please put “media inquiry” in the subject.
Funding and conflicts of interest declaration
Hoaxlines generates no profit and, as yet, has accepted no donations. Study volunteers footed the cost for the study, and Hoaxlines has not provided financial or non-financial compensation. Hoaxlines is a disinformation database run out of Novel-Science.com, a science communication project devoted to media manipulation and unethical influence. It has no patrons, funding, or institutions to which it must answer. We report what we find and attempt to check our existing biases.
This study is unaffiliated with any other organization or institution regardless of author scholarship or employment. The research and conclusions represent Hoaxlines alone. The views expressed in this report reflect sincere author impressions at the time of the study.
The part(ies) responsible for the probable platform manipulation in this report are not addressed and cannot be assessed based on the information in this report alone. The inclusion of a name does not imply that the party was aware of or partook in platform manipulation. Users have no control over who mentions them, so the direction and nature of an edge can help indicate the degree of participation, an aspect not addressed by this report.
November 12 data
Network Name: RacistJoe – November 12
Network Type: Directed
Visible Nodes: 2,183
Visible Edges: 2,386
Average Total Degree: 2.19
Network Density: 0.05%
Avg Path Length: 1.06
November 15 data
Network Name: RacistJoe Hashtag – Nov 15
Network Type: Directed
Visible Nodes: 3,512
Visible Edges: 3,910
Average Total Degree: 2.23
Network Density: 0.03%
Average Path Length: 1.07
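The reported average total degree and network density can be checked from the node and edge counts alone. The sketch below assumes the standard formulas for a directed network (average total degree of 2E/N and density of E/(N(N-1))); the source does not state which formulas Polinode applies, but these reproduce the figures listed above.

```python
def average_total_degree(nodes, edges):
    """Each directed edge adds one out-link and one in-link, so it counts twice."""
    return 2 * edges / nodes

def directed_density(nodes, edges):
    """Share of the nodes * (nodes - 1) possible directed edges that exist."""
    return edges / (nodes * (nodes - 1))

snapshots = [
    ("November 12", 2183, 2386),
    ("November 15", 3512, 3910),
]
for label, n, e in snapshots:
    degree = round(average_total_degree(n, e), 2)
    density = f"{directed_density(n, e):.2%}"
    print(label, degree, density)
```

Both snapshots are extremely sparse, and the average degree barely moves as the network grows from 2,183 to 3,512 nodes.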
Advanced Communities: This is an advanced version of the Louvain Communities algorithm (see below). The resolution parameter allows you to tune the size of the resulting communities – a resolution of 0 will place all nodes into a single community, whereas a resolution of two times the number of nodes will result in each node being assigned its own community.
The default value of 1 will apply the algorithm without regard to the size of the resulting communities. There is also an optional maximum community size parameter that, if used, will limit the resulting communities to be no larger than the inputted value. For large networks, the algorithm is considerably faster with max size left blank. Read more here.
Average Neighbor Degree: Average Neighbor Degree for a node is the average number of edges (i.e., degree) that a node’s neighbors have. For directed networks, you can specify whether to use in-degree or out-degree for each source and target node in the calculation. Read more here.
Betweenness Centrality: Betweenness Centrality for a node is the total number of shortest paths that pass through that node, and if the Normalized option is selected, divided by the total number of shortest paths in the network. It measures how much a node is a ‘bridge’ between other nodes in the network. Read more here. Betweenness can be computationally expensive to calculate, particularly for large networks, which is why the option to sample a subset of nodes is provided as an input. If Apply Edge Weights is set to Yes then the inverse of the edge weights will be used such that a larger edge weight effectively reduces the distance between two nodes rather than increasing it.
Binary Flag: Binary Flag is a helper metric that is equal to True for a node if the selected attribute below is equal to one of the selected values below for that node. Otherwise, it is equal to False. It is particularly helpful when used together with the External edge metric.
Brokerage: Brokerage here refers to Gould-Fernandez brokerage. Given an attribute, there are five kinds of brokerage possible: Coordinator, Consultant, Gatekeeper, Representative, and Liaison. This metric will count up and return the number of times that a node acted in each of those roles. Read more here.
Closeness Centrality: Closeness Centrality for a node is the reciprocal of its farness. The farness of a node is the sum of its shortest path distances from all other nodes. The greater a node’s Closeness Centrality relative to other nodes, the closer it is on average to other nodes in the network. Read more here.
Clustering: The Clustering coefficient for a node is the fraction of possible triangles through that node that actually exist. The higher a node’s clustering coefficient, the more embedded it is in the overall network. Read more here.
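As a worked illustration of the clustering definition, the sketch below counts how many pairs of a node's neighbors are themselves connected and divides by the number of possible pairs. It assumes an undirected network stored as a dictionary of neighbor sets; the graph is hypothetical.

```python
from itertools import combinations

# Hypothetical undirected network: node -> set of neighbors.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

def clustering(graph, node):
    """Fraction of neighbor pairs of `node` that are directly connected."""
    neighbors = graph[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs, so no triangles are possible
    closed = sum(1 for u, v in combinations(neighbors, 2) if v in graph[u])
    return closed / (k * (k - 1) / 2)

for node in sorted(adj):
    print(node, round(clustering(adj, node), 3))
```

Node "a" has three neighbor pairs and only one of them (b and c) is connected, so its coefficient is one third.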
Connected Components: A Connected Component is a set of nodes that are connected to each other. A directed network will be treated as an undirected network for the calculation of Connected Components. Read more here.
Constraint: Constraint is related to the concept of structural holes and measures the extent to which a node is able to take advantage of structural holes in its network. Constraint will be higher if a node’s connections are highly connected to each other, either directly or indirectly through a mutual connection. Read more here.
Core Number: Core Number for a node is the largest value k of all k-cores containing that node where a k-core is the largest possible subgraph in the network containing nodes with a Total Degree of k or more. Core Numbers can be helpful in the decomposition of large networks. Read more here.
Current Flow Closeness Centrality: Current Flow Closeness Centrality is similar to regular Closeness Centrality but instead of a shortest path measure for distance, effective resistance inspired by electric circuit models is used. Read more here.
Effective Size: Effective Size is related to the concept of structural holes and the redundancy of connections. It measures the number of people that the node is connected to but controlling for (i.e. reducing by) the redundancy of those connections. Read more here.
Efficiency: Efficiency is equal to effective size divided by total degree. If a node has no redundant ties then the effective size will be equal to total degree and efficiency will be equal to one. Efficiency is the proportion of a node’s ties that are non-redundant. Read more here.
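Effective size and efficiency can be illustrated with the common simplification for unweighted, undirected networks, where effective size equals the number of contacts minus twice the ties among those contacts divided by the number of contacts. Whether the tool described here uses this exact formulation is an assumption; the graphs below are hypothetical.

```python
from itertools import combinations

def effective_size(graph, ego):
    """Contacts of `ego`, reduced by the redundancy among those contacts.

    Uses n - 2t/n, where n is the number of contacts and t is the number
    of ties among them (unweighted, undirected simplification).
    """
    contacts = graph[ego]
    n = len(contacts)
    if n == 0:
        return 0.0
    ties = sum(1 for u, v in combinations(contacts, 2) if v in graph[u])
    return n - 2 * ties / n

def efficiency(graph, ego):
    """Effective size divided by the number of contacts (total degree)."""
    n = len(graph[ego])
    return effective_size(graph, ego) / n if n else 0.0

# Hypothetical star: no contact of "hub" knows another contact.
star = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
# Same star, but contacts "a" and "b" are also tied to each other.
tied = {"hub": {"a", "b", "c"}, "a": {"hub", "b"}, "b": {"hub", "a"}, "c": {"hub"}}

print(effective_size(star, "hub"), efficiency(star, "hub"))
print(round(effective_size(tied, "hub"), 3), round(efficiency(tied, "hub"), 3))
```

The star matches the glossary's special case: with no redundant ties, effective size equals total degree and efficiency equals one.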
Eigenvector Centrality: Eigenvector Centrality is motivated by the idea that nodes connected to other nodes that are central should themselves be relatively central, i.e. being connected to a central node contributes more than being connected to a non-central node. It is not always well defined for directed networks and it’s generally preferable to calculate Katz Centrality for directed networks. However, should you calculate eigenvector centrality for a directed network, Polinode will return “left” eigenvector centrality (i.e. corresponding to the in-edges). Read more here.
External vs Internal: External vs Internal (EI) calculates, for a given attribute, the percentage of a node’s edges that connect to nodes that do not share the same value for that attribute (external connections) vs connections to nodes that do share the same attribute value (internal connections). If type is Total the metric will be calculated for all edges, if type is In then the metric will be calculated only for a node’s incoming edges, and if type is Out then only for a node’s outgoing edges.
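The external-versus-internal comparison can be sketched by classifying each of a node's edges according to whether the node at the other end shares its attribute value. The sketch reports the classical E-I index, (E - I) / (E + I), which ranges from -1 (all internal) to +1 (all external); using that index rather than a raw percentage is an assumption, and the data is hypothetical.

```python
def ei_index(edges, attribute, node, kind="total"):
    """E-I index for `node`, using total, incoming ("in"), or outgoing ("out") edges."""
    external = internal = 0
    for source, target in edges:
        if kind in ("total", "out") and source == node:
            other = target
        elif kind in ("total", "in") and target == node:
            other = source
        else:
            continue
        if attribute[other] == attribute[node]:
            internal += 1
        else:
            external += 1
    total = external + internal
    return (external - internal) / total if total else 0.0

# Hypothetical attribute values and directed edges.
group = {"a": "red", "b": "red", "c": "blue", "d": "blue"}
links = [("a", "b"), ("a", "c"), ("d", "a"), ("c", "a")]

for kind in ("total", "in", "out"):
    print(kind, ei_index(links, group, "a", kind))
```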
Harmonic Centrality: Harmonic Centrality for a node is the sum of the reciprocals of the shortest path distances from that node to each other node in the network. It is closely related to Closeness Centrality with the key difference being that the reciprocal is taken for each distance rather than taking the reciprocal of the sums of the distances. Read more here.
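The difference between Closeness and Harmonic Centrality is easiest to see side by side. The sketch below computes unnormalized versions of both with a breadth-first search on a hypothetical undirected path network; library implementations often normalize these values, so the exact numbers may differ from a given tool's output.

```python
from collections import deque

def distances_from(graph, start):
    """Shortest path length from `start` to every reachable node (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def closeness(graph, node):
    """Reciprocal of farness, the sum of distances to reachable nodes."""
    farness = sum(distances_from(graph, node).values())
    return 1 / farness if farness else 0.0

def harmonic(graph, node):
    """Sum of reciprocal distances to every other reachable node."""
    dist = distances_from(graph, node)
    return sum(1 / d for other, d in dist.items() if other != node)

# Hypothetical path network: a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

for node in sorted(path):
    print(node, round(closeness(path, node), 3), round(harmonic(path, node), 3))
```

Because harmonic centrality takes the reciprocal of each distance separately, unreachable nodes simply contribute nothing, which makes it better behaved on disconnected networks.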
HITS: Hyperlink-Induced Topic Search (HITS) for a node gives two metrics – Hubs and Authorities. A node has a relatively high Hubs score if it links to other nodes and a relatively high Authority score if it is linked to by other nodes.
Identify Influencers: Identify Influencer is a heuristic that finds the most influential nodes in the network in the sense that together the count of those nodes and the nodes connected to those nodes is maximized, i.e. coverage of the network is maximized. Read more here. It is also possible to limit the influencers identified to certain attribute values by using the Limit by Attribute option. This is helpful if, for example, you want to identify influencers in an organization but only at the individual contributor level.
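The coverage idea behind influencer identification can be illustrated with a simple greedy heuristic: repeatedly pick the node that adds the most not-yet-covered nodes, counting the node itself and its direct connections. This greedy approach is a stand-in chosen for illustration, not the tool's actual algorithm, and the network is hypothetical.

```python
def greedy_influencers(graph, k):
    """Pick up to `k` nodes that together cover as many nodes as possible."""
    covered, chosen = set(), []
    for _ in range(k):
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(graph), key=lambda n: len(({n} | graph[n]) - covered))
        gain = ({best} | graph[best]) - covered
        if not gain:
            break  # every node is already covered
        chosen.append(best)
        covered |= gain
    return chosen, covered

# Hypothetical undirected network: node -> set of neighbors.
network = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a"},
    "e": {"f"},
    "f": {"e"},
    "g": set(),
}

chosen, covered = greedy_influencers(network, k=2)
print(chosen, sorted(covered))
```

The second pick is "e" rather than another neighbor of "a", which shows why a set of influencers chosen together can reach more of the network than the individually best-connected nodes.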
In Degree: In Degree for a node is a straightforward measure of centrality – it measures the total number of nodes linking to that node.
K Clique Communities: A K Clique Community is the union of all cliques of size k that can be reached through adjacent k-cliques where a k-clique is a group of k nodes that are all connected to each other and a k-clique is said to be adjacent to another k-clique if it shares k-1 nodes with it. Communities produced by this algorithm are generally not distinct and will overlap so an attribute is added for each community found. Read more here.
Katz Centrality: Katz Centrality for a node takes into account not just the neighbors of that node but also their neighbors and so on, applying an attenuation factor of alpha so that the influence of nodes declines on every step away from the target node. Read more here.
Load Centrality: Load Centrality for a node is the total amount of some commodity passing through that node when one unit of the commodity is sent from each node in the network to each other node in the network and the commodity is split equally at branching points and aggregated at meeting points. Load Centrality is very similar to Betweenness Centrality and also measures ‘bridging’. Read more here.
Louvain Communities: Louvain Communities are non-overlapping groups of relatively closely connected nodes found by an optimization algorithm. Read more here.
Out Degree: Out Degree for a node is a straightforward measure of centrality – it measures the total number of nodes that that node links to. Read more here.
Pagerank: Pagerank for a node is a ranking of relative importance in the network based on the structure of incoming edges for that node. It was originally designed to rank web pages. Similar to Katz Centrality, alpha is an attenuation factor. Pagerank for undirected networks will be calculated by transforming each undirected edge into two directed edges. Read more here.
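A compact way to see how Pagerank rewards incoming edges is a plain power iteration. The sketch below is a textbook version with attenuation factor `alpha`; the default of 0.85, the fixed iteration count, and the handling of nodes without outgoing edges are assumptions, and the edge list is hypothetical.

```python
def pagerank(edges, alpha=0.85, iterations=100):
    """Power-iteration Pagerank on a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    count = len(nodes)
    out_links = {n: [] for n in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {n: 1 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no out-links is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - alpha) / count + alpha * dangling / count
        new_rank = {n: base for n in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += alpha * rank[source] / len(out_links[source])
        rank = new_rank
    return rank

# Hypothetical retweet-style edges: three accounts point at "hub".
links = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a")]
ranks = pagerank(links)
for node, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(node, round(value, 3))
```

The account that "hub" links to ranks well above the other two sources, because Pagerank passes importance along edges rather than just counting them.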
Total Degree: Total Degree for a node is a straightforward measure of centrality. It is simply the total number of edges that that node has, i.e. for directed networks it is the sum of In Degree and Out Degree. Read more here.
Some of these metrics are only available for directed networks and some are only available for undirected networks. If a metric is not available for your network you will see a message to that effect to the right of the dialogue.
Edge Betweenness: Edge betweenness measures the total number of shortest paths in the network that pass through an edge relative to the total number of shortest paths in the network overall. Just as for Node Betweenness, you can select a number of nodes to sample as a percentage of the total nodes in the network. If Apply Edge Weights is set to Yes then the inverse of the edge weights will be used such that a larger edge weight effectively reduces the distance between two nodes rather than increasing it.
This layout algorithm simulates physical forces on the network. You can think of it as applying an attractive force between nodes that are connected by an edge and simultaneously applying a repulsive force between all pairs of nodes. It is a continuous algorithm that will reach an equilibrium when these forces are in balance and the nodes stop moving. It needs to be started and stopped manually by clicking the start and stop buttons.
There is also one advanced setting for it and that is the Prevent Overlap option. By default, this option is No and you should always start running the layout algorithm with prevent overlap on No. Once the nodes have reached equilibrium you may want to “tidy up” the layout by running the force-directed layout again with Prevent Overlap set to Yes. The same physical forces will be simulated but in this case, the relative size of nodes will be taken into account so that overlapping nodes are repelled from each other. It is significantly slower to run the algorithm with Prevent Overlap set to Yes which is why it should only be applied after an initial equilibrium has been reached.