larlet-fr-david-cache

A place to cache linked articles (think custom and personal wayback machine)

title: This is what a graph of 8,000 fake Twitter accounts looks like
url: https://shkspr.mobi/blog/2015/03/this-is-what-a-graph-of-8000-fake-twitter-accounts-looks-like/
hash_url: 7e0da5e775

Recently I've been plagued with Tweets saying that I'm "trending in London."

As flattering as that is, it's not true. There appears to be a network of Twitter bots which are randomly repeating other people's tweets, ripping off avatars and bios, and generally causing a nuisance.

Looking at the users' Twitter names, I don't think it's unreasonable to conclude that "ekip_uhokoqeq" and "utadaqusoxeh" are randomly generated sequences of characters. And, without wishing to judge, that photo doesn't look like a Susan...

Let's take a look at one of the users' profiles:

It's possible to trace back the bio and photo to different users - they've had their details misappropriated. The Tweets seem to be just randomly taken from other users.

Let's take a look at who this bot is following:

Again, random sequences of letters, hijacked bios and photos. Clicking through each of these profiles reveals a network of thousands of fake accounts.

I adapted a script to visualise the network of accounts - this only looks 3 levels deep (Twitter's API limits make going much further a time-consuming task):

I used Gephi to draw the graph.
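The script itself isn't reproduced here, but a depth-limited crawl of this kind can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the original script: `fetch_following` is a hypothetical stand-in for whatever rate-limited Twitter API call returns the accounts a user follows, `FAKE_NETWORK` is invented toy data, and the crawl is written breadth-first rather than recursively. The edges are written out with `Source` and `Target` column headers, which is the layout Gephi's spreadsheet import expects for an edge table.

```python
import csv
import io
from collections import deque

# Invented toy data standing in for API responses (an assumption, not real accounts).
FAKE_NETWORK = {
    "bot_a": ["bot_b", "bot_c"],
    "bot_b": ["bot_c", "bot_d"],
    "bot_c": ["bot_a"],
    "bot_d": ["bot_e"],
    "bot_e": [],
}


def fetch_following(user):
    """Hypothetical stand-in for a rate-limited 'who does this user follow' call."""
    return FAKE_NETWORK.get(user, [])


def crawl(seed, max_depth, fetch=fetch_following):
    """Breadth-first crawl from seed, expanding accounts fewer than max_depth hops away.

    Returns a list of (follower, followed) edges in discovery order.
    """
    edges = []
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth >= max_depth:
            continue  # accounts at the depth limit are recorded but not expanded
        for followed in fetch(user):
            edges.append((user, followed))
            if followed not in seen:
                seen.add(followed)
                queue.append((followed, depth + 1))
    return edges


def edges_to_csv(edges):
    """Serialise (follower, followed) pairs as a Source,Target CSV string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(edges)
    return buffer.getvalue()
```

Calling `crawl("bot_a", max_depth=2)` on the toy data expands `bot_a` and the two accounts it follows, but stops before expanding `bot_d`. Each extra level multiplies the number of API calls, which is why the depth limit matters so much in practice.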

As you can see - there are dozens of randomly named accounts. They appear to only be following each other - I don't think any "real" users are in there. I can only assume that by forming a network like this, they can evade Twitter's filters. The bots can then either start generating spam, be sold off as fake followers, or be used for some other unsavoury purpose.

I ran the script over the weekend to a recursive depth of 4 and identified over 8,000 spam accounts. Using Cytoscape and Allegro Layout I was able to create this visualisation of the tangled web of connections.

Using Python's Graph-Tool I generated a somewhat prettier visualisation of how all these accounts are connected.

Most of them follow around 8 accounts - there's a fair bit of clustering. While there are a few accounts with larger follower numbers, it's hard to discern whether there's a definite pattern. The large gap appears to be users who have been suspended.
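A claim like "most accounts follow around 8 others" can be checked by counting how many accounts each one follows. The sketch below assumes the relationships are available as a list of `(follower, followed)` pairs; that layout and the function name `following_histogram` are assumptions for illustration, not a description of the dataset's actual format.

```python
from collections import Counter


def following_histogram(edges):
    """Map 'number of accounts followed' to 'how many accounts follow that many'.

    Accounts that only ever appear as the followed side have no outgoing
    edges in the list, so they are not counted here.
    """
    out_degree = Counter(follower for follower, _ in edges)
    return Counter(out_degree.values())


# Invented toy edges: a follows 2 accounts, b follows 1, c follows 2.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("c", "b")]
histogram = following_histogram(toy_edges)
```

On the real data, a sharp peak in this histogram at a single value would be a hint that the accounts were wired together by a script rather than by people.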

As the weekend drew to a close, I'd reached the fifth level of my recursive algorithm. Using Cytoscape's "Organic" layout, another interesting pattern emerges.

There appear to be several "loops" - that is, bots which are in an almost closed network with each other. I see at least half a dozen circles - the rest appear to be following other fake accounts at random.

The centres of those circles appear to be real people. I can't say why they have lots of fake followers - it's possible that they - or someone else - have just bought them to make it look like they're more popular than they really are. There's no suggestion that they control the fake accounts.
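The loops here were spotted by eye in the layout. One way to look for similar structure programmatically is to find strongly connected components: groups of accounts where every member can reach every other by following "follows" links. That is a stricter notion than an "almost closed" network, so treat the sketch below as a rough proxy rather than the method used for the graphs in this post. It assumes an edge list of `(follower, followed)` pairs and uses Kosaraju's algorithm, written iteratively to avoid Python's recursion limit.

```python
from collections import defaultdict


def strongly_connected_components(edges):
    """Kosaraju's algorithm; returns a list of sets of account names."""
    graph, reverse = defaultdict(list), defaultdict(list)
    nodes = set()
    for src, dst in edges:
        graph[src].append(dst)
        reverse[dst].append(src)
        nodes.update((src, dst))

    # Pass 1: record nodes in order of DFS completion on the original graph.
    order, visited = [], set()
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                # All neighbours explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in reverse completion order.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        component, stack = set(), [start]
        assigned.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


# Invented toy edges: a, b and c follow each other in a ring; d is only followed.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
loops = [c for c in strongly_connected_components(toy_edges) if len(c) > 1]
```

Components with more than one member are the candidate loops; single-account components are accounts that sit outside any follow cycle.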

One of the central nodes has 650,000 followers. It's not possible to know quite how many of those are fake - I'm guessing the majority are.

It seems that there's a nasty nest of these bots. In the last few weeks I've reported a dozen or so for spam - but with literally tens of thousands in the network it's impossible for any individual to make a meaningful impact.

I wish Twitter could track down the source of this problem and eradicate it.

If you want to have a play with this dataset - you can download a .zip file of the relationships and their metadata.
