Vape, Quit, Tweet: E-Cigarettes and Smoking Cessation, according to Twitter

Last year in June, in the esteemed Tobacco Control journal there was an article talking about promoting vaping and smoking using social media with the inevitable conclusion: “Future studies should examine the extent to which Twitter users, particularly youth, notice or engage with these price promotion tweets.” Natch.

So, fast forward six months and there’s another one. Based on some of the methodology from the study last year, these researchers decided to look at an entire calendar year via a third-party company, Sysomos – “Proactive Social Media Monitoring” – it’s true, they really are out to get us. Sysomos is a subscription-based service that allows the subscriber to analyse their social media reach – they call it a “Social Intelligence Platform”. Unlike the previous study that used the Twitter Hosepipe API, this one went a little further. According to the Heartbeat solution page on the Sysomos website:

Access billions of conversations spanning 189 countries and 186 languages with up to two years historical data. No matter where a conversation is happening on Twitter, Instagram, blogs, or traditional media (just to name a few) if it is on the social web, Heartbeat will surface it for you.

That’s more than a bit creepy.

Tease the signal from the noise. Query the billions of conversations that Sysomos collects and store just the data that matters to you. Then tag conversations to measure, compare, and analyze the results.

Which is exactly what these researchers did, and here’s how.

They wanted a “purposive sample of tweets” from users with the “highest authority”, meaning those users with a large amount of social reach. As we know, there are several accounts that have an awfully large following. According to the researchers this is valid methodology, because the “high authority” users make up the most popular 5% of Twitter users, who (again according to the research) generate up to 75% of activity.

We employed this strategy to sample tweets from the most authoritative Twitter users communicating about e-cigarettes and smoking cessation.

The researchers searched the Heartbeat database for English language tweets, but did not differentiate between tweets, retweets and replies, and they used a fairly extensive selection of keywords and hashtags to facilitate this search, for example:



Of course, no search would be complete without some words being excluded; almost all were related to marijuana.

Our search identified 2,543,726 tweets about e-cigarettes posted in 2014

Stepping outside the study for a moment: 2.5 million tweets about e-cigarettes in the year of 2014 alone. That is a staggeringly large number, though I suspect it pales in comparison to other topics. These 2.5 million tweets were then filtered to only show tweets/replies/retweets with variations of the smoking cessation keywords, to give a primary dataset of 120,803 tweets (from 80,809 users). Less than 5% of the total tweets identified about e-cigarettes also included a variation of the smoking cessation keywords.

Some staggering figures, but let’s step back for a moment. Out of the 2.5 million tweets on e-cigarettes for the year of 2014, approximately 4.7% of them included, or referenced, smoking, quitting and/or cessation. Making an assumption (for which there is absolutely no basis in fact here) that 4.7% of tweets could theoretically come from 4.7% of users gives 3,837 of them. Out of 80,809. But here’s the thing: a vast proportion of users talking about e-cigarettes on Twitter rarely use hashtags, unless the user actually wants the tweet to be picked up and retweeted.
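For anyone who wants to check the sums, here is a quick sketch in Python using only the figures quoted from the study. The variable names are mine, and applying the tweet share to the user count is my own baseless assumption from the paragraph above, not something the researchers did.

```python
# Figures as quoted from the study.
total_ecig_tweets = 2_543_726  # all e-cigarette tweets identified for 2014
cessation_tweets = 120_803     # subset that also matched cessation keywords
cessation_users = 80_809       # users behind that subset

# Share of e-cigarette tweets that also mention cessation.
share = cessation_tweets / total_ecig_tweets
print(f"Share of tweets: {share:.2%}")

# My assumption only: apply the same share to the user count.
assumed_users = int(share * cessation_users)
print(f"Users under that assumption: {assumed_users}")
```

The share works out at a little under 5%, which is where the 3,837 figure comes from.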

Instagram and Twitter have one very big thing in common – the hashtag. On Twitter, it is “recommended” that up to three hashtags be used to “be noticed”, while on Instagram, the more hashtags you include the wider the audience that will see your photo is likely to be. Twitter hashtags on the other hand can be used as a sort of “context filter” to only show tweets with that tag (#ecigs is an example tag that I have set up in Tweetdeck, as are #ecigarettes, #vaping, #F1, and #Fallout).

These allow me to see tweets from folks that I don’t necessarily follow but who have similar interests to me; from there I can decide to follow that user or not. Mostly hashtags are used by companies or conferences (#ecigarettesummit is a prime example of that) so interested folks can discuss products/proceedings without necessarily following the people involved. It is actually rather clever.

[Image: tweet selection]

The researchers also needed to filter the primary dataset even further; let’s face it, 120,803 tweets is a lot of data to sift through looking for patterns. So they applied other filters specifically related to “industry” communications. Things like #promo or #win apparently suggest commercial activity, with the tweets in each dataset subsequently ranked.

This ranking was applied by the proprietary algorithm used by Sysomos to determine a “measure of authority” based on an analysis of the user’s follower count, how many accounts the user follows (the following to follower ratio has been used a number of times by certain individuals as “proof” of astroturfing or shilling), and the number of tweets and retweets. From that information the Sysomos algorithm gives the tweet a score out of 10, with 10 being the “highest” authority.

Any commercial tweets that were missed by the filters were manually removed, and other tweets brought in to give two datasets of 300 tweets each for the two categories – a Complete dataset and an “Industry-free” dataset.
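To make the selection steps above a bit more concrete, here is a rough sketch in Python. It is emphatically not the researchers’ code or the Sysomos algorithm: the `Tweet` class, the `authority` field and the function names are all hypothetical, the commercial tags are just the two examples mentioned above, and the manual clean-up step is left out.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    text: str
    authority: int  # hypothetical 0-10 score standing in for the Sysomos one


# Only the two example tags named above; the study's full list is not reproduced.
COMMERCIAL_TAGS = {"#promo", "#win"}


def is_commercial(tweet):
    """Flag a tweet if any word, stripped of punctuation, is a commercial tag."""
    words = tweet.text.lower().split()
    return any(word.strip(".,!?:;") in COMMERCIAL_TAGS for word in words)


def top_by_authority(tweets, n=300):
    """Rank tweets by authority score, highest first, and keep the top n."""
    return sorted(tweets, key=lambda t: t.authority, reverse=True)[:n]


def build_datasets(tweets, n=300):
    """Return (complete, industry_free) samples of up to n tweets each."""
    complete = top_by_authority(tweets, n)
    non_commercial = [t for t in tweets if not is_commercial(t)]
    industry_free = top_by_authority(non_commercial, n)
    return complete, industry_free
```

Note that in a sketch like this the two samples can share tweets, since a high-scoring non-commercial tweet lands in both.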

So far, so good. Typical filtering and ranking, although the ranking is based on a proprietary algorithm which is often used to aid in social marketing so there is likely to be some influence from that aspect.

Amusingly, the coding of the tweets ran into a few... technical difficulties. The Heartbeat “sentiment” analysis was judged to be unreliable, so all sentiment was manually coded. The thing with manual coding is that some random words in a tweet that have survived the filtering process can very well be interpreted completely differently by a human than they can by an automated process. Apparently these tweets enabled the researchers to determine a message’s feeling/emotion or affective content and its attitude.

This is where it gets a little worrisome. Each tweet (that’s 600 in total remember) was classified according to a user type (based on a study conducted by Paek et al on harm reduction via e-cigarette videos on YouTube), but the researchers state:

 which typically involved a detailed inspection of the associated Twitter account.

So not only have they snagged 600 tweets, they also decided to perform a “detailed inspection” of the associated Twitter account. Perhaps they’d better check those accounts again:

H/T @anImaginaryEcho

So what did they find from these accounts?

Tweets from the “complete” dataset were generated by 148 unique users – so almost half of the 300 tweets were “original” tweets, the rest were retweets or replies. In contrast, the “industry-free” sample contained 215 unique users. That suggests to me that out of the “industry-free” dataset, 85 of the tweets they looked at were replies or retweets. Furthermore, the researchers combined both datasets to come up with this:


Here’s where it gets a little more interesting.

In addition, the categories industry/related, fake users, and personal with industry ties were collapsed into a single group labeled industry ties as each was deemed commercial in nature, while public health and healthcare were collapsed into a single group labeled health.

So business accounts, fake users and personal accounts with an industry tie (such as being employed by a vape shop) were all bundled up into one category. All based on a Chi-square analysis to test associations between the user type, message affective content and attitudes towards e-cigarettes as a cessation aid. They also removed 119 overlapping tweets that were present in both datasets, leaving 481 of the 600. Unsurprisingly, bundling up business, fake accounts (which only exist to RT everything in sight) and users with the most tenuous of “ties” shows a high percentage of positive attitude and affective content.
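For readers who haven’t met a Chi-square test of association, here is a minimal sketch of the statistic in plain Python. The function names are mine and the counts in `toy_table` are made up purely for illustration; they are not taken from the study, and I am not claiming this is how the researchers ran their analysis.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a rectangular table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Made-up counts for illustration only; NOT taken from the study.
# Rows could be user types, columns could be positive / not positive.
toy_table = [
    [10, 20],
    [20, 10],
]
print(chi_square_statistic(toy_table), degrees_of_freedom(toy_table))
```

The bigger the statistic for a given number of degrees of freedom, the less plausible it is that the two classifications are unrelated. Collapsing categories, as the researchers did, changes the table and so changes the result.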

[Image: Table 3]

Each of the remaining 481 tweets was then coded for a “message theme”. Based on that, and on the combination of industry, fake accounts and tenuous ties to the industry, the researchers suggest that marketing is the top message theme from all 481 tweets, followed by news articles. Each tweet would have been tagged with one or more of these “message theme” codes, so the same tweet could be present in multiple theme codes.


Based on their analysis of the two datasets, a large proportion of the tweets carried positive sentiments and attitudes towards the use of e-cigarettes, and unsurprisingly almost half of them originate from “industry/related”, but in this instance the researchers have decided to categorise each tweet according to the appropriate user type. Interestingly, 10% of the users identified are fake accounts and the researchers go into a lot of detail about these accounts in the supplementary material associated with the study. Amusingly in the supplementary material they discuss @AngryCBrown who (at the time of the study) apparently had posted over 24,000 tweets and had nearly 60K followers.


Not according to a quick check of the account today. They also clearly don’t understand how Twitter works as shown in the supplementary material when talking about @KatieHeigl.

Although Heigl’s tweet was only retweeted four times, it went out to all 988,541 of her followers.

Actually, no it wouldn’t have done. See, there’s a quirk with the @mention on Twitter. If I start a tweet with an @mention, only followers of both users will see the tweet. So if I tweet @katieheigl (unlikely but you never know), only those that follow me AND her will see the tweet, and as I don’t know if any of my followers follow @katieheigl it’s pretty pointless. However, if I stick a full stop before the @mention – .@mention – it does something completely different. The tweet will appear in all of my followers’ feeds, not just those that follow me AND @katieheigl.
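The quirk is easier to see as a couple of sets. This is only a toy model in Python of the behaviour as I’ve described it above; the function name and the example followers are made up, and it is not anything from Twitter’s own API.

```python
def timeline_audience(text, author_followers, mentioned_followers):
    """Return who sees a tweet in their feed, per the rule described above.

    A tweet that starts with "@" reaches only people who follow both the
    author and the mentioned account. Anything else (including ".@name")
    reaches all of the author's followers.
    """
    if text.startswith("@"):
        return author_followers & mentioned_followers
    return set(author_followers)


# Hypothetical followers, purely for illustration.
my_followers = {"alice", "bob", "carol"}
her_followers = {"bob", "dave"}

print(timeline_audience("@katieheigl hello", my_followers, her_followers))
print(timeline_audience(".@katieheigl hello", my_followers, her_followers))
```

In this toy example the first tweet reaches only the one shared follower, while the second reaches all three of mine.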

So far, all this filtering and analysis hasn’t turned up anything that we don’t already know, so what is the fuss about?

 By providing an additional industry-free picture, we showed that even users who avoid commercial content are exposed to overwhelmingly favorable views which may lack current scientific support.

Ah, so it’s all about exposure to a product that many view as “a bad thing”. Funny thing is, throughout the analysis explicit claims about cessation were rare, but (as with any and all research) “we identified several strategies for circumventing legislation against such claims”.

Once again researchers are failing to understand key points:

  • Subject matter (i.e. e-cigarettes)
  • How Twitter works
  • Twitter trends (how many other industries or even Federal bodies co-opt a hashtag for their own agenda?)

Manual coding of tweet and user types is open to broad interpretation and of course selection bias.

Studying social media is open to all kinds of interpretation; it’s difficult to understand the motivations behind the text being displayed in tweets. Plus, the focus on “authoritative” users only (the top 5% of users) diminishes the value of this research. Picking up hashtags only, when most Twitter conversations happen without them, is a ridiculous notion. Twitter can be searched by name, hashtag or even a simple keyword.

Too many failings to even give this study much more thought, except to maybe think about protecting my tweets from now on.