Statistical sleight of hand

July 06, 2015 (Last Modified: July 06, 2015)

Another day and yet another “study” that has been skewed to the rafters in order to push a regulatory agenda. It really can be depressing. What makes it even worse is when organisations that state “Our vision is to eliminate the use of nicotine and tobacco products by youth and young adults.” get it so badly wrong. You could be forgiven for thinking that the stats were only slightly misinterpreted, but no. These wonderful folks deliberately conjoined two completely different sets of statistics to get the results they were looking for.

Twitter

Scary tweet, isn’t it? “New research! Texas youth using e-cigs at higher rates: 19.1% (HS students) & 7.9% (MS students) in the past 30 days.” The shameful thing, from their point of view, is that it isn’t strictly true. Here’s the really crazy thing about this kind of “research”: it’s all about how you ask the question. For example:

A typical question

So they ask the survey participant: in the past 30 days, which of the following have you used on at least one day? There are two flaws with that which I can see immediately. The first is the time-frame. The “past 30 days” can be read as “the past month” or “the past calendar month”.

What if the participant used one of these products exactly 30 days ago and has no intention of doing so again?

What if the participant used one of these products the day before but had no intention of doing so again? Or even somewhere in the middle?

The immediate follow-up question, to add some granularity to this data, could be more time-frame specific, such as “how long ago?” with options of 1-7 days, 1-2 weeks, 2-3 weeks, 3-4 weeks and so on. The second follow-up would be “would you use the product again in the future?”

Two very simple follow-up questions would add a whole lot more data to analyse and segment than just asking this one question, and they would give a much better understanding of the intentions of the participant. Of course, intentions are wide open to a variety of interpretations in a written report, but as long as the raw data is there in black and white there’s very little wiggle room for the misinterpretation that is so often the case with surveys of this nature. Instead, the next question on the Youth Tobacco Survey is this: “How easy would it be for you to get tobacco products if you wanted some?” Oh dear. There is a whole raft of questions they should be asking in relation to vapour products, yet they are oblivious to the nuances of the market and its users.

Ever tried

The question before that is the one we should really worry about. “Which of the following tobacco products (oh FFS!) have you ever tried, even just one time?”.

Look, I know they class anything derived from tobacco as a tobacco product (except NRT, go figure!), but the vague “some other new tobacco products not listed here” leaves the data gained from this question practically meaningless. It isn’t totally worthless, as it provides some baseline statistics to work with, as long as Mr Statistician doesn’t confound the information.

But here’s the thing: if I took this “survey”, which I’m not going to by the way, I’d have to answer that particular question with more than one answer. Yep, A, B, E and of course I. Might even include J in there just for shits n’ giggles. One person, one question and five answers. Stick that in your pipe and smoke it.

Now, those two questions together are seemingly enough for any CDC data analyst to look at and say with absolute certainty that someone who has used an e-cigarette once in the last 30 days is a current user. Oh yes. Do they not see the fatal flaw in these questions yet? No? How about asking how often they have actually used one? I mentioned a similar question earlier in the post as a follow-up question, but it seems they don’t care for that information, yet they do care about it when it relates to actual tobacco smoking.

How many days did you smoke cigarettes?

Seriously, how hard is it to adapt that exact question for other tobacco products and e-cigarettes? It would definitely seem that these folks don’t actually want accurate information, or if they do they confound it with other meaningless information. Trouble is, a study has already proven that the current method of asking about e-cigarette use is flawed:

Frequency of EC use among current and former smokers

It is remarkable just how fluffy the data is when you don’t ask the right questions, isn’t it? It can lead to all sorts of embarrassments. After all, they are so detailed when they want to know about your smoking habits, yet they don’t seem to give a flying rat’s ass when it comes to vapour product use. Yet this is exactly the kind of thing they try to get away with:

The item regarding current e-cigarette use asked, “During the past 30 days, on how many days did you use electronic cigarettes or e-cigarettes, such as Ruyan or NJOY?” Current e-cigarette use was operationalized using a dummy variable; students reporting at least one day of e-cigarette use in the past 30 days were classified as current users, while those reporting zero days of e-cigarette use in the past 30 days were classified as non-current users (0 = noncurrent users, 1 = current e-cigarette users). Non-current users were individuals who had not used e-cigarettes in the past 30 days and included students who had ever tried e-cigarettes.

Let’s take a moment to digest this. This is question 37, which I looked at earlier in the post, and the quoted text above is the exact wording from the study mentioned in the TexasTCORS tweet.

These “researchers”, and I use that term loosely here, classified users as:

Current e-cigarette use defined as “at least one day of e-cigarette use in the past 30 days”
Non-Current users defined as “reporting zero days of e-cigarette use in the past 30 days”
In addition “non-current users” were individuals who had not used e-cigarettes in the past 30 days and included students who had ever tried e-cigarettes
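
To put some numbers on that, here is a minimal Python sketch of the classification rule as the quoted text describes it: one day or more of use in the past 30 days makes you a “current user”. The responses are invented purely for illustration (they are not from the Texas survey), and the bands in `frequency_band` are my own assumption, not anything the study used.

```python
from collections import Counter


def classify(days_used: int) -> int:
    """Dummy variable as described in the quoted study text.

    1 = "current user" (at least one day of use in the past 30 days)
    0 = "non-current user" (zero days of use in the past 30 days)
    """
    if not 0 <= days_used <= 30:
        raise ValueError("days_used must be between 0 and 30")
    return 1 if days_used >= 1 else 0


def frequency_band(days_used: int) -> str:
    """Hypothetical finer-grained bands (my assumption, not the study's)."""
    if days_used == 0:
        return "none"
    if days_used <= 2:
        return "tried once or twice"
    if days_used <= 19:
        return "occasional"
    return "frequent"


# Invented answers to "on how many days did you use e-cigarettes?"
responses = [0, 0, 0, 0, 0, 0, 1, 1, 2, 30]

dummy = [classify(d) for d in responses]
current_rate = sum(dummy) / len(dummy)
bands = Counter(frequency_band(d) for d in responses)

print(f"'Current users' under the dummy variable: {current_rate:.0%}")
for band, count in bands.items():
    print(f"{band}: {count}")
```

With these made-up numbers the dummy variable reports 40% “current users”, even though only one respondent in ten used an e-cigarette on most days. The single puff and the daily habit land in the same bucket.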

Do you see the massive issue there? To make “non-current” use seem larger in comparison to “current users” they have done a bit of statistical sleight of hand. By admitting I have used an e-cigarette in the past 30 days I immediately become a “current user”. BUT, if I had not used my device for 31 days I’m all of a sudden classed as a “non-current user” which in their “lifetime” terms means I am actually an e-cigarette user.

By confounding those two statistics, it is easy to see why these fools have reached such figures, yet the Federal Government doesn’t seem to actually understand that the information they are basing their propaganda on is fundamentally flawed. If I, as a normal everyday chap with zero statistical know-how or scientific knowledge (as in I don’t have a string of letters after my name) can see it, then how in Seven Hells can those with multiple letters after their name not see it?