Wednesday, October 23, 2013

From the Field: Is Your Research Powerful?

We're lucky enough to have this "From the Field" post from Paul Visintainer, PhD, Director of Epidemiology and Biostatistics here in Academic Affairs at Baystate. Grab a pencil, a notebook, and your most mathematically oriented friend, and read this post as your own personal lesson in error, statistical power, and sample size. Because, of course, even educational research requires a power analysis!

Whitley E, Ball J. Statistics review 4: Sample size calculations. Critical Care. 2002;6:335-341. Available online from the Baystate Health Sciences Library, or from PubMed at your institution.

I am always pleased when I see articles expounding the importance of sample size and statistical power in clinical research – particularly when they appear in clinical journals.  I was happy to see the Whitley and Ball article in Critical Care.  Although the article is several years old, the issues discussed are still – and will continue to be – relevant.  I just want to reinforce and expand on a couple of the concepts.

First, among some disciplines, there exists an idea – a myth, really – that sample size and statistical power are only relevant for randomized clinical trials.  I don’t know where this idea got started or how it continues, but it is completely untrue.  Any time a researcher wishes to describe a comparison with a p-value, he must also address statistical power.  This applies to all study types – randomized controlled trials, cohort studies, case-control studies, retrospective chart reviews (this last is actually not a study type, but rather a method of data collection), etc.  If the study uses a data analytic approach that generates a p-value, then statistical power should be addressed. 

The above issue becomes clear when one considers the reasoning underlying statistical analysis and sample size.  Suppose a researcher posits a question about some clinical effect -- e.g., Does an exposure cause disease? Does the treatment reduce morbidity?  Does a medication reduce pain?  Since he doesn't know the correct answer, he designs a study to answer the question.  (If he knew the right answer, he wouldn't have to conduct the study, right?)  He wants to make a "generalizable" statement (e.g., the effect is generally real for all patients).  However, he isn't studying all patients – he is only studying a sample of patients.

So, his conclusion will have some error because he is basing his results on only one sample.  In designing his study (before he collects any data), there are two possible errors he could make in his conclusion. 
A) He could conclude that there is an effect in his sample, when one truly doesn’t exist in the population.  This is the α-error (“alpha” or Type 1) and this is what the p-value reflects. 
B) He could conclude from his sample that there is no effect, when one truly exists in the population.  This is the β-error ("beta" or Type 2) and this is what power reflects – actually, 1-β = power.
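These two errors are easy to see in a simulation. Here is a minimal sketch in Python (my own illustration, not from the Whitley and Ball article); the 30 patients per group, the assumed effect of 0.5 standard deviations, and the 5,000 simulated studies are all arbitrary choices:

```python
# A minimal simulation of both error types using a two-sample t-test.
# All numbers (n per group, effect size, simulation count) are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims, n_per_group, alpha = 5_000, 30, 0.05

# Type 1 (alpha) error: the null is TRUE (both groups come from the same
# population), yet by chance some samples still yield p < 0.05.
false_positives = sum(
    stats.ttest_ind(rng.normal(0, 1, n_per_group),
                    rng.normal(0, 1, n_per_group)).pvalue < alpha
    for _ in range(n_sims)
)
print(f"Type 1 error rate: {false_positives / n_sims:.3f}")  # close to 0.05

# Type 2 (beta) error: the null is FALSE (a real 0.5 SD difference exists),
# yet some samples fail to reach p < 0.05.  Power = 1 - beta.
misses = sum(
    stats.ttest_ind(rng.normal(0.0, 1, n_per_group),
                    rng.normal(0.5, 1, n_per_group)).pvalue >= alpha
    for _ in range(n_sims)
)
print(f"Power (1 - beta): {1 - misses / n_sims:.3f}")  # ~0.48: underpowered!
```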

It is important to note that these errors are “conditional”.  That is, α-error is the probability of rejecting the null hypothesis if the null hypothesis is true. β-error is the probability of NOT finding an effect if the null hypothesis is false.  At the start of the study, the investigator doesn’t know the true status of the null hypothesis, so BOTH errors have to be addressed.  Among other things, this is precisely what a sample size calculation does.  It incorporates estimates of both errors into the computation of sample size.
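In practice, this is exactly what power software asks for. Here is a minimal sketch in Python using the statsmodels library (my choice of tool, not the article's); notice that both errors appear explicitly as inputs:

```python
# A sketch of a sample size calculation for comparing two group means.
# The effect size of 0.5 SD is an assumed, illustrative value.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,          # assumed difference, in standard deviations
    alpha=0.05,               # Type 1 error
    power=0.80,               # 1 - beta, i.e., a Type 2 error of 0.20
    alternative="two-sided",
)
print(f"Patients needed per group: {n_per_group:.0f}")  # ~64
```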

Once the study is conducted, only one state for the null hypothesis will exist in the sample:
A) If the results reject the null hypothesis with p < 0.05, then the investigator either has found a true effect or has made a Type 1 error.
B) If the results fail to reject the null hypothesis, then the investigator either has demonstrated that there is truly no effect or has made a Type 2 error.

Notice that because the results are based on a sample and not the population, there will always be some error in the interpretation, regardless of what the p-value is.  Notice also that nothing in this reasoning applies only to RCTs.  The errors and uncertainty facing the investigator are present regardless of study type.

The second point about statistical power discussed by the authors is the effect size.  I think this is a fundamental issue because when sample size and statistical power are addressed, the investigator establishes the clinical context within which study results will be interpreted.  How does he do that with sample size?  Consider the factors that go into computing a sample size:  type 1 error, type 2 error, some measure of variability (e.g., variance or standard deviation) and the specified difference in the groups or the treatment/exposure effect. Once these four factors are specified, the sample size can be computed.   
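To see how the four factors fit together, here is the standard closed-form formula for comparing two means, written out as a short Python sketch. The standard deviation of 10 and the clinically important difference of 5 units are assumed values for illustration:

```python
# n per group = 2 * (z_alpha + z_beta)^2 * sd^2 / difference^2
# The SD and the clinically important difference are assumed values.
from scipy.stats import norm

alpha = 0.05      # Type 1 error
beta = 0.20       # Type 2 error (so power = 0.80)
sd = 10.0         # assumed variability of the outcome
difference = 5.0  # smallest difference considered clinically important

z_alpha = norm.ppf(1 - alpha / 2)  # 1.96 for a two-sided test
z_beta = norm.ppf(1 - beta)        # 0.84

n_per_group = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / difference ** 2
print(f"Patients needed per group: {n_per_group:.0f}")  # ~63
```

Because the difference enters the formula squared, halving the clinically important difference quadruples the required sample size – which is exactly why this number deserves clinical, and not merely statistical, consideration.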

It is the last component – the specified difference in the groups or treatment/exposure effect – where the investigator defines what is clinically important.  After all, isn't that the goal of clinical research?  To comment on clinically important effects?  It has been the eternal frustration of statisticians to have an investigator ask, "How many patients do I need to find a statistically significant result?"  Asking the question in this manner indicates that the investigator has not considered his study within a clinical context.  Similarly, a protocol that proposes to review 100 charts without any sample size justification suffers essentially the same limitation – the investigator has not begun to think of his study (or the subsequent results) within a clinical context.

Luckily, at Baystate there are people in Academic Affairs who can help clinicians work through the issues of sample size and statistical power.  A discussion with a statistician is a great way to address sample size.  There are a lot of options and configurations to consider, and clinicians should be aware of how these options may affect their studies.


P.S.  To answer the question, "How many charts should I review to find statistical significance?" – well, if clinical relevance is . . . irrelevant . . . then I think it is safe to say that if one reviews 10,000 charts, something will turn up statistically significant.  Better yet, review 20,000 charts just to be sure.
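For the skeptics, here is a tongue-in-cheek simulation of that P.S. (my own sketch; all numbers are invented). Screen 100 variables that have no real association with the outcome across 10,000 charts, and about 5 of them will turn up "significant" at p < 0.05 anyway:

```python
# Fishing expedition: test many variables with NO true association and
# count how many reach p < 0.05 by chance alone. All numbers are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_charts, n_variables, alpha = 10_000, 100, 0.05

outcome = rng.integers(0, 2, n_charts)      # a coin-flip "outcome"
significant = 0
for _ in range(n_variables):
    x = rng.normal(size=n_charts)           # a pure-noise "predictor"
    p = stats.ttest_ind(x[outcome == 0], x[outcome == 1]).pvalue
    if p < alpha:
        significant += 1

print(f"Spurious 'findings': {significant} of {n_variables}")  # expect ~5
```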

Bottom Line:

Ask a statistician. 
