Wednesday, October 23, 2013

From the Field: Is Your Research Powerful?

We're lucky enough to have this "From the Field" post from Paul Visintainer, PhD, Director of Epidemiology and Biostatistics here in Academic Affairs at Baystate. Grab a pencil, a notebook, and your most mathematically-oriented friend and read this post as your own personal lesson in error, statistical power, and sample size. Because - of course - even educational research requires a power analysis!  

Statistics Review 4: Sample Size Calculations. Whitley E, Ball J. Critical Care. 2002; 6: 335-341. Available online from the Baystate Health Sciences Library, or from PubMed at your institution. 

I am always pleased when I see articles expounding the importance of sample size and statistical power in clinical research – particularly when they appear in clinical journals.  I was happy to see the Whitley and Ball article in Critical Care.  Although the article is several years old, the issues discussed are still – and will continue to be – relevant.  I just want to reinforce and expand on a couple of the concepts.

First, among some disciplines, there exists an idea – a myth, really – that sample size and statistical power are only relevant for randomized clinical trials.  I don’t know where this idea got started or how it continues, but it is completely untrue.  Any time a researcher wishes to describe a comparison with a p-value, he must also address statistical power.  This applies to all study types – randomized controlled trials, cohort studies, case-control studies, retrospective chart reviews (this last is actually not a study type, but rather a method of data collection), etc.  If the study uses a data analytic approach that generates a p-value, then statistical power should be addressed. 

The above issue becomes clear when one considers the reasoning underlying statistical analysis and sample size.  Suppose a researcher posits a question about some clinical effect -- e.g., Does an exposure cause disease? Does the treatment reduce morbidity?  Does a medication reduce pain?  Since he doesn't know the correct answer, he designs a study to answer the question.  (If he knew the right answer, he wouldn't have to conduct the study, right?)  He wants to make a "generalizable" statement (e.g., the effect is generally real for all patients).  However, he isn't studying all patients – he is only studying a sample of patients.

So, his conclusion will have some error because he is basing his results on only one sample.  In designing his study (before he collects any data), there are two possible errors he could make in his conclusion. 
A) He could conclude that there is an effect in his sample, when one truly doesn’t exist in the population.  This is the α-error (“alpha” or Type 1) and this is what the p-value reflects. 
B) He could conclude from his sample that there is no effect, when one truly exists in the population.  This is the β-error ("beta" or Type 2) and this is what power reflects – actually, 1-β = power.

It is important to note that these errors are “conditional”.  That is, α-error is the probability of rejecting the null hypothesis if the null hypothesis is true. β-error is the probability of NOT finding an effect if the null hypothesis is false.  At the start of the study, the investigator doesn’t know the true status of the null hypothesis, so BOTH errors have to be addressed.  Among other things, this is precisely what a sample size calculation does.  It incorporates estimates of both errors into the computation of sample size.
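In symbols (my shorthand, not the authors'), the two conditional error rates and power can be written as:

\[
\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}), \qquad
\beta = P(\text{fail to reject } H_0 \mid H_0 \text{ is false}), \qquad
\text{power} = 1 - \beta .
\]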

Once the study is conducted, only one of two outcomes will be observed:
A)  If the results reject the null hypothesis with a p < 0.05, then the investigator either has found a true effect or has made a Type 1 error.
B)  If the results fail to reject the null hypothesis, then the investigator either has demonstrated that there is truly no effect or has made a Type 2 error.

Notice that because the results are based on a sample and not the population, there will always be some error in the interpretation, regardless of what the p-value is.  Notice also that nothing in this reasoning applies only to randomized controlled trials.  The errors and uncertainty facing the investigator are present regardless of study type.

The second point about statistical power discussed by the authors is the effect size.  I think this is a fundamental issue because when sample size and statistical power are addressed, the investigator establishes the clinical context within which study results will be interpreted.  How does he do that with sample size?  Consider the factors that go into computing a sample size:  type 1 error, type 2 error, some measure of variability (e.g., variance or standard deviation) and the specified difference in the groups or the treatment/exposure effect. Once these four factors are specified, the sample size can be computed.   
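As a rough sketch of how those four ingredients combine, here is one way to run the calculation in Python with the statsmodels package; the difference, standard deviation, and power below are hypothetical numbers chosen only for illustration, not values from the article:

```python
# Minimal sample-size sketch for comparing two group means (two-sample t-test).
# The clinical inputs are hypothetical placeholders, not values from the article.
from math import ceil
from statsmodels.stats.power import TTestIndPower

alpha = 0.05       # Type 1 error rate
power = 0.80       # 1 - beta (i.e., a Type 2 error rate of 0.20)
difference = 5.0   # smallest between-group difference judged clinically important
sd = 10.0          # expected standard deviation of the outcome

effect_size = difference / sd  # standardized effect (Cohen's d)

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=alpha, power=power,
    ratio=1.0, alternative="two-sided",
)
print(ceil(n_per_group))  # about 64 patients in each group for these inputs
```

Change the clinically important difference and the required sample size changes with it, which is exactly why that number has to come from clinical judgment rather than from the statistician.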

It is the last component – the specified difference in the groups or treatment/exposure effect – where the investigator defines what is clinically important.  After all, isn't that the goal of clinical research?  To comment on clinically important effects?  It has been the eternal frustration of statisticians to have an investigator ask, "How many patients do I need to find a statistically significant result?"  Asking a question in this manner indicates that the investigator has not considered his study within a clinical context.  Similarly, a protocol that proposes to review 100 charts without any sample size justification suffers the same limitation – the investigator has not begun to think of his study (or the subsequent results) within a clinical context.

Luckily, at Baystate there are people in Academic Affairs who can help clinicians work through the issues of sample size and statistical power.  A discussion with a statistician is a great way to address sample size.  There are a lot of options and configurations to consider, and clinicians should be aware of how these options may affect their study.


P.S.  (To answer the question, "How many charts should I review to find statistical significance?" Well, if clinical relevance is . . . irrelevant . . . then I think it is safe to say that if one reviews 10,000 charts, something will turn up statistically significant.  Better yet, review 20,000 charts just to be sure.)
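(For a rough sense of why, here is a quick back-of-the-envelope check using the same statsmodels tool sketched above; the numbers are mine, not the author's. With 10,000 charts per group, an effect of only about 0.04 standard deviations - almost certainly clinically trivial - can be detected with 80% power.)

```python
# With 10,000 charts per group, how small an effect reaches 80% power?
# An illustration of statistical significance without clinical relevance (my numbers).
from statsmodels.stats.power import TTestIndPower

detectable_d = TTestIndPower().solve_power(
    nobs1=10_000, alpha=0.05, power=0.80, ratio=1.0, alternative="two-sided",
)
print(round(detectable_d, 3))  # roughly 0.04 standard deviations
```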

Bottom Line:

Ask a statistician. 

IPE and VTE: An Educator's Portion of Alphabet Soup

Reduction of venous thromboembolism (VTE) in hospitalized patients: aligning continuing education with interprofessional team-based quality improvement in an academic medical center.  Pingleton SK, Carlton E, Wilkinson S, Beasley J, King T, Wittkopp C, Moncure M, Williamson T. Acad Med. 2013; 88(10):1454-1459. Available online from the Baystate Health Sciences Library, or from PubMed at your institution. 

I am an educator. Aside from observing some of my clinical colleagues, my only real clinical experience involves helping to restrain my young sons for flu shots. However, when my colleague suggested that this article - about an interprofessional effort to decrease incidence of venous thromboembolism (VTE) - is a view of how clinicians might think of interprofessional education, I rolled up my sleeves and prepared to muscle through clinical-ese to find the nuggets of educational insight.

To my delight, the clinical world's fondness for acronyms has once again eased the burden of a taxing vocabulary, as we read in this article about KU's VTE data, the KUH intranet, PICCs, and BPAs. Served up with this alphabet soup is the real gem of this article - the planning matrix, described as a way of mapping the "types of interventions ... on the learners' stages of acceptance..." Take a look at Table 1, and you'll see that this is a very neat and souped-up way of saying they took time to design a curriculum. This thoughtful approach was also evident in how they treated their interprofessional group of learners - by considering the breakdown of responsibilities in decreasing VTE incidence.

The article falls short in one basic way: its only outcome is the decrease in VTE incidence. Clinical education is designed to improve patient outcomes by changing provider behaviors, which in turn change the approach to patient care. As educators, we should be measuring the extent to which our efforts change behavior as well as the change in patient outcomes. Doing so would offer a clearer picture of the link between educational efforts and VTE incidence.

Overall, this article is a good view of the way that educational efforts - particularly interprofessional ones - are being designed to improve patient outcomes. The secret of their success? Having a clearly defined problem and a thoughtful, interprofessional curriculum designed to fix it. Now that's a recipe you'll want to steal. 

Bottom Line:

IPE is used to decrease VTE - view this as a window into the link between educators and patients. For a fun activity, apply the points from Kanter's editorial outlining a better process for writing about innovations to this article to see how the authors successfully put the spotlight on the problem before their innovative solution takes center stage.