Wednesday, December 11, 2013

From the Field: Can You Write an Effective Questionnaire? A. Yes, B. Always, C. Read this Post!

Check out this post "From the Field" by Anthony Artino, Jr., PhD, Associate Professor at Uniformed Services University of the Health Sciences in Bethesda, survey design connoisseur, and, most recently, guest blogger! Apply Dr. Artino's points to research, quality improvement, or your everyday opinion survey. Dig in!

You can't fix by analysis what you've spoiled by design. Rickards G, Magee C, Artino AR. JGME. 2012; 4(4): 407-410. Available online from the Baystate Health Sciences Library and at your institution. 

Tracing the steps of survey design: A graduate medical education research example. Magee C, Rickards G, Byars LA, Artino AR. JGME. 2013; 5(1): 1-5. Available online from the Baystate Health Sciences Library and at your institution. 

What do our respondents think we're asking? Using cognitive interviewing to improve medical education surveys.  Willis GB, Artino AR. JGME. 2013; 5(3): 353-356. Available online from the Baystate Health Sciences Library and at your institution. 

If you’re anything like me, you’ve completed more questionnaires than you care to count. Whether it’s an end-of-course evaluation of a workshop you attended or a satisfaction survey from a recent visit to the clinic, questionnaires are ubiquitous in education and health care. There’s one problem though, and it’s a problem you’ve surely encountered – many questionnaires are poorly designed, and oftentimes they fail to capture the very thing they’re attempting to measure. Some common problems with questionnaires include confusing or biased language, baffling visual layout and design, and unclear instructions. Unfortunately, in the age of email, the Internet, and online tools such as SurveyMonkey, the number of survey requests grows exponentially with each passing day.

Despite the plethora of bad questionnaires that exist in education and health care, there is a wealth of evidence-based knowledge regarding the “best practices” in survey design. Much of this knowledge is detailed in the highlighted articles and briefly summarized below as three principles:

1. You can’t fix by analysis what you’ve spoiled by design.  Even though this principle is true for all types of research and evaluation, it is especially true in questionnaire design for one simple reason – when creating a survey, we’re often trying to assess things that are traditionally hard to measure (so-called “fuzzy” or non-observable constructs). These fuzzy constructs include things like student anxiety, resident confidence, and faculty job satisfaction. As such, it’s critically important that survey designers take the time to carefully design and pretest their questionnaires prior to implementation.

One way to pretest a questionnaire is to have a group of experts review the items and then have a group of potential respondents complete the survey while you observe. Having experts review your draft ensures, among other things, that the content of your survey is relevant and clear; whereas having potential respondents complete your survey verifies that the way they interpret your items aligns with what you had in mind when you designed the questionnaire.

2. The questions guide the answers. People often underestimate the degree to which the precise wording of a question plays a critical role in determining the answers provided by respondents. Take, for example, the following two questions about health insurance: “Are you fairly treated by your health insurance company?” versus “Does your health insurance company resort to deception in order to cheat you of covered benefits?” These two questions would likely elicit very different responses, and it probably wouldn’t surprise you to find that an advocate for health insurance reform asked the second question. Clearly, words like “deception” and “cheat” are strong indications that the author of the question doesn’t have a high opinion of the health insurance industry. Thus, as this principle implies, the wording of a question largely determines the answers people provide.

And while this principle is true in everyday life, when it comes to questionnaires, the effect is even more pronounced; most surveys don’t give respondents the chance to provide feedback about misunderstandings and ambiguities. Therefore, when it comes to questionnaire design, small wording changes can often make big differences, which is another reason to pretest your survey before sending it out to 3,000 respondents.

3. Think of it as a conversation. At the end of the day, a questionnaire is really just a conversation between you (the skillful survey designer) and your respondents. As such, you should consider the implicit assumptions that underlie the conduct of conversations in everyday life. These conversational “rules” include the idea that speakers should try to be informative, truthful, relevant, and clear. If you break these rules as a survey designer, you shouldn’t be surprised (or upset) if your respondents, in turn, provide you with poor-quality answers.

An important implication of this principle is that you should ask questions when you want to learn something from your respondents, as opposed to asking them to agree or disagree with a list of statements. Asking people to rate a bunch of statements is not very conversational – when’s the last time you went up to a friend in the hallway and asked her to “rate the following statements on a scale of 1 to 10”? At the end of the day, people are more familiar with, and more adept at, answering questions – not rating statements – so, as an informed survey designer, you should ask well-thought-out questions and pretest those questions on experts and potential respondents prior to implementing your survey.
  
Notwithstanding the temptation to think of survey design as “more art than science,” there’s actually quite a bit of scientific evidence to guide you through the survey design process. Following these evidence-based best practices will not only save you time and effort during data analysis and interpretation, but they will also improve the chances that your survey will actually measure what you intend it to measure.

Bottom Line:

A. Put some effort and thought into the development of your questionnaire and pretest your survey items before implementation. 
B. Use these articles as a way to develop good practice. 
C. All of the above!

Monday, November 4, 2013

Tokenistic or Authentic: What exactly do you mean by "Let's collaborate"?

Patients as educators: Interprofessional learning for patient-centered care.  Towle A & Godolphin W. Med Teach. 2013; 35:219-225. Available online from the Baystate Health Sciences Library, or from PubMed at your institution. 

I write this post from the coffee station at the Association of American Medical Colleges (AAMC) annual meeting. The coffee is being refilled after spending so many days fueling the conversations of leaders, followers, educators, contributors, and stakeholders in American medical education. No doubt this coffee fuels the exchange of many introductions and handshakes, business cards, and emphatic opportunistic collaborations. 

Against that backdrop is the perspective outlined by Towle & Godolphin in this article. Their phrase "tokenistic..." keeps coming to mind. As in, "Professionals have difficulty letting go of their expert role, leading to tokenistic involvement rather than partnership which requires a reduction in the power difference between [insert your profession here] and [their profession here]." 

Consider the dynamics that underscore this sentence: Power! Collaboration! Expertise! Control! Interprofessional partnerships! If these weren't the makings of an article in Medical Teacher, they would certainly be so for a Lifetime Original Movie for clinician educators. 

Interprofessional education and practice is the obvious alignment of these constructs: do we know enough about the roles of our colleagues in order to relinquish power in decision-making appropriately and in order to make collaborative decisions that depend on the expertise of multiple people? 

The not-so-obvious and more common application of these constructs might be in the everyday collaborations; the educational programs (as described in this article) and the manuscript-writing partnerships. And, perhaps in the committee formation and the policy revisions. When do we truly expect and ask for authentic collaboration, and when are we comfortable with tokenistic involvement? For ourselves and for our colleagues? Do we always know which is being given? 

Interprofessional collaboration is the authentic application of involvement from many professions to the care of the patient. But how are we trained to do this in our other professional roles, and how do we encourage and expect it of our colleagues? 

Bottom Line:

Read this article to prompt a discussion of meaningful collaboration, but apply the concept to other professional areas. Perhaps some reflection here might set a higher bar for communication with our patients, our learners, our colleagues, and ourselves.

Wednesday, October 23, 2013

From the Field: Is Your Research Powerful?

We're lucky enough to have this "From the Field" post from Paul Visintainer, PhD, Director of Epidemiology and Biostatistics here in Academic Affairs at Baystate. Grab a pencil, a notebook, and your most mathematically-oriented friend and read this post as your own personal lesson in error, statistical power, and sample size. Because - of course - even educational research requires a power analysis!  

Statistics Review 4: Sample Size Calculations. Whitley E, Ball J. Critical Care. 2002; 6: 335-341. Available online from the Baystate Health Sciences Library, or from PubMed at your institution. 

I am always pleased when I see articles expounding the importance of sample size and statistical power in clinical research – particularly when they appear in clinical journals.  I was happy to see the Whitley and Ball article in Critical Care.  Although the article is several years old, the issues discussed are still – and will continue to be – relevant.  I just want to reinforce and expand on a couple of the concepts.

First, among some disciplines, there exists an idea – a myth, really – that sample size and statistical power are only relevant for randomized clinical trials.  I don’t know where this idea got started or how it continues, but it is completely untrue.  Any time a researcher wishes to describe a comparison with a p-value, he must also address statistical power.  This applies to all study types – randomized controlled trials, cohort studies, case-control studies, retrospective chart reviews (this last is actually not a study type, but rather a method of data collection), etc.  If the study uses a data analytic approach that generates a p-value, then statistical power should be addressed. 

The above issue becomes clear when one considers the reasoning underlying statistical analysis and sample size.  Suppose a researcher posits a question about some clinical effect -- e.g., Does an exposure cause disease?  Does the treatment reduce morbidity?  Does a medication reduce pain?  Since he doesn’t know the correct answer, he designs a study to answer the question.  (If he knew the right answer, he wouldn’t have to conduct the study, right?)  He wants to make a “generalizable” statement (e.g., the effect is generally real for all patients).  However, he isn’t studying all patients – he is only studying a sample of patients.

So, his conclusion will have some error because he is basing his results on only one sample.  In designing his study (before he collects any data), there are two possible errors he could make in his conclusion. 
A) He could conclude that there is an effect in his sample, when one truly doesn’t exist in the population.  This is the α-error (“alpha” or Type 1) and this is what the p-value reflects. 
B) He could conclude from his sample that there is no effect, when one truly exists in the population.  This is the β-error (“beta” or Type 2) and this is what power reflects – actually, 1-β = power.

It is important to note that these errors are “conditional”.  That is, α-error is the probability of rejecting the null hypothesis if the null hypothesis is true. β-error is the probability of NOT finding an effect if the null hypothesis is false.  At the start of the study, the investigator doesn’t know the true status of the null hypothesis, so BOTH errors have to be addressed.  Among other things, this is precisely what a sample size calculation does.  It incorporates estimates of both errors into the computation of sample size.
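
To see what “conditional” means in practice, here is a minimal simulation sketch. It is not taken from the Whitley and Ball article; the group size (50 per group), the true difference (half a standard deviation), and the helper names `rejects_null` and `rejection_rate` are all hypothetical choices made for illustration, and the test assumes a known common standard deviation to keep the arithmetic simple. The idea is to run many imaginary studies twice: once in a world where the null hypothesis is true, and once in a world where it is false.

```python
import math
import random
from statistics import NormalDist, fmean


def rejects_null(group_a, group_b, sigma, alpha=0.05):
    """Two-sided z-test for a difference in means (assumes a known common SD)."""
    standard_error = sigma * math.sqrt(1 / len(group_a) + 1 / len(group_b))
    z = (fmean(group_a) - fmean(group_b)) / standard_error
    critical_value = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical_value


def rejection_rate(true_difference, n_per_group=50, sigma=1.0,
                   alpha=0.05, n_studies=2000, seed=0):
    """Fraction of simulated studies that reject the null hypothesis."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    rejections = 0
    for _ in range(n_studies):
        control = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        treated = [rng.gauss(true_difference, sigma) for _ in range(n_per_group)]
        if rejects_null(treated, control, sigma, alpha):
            rejections += 1
    return rejections / n_studies


# World 1: the null hypothesis is TRUE (no effect), so every rejection is a Type 1 error.
type_1_rate = rejection_rate(true_difference=0.0)

# World 2: the null hypothesis is FALSE (hypothetical effect of 0.5 SD),
# so every failure to reject is a Type 2 error.
power = rejection_rate(true_difference=0.5)
type_2_rate = 1 - power

print(f"Type 1 error rate (null true):  {type_1_rate:.3f}")
print(f"Power (null false):             {power:.3f}")
print(f"Type 2 error rate (null false): {type_2_rate:.3f}")
```

Under these assumptions the first rate should land near the chosen α of 0.05, while the second depends entirely on the sample size and the size of the true effect. The investigator never knows which world he is in, which is why both rates have to be planned for.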

Once the study is conducted, only one state for the null hypothesis will exist in the sample:
A)  If the results reject the null hypothesis with p < 0.05, then the investigator has either found a true effect or made a Type 1 error.
B)  If the results fail to reject the null hypothesis, then the investigator has either demonstrated that there is truly no effect or made a Type 2 error.

Notice that because the results are based on a sample and not the population, there will always be some error in the interpretation, regardless of what the p-value is.  Notice also that nowhere in this discussion does the approach apply only to an RCT type study.  The errors and uncertainty facing the investigator are present regardless of study type.

The second point about statistical power discussed by the authors is the effect size.  I think this is a fundamental issue because when sample size and statistical power are addressed, the investigator establishes the clinical context within which study results will be interpreted.  How does he do that with sample size?  Consider the factors that go into computing a sample size:  type 1 error, type 2 error, some measure of variability (e.g., variance or standard deviation) and the specified difference in the groups or the treatment/exposure effect. Once these four factors are specified, the sample size can be computed.   
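
As a rough illustration of how those four factors combine, here is a minimal sketch using the standard normal-approximation formula for comparing two group means: n per group = 2 × (z for α + z for power)² × σ² / Δ². This is a textbook approximation, not necessarily the exact method in the article, and the function name `sample_size_per_group` and the example numbers (a standard deviation of 10 points and differences of 5 and 2.5 points) are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(alpha, power, sigma, difference):
    """Approximate n per group for a two-sided comparison of two means.

    alpha      -- Type 1 error rate (e.g., 0.05)
    power      -- 1 minus the Type 2 error rate (e.g., 0.80)
    sigma      -- assumed standard deviation of the outcome in each group
    difference -- smallest difference between groups judged clinically important
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sigma / difference) ** 2
    return math.ceil(n)  # round up to whole patients


# Hypothetical outcome with SD = 10 points; a 5-point difference matters clinically.
n_moderate = sample_size_per_group(alpha=0.05, power=0.80, sigma=10.0, difference=5.0)

# Same study, but now a 2.5-point difference is considered clinically important.
n_small = sample_size_per_group(alpha=0.05, power=0.80, sigma=10.0, difference=2.5)

print(f"5-point difference:   {n_moderate} per group")
print(f"2.5-point difference: {n_small} per group")
```

With these made-up inputs the sketch gives 63 patients per group for the 5-point difference and 252 per group for the 2.5-point difference. Halving the difference you care about roughly quadruples the sample size, which is why the clinically important difference has to be decided before anything else.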

It is the last component – the specified difference in the groups or treatment/exposure effect – where the investigator defines what is clinically important.  After all, isn’t that the goal of clinical research?  To comment on clinically important effects?  It has been the eternal frustration of statisticians to have an investigator ask, “How many patients do I need to find a statistically significant result?”  Asking a question in this manner indicates that the investigator has not considered his study within a clinical context.  Similarly, a protocol that proposes a review of 100 charts without any sample size justification essentially suffers the same limitation – the investigator has not begun to think of his study (or the subsequent results) within a clinical context.

Luckily, at Baystate there are people in Academic Affairs who can help clinicians work through the issues of sample size and statistical power.  A discussion with a statistician is a great way to address sample size.  There are a lot of options and configurations to consider, and clinicians should be aware of how these options may affect their studies.


P.S.  (To answer the question, “How many charts should I review to find statistical significance?” Well, if clinical relevance is . . . irrelevant . . . , then I think it is safe to say if one reviews 10,000 charts, something will turn up statistically significant.  Better yet, review 20,000 charts just to be sure.)
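
For the curious, the arithmetic behind that P.S. can be sketched in a few lines. The numbers are hypothetical: a fixed, clinically trivial observed difference of 0.05 standard deviations between two equal-sized groups of charts, tested with a simple two-sided z-test that assumes a known standard deviation. The function name `two_sided_p_value` is made up for this example.

```python
import math
from statistics import NormalDist


def two_sided_p_value(observed_difference, sigma, total_charts):
    """p-value for a difference in means between two equal halves of the charts."""
    n_per_group = total_charts // 2
    standard_error = sigma * math.sqrt(2 / n_per_group)
    z = observed_difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The observed difference never changes; only the number of charts does.
p_values = {
    total: two_sided_p_value(observed_difference=0.05, sigma=1.0, total_charts=total)
    for total in (100, 1_000, 10_000, 20_000)
}

for total, p in p_values.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{total:>6} charts: p = {p:.4f} ({verdict})")
```

Under these assumptions the same tiny difference is nowhere near significant at 100 or 1,000 charts, and crosses p < 0.05 by 10,000. The effect did not become any more important along the way; only the sample grew.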

Bottom Line:

Ask a statistician.