An Introduction: The Story of Bias in the STAR*D Trial and More

Ed Pigott, PhD

The 35-million-dollar Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study is the largest antidepressant effectiveness study ever conducted. STAR*D enrolled 4,041 depressed patients and provided them with exemplary free acute and continuing antidepressant care to maximize their likelihood of achieving and maintaining remission. Patients who failed to get adequate relief from their first antidepressant were provided with up to three additional trials of pharmacologically distinct drug treatments. STAR*D’s step-1 results were published in January 2006 at which time I read about them in the Washington Post. I immediately became obsessed with STAR*D, due to several apparent biases on its authors’ part, and have coauthored/authored three peer-reviewed articles deconstructing it (Boren, Leventhal, & Pigott, 2009; Pigott, Leventhal, Alter, & Boren, 2010; Pigott, 2011). I look forward to starting this blog and sharing what I’ve learned through this 5+ year obsession.

When STAR*D’s step-1 results were first published, I was conducting an outcome study of telephonic peer-coaching for Midwest Centers for Stress & Anxiety. The study had enrolled almost 200 paying clients who’d failed to meaningfully benefit from Midwest’s self-help program. On average, Midwest’s peer-coaching customers had sought treatment from five or more healthcare professionals prior to purchasing the service; over 50% reported suffering from anxiety and depression for more than 20 years, 73% reported prior treatment with antidepressants, and 50% reported prior treatment with tranquilizers.

What first excited me about STAR*D is that it had enrolled patients who seemed similar to those in Midwest’s study. Better still, STAR*D was using one of the same outcome measures that I was using, the Short-Form Health Survey’s Mental Component Scale (MCS). The MCS measures the degree to which someone’s mental health adversely impacts their ability to perform activities of daily living. On this measure, both groups were virtually identical, with an average baseline score approximately 2.5 standard deviations (SD) below the norm. Such low scores are indicative of people whose mental health problems are profoundly impacting their ability to perform the normal activities of daily living. I was hooked, since I now had an NIMH-funded open-label study against which to compare the MCS results I was seeing in Midwest’s open-label study. Neither study had a control group. If that was good enough for NIMH and the cast of America’s leading depression researchers it had assembled to oversee STAR*D, perhaps I didn’t need to feel quite so defensive about my own efforts in guiding Midwest’s study.

Several things emerged as I read and reread STAR*D’s step-1 article, convincing me that significant researcher trickery was afoot…and it pissed me off.

• First, patients were assessed every two weeks during up to 14 weeks of acute-care treatment, and those who scored as achieving remission were taken out of the subject pool and moved to follow-up. This happened for some patients after less than 4 weeks of treatment in a 14-week trial. I termed it the ‘tag, you’re healed’ research design since, once ‘tagged,’ patients were counted as remitted without the possibility of unremitting during the remaining weeks of acute-care treatment. Everyone knows that depression ebbs and flows. This was particularly the case in STAR*D, since 75.7% of its patients were diagnosed as having ‘reoccurring depression.’ Simply put, in calculating its acute-care remission rates in steps 1-4, STAR*D’s research design took full advantage of depression’s ebbs and eliminated the down-side risk of its flows. I’d never before seen such an obviously biased research design, one whose very purpose seemed to be to inflate the reported remission rates…and this was an NIMH-funded study?

• Second, STAR*D failed to disclose that all step-1 patients were started on Celexa at their first visit. It was only by analyzing the mean and SD data for time-in-treatment as reported in table 2 (Trivedi et al., 2006, p.32) that I determined this; a fact that I subsequently confirmed in an email exchange with Stephen Wisniewski, STAR*D’s chief biostatistician (see STAR*D documents, Second Email). When calculating their remission and response rates, though, STAR*D excluded those patients who, after starting on Celexa, dropped out without a follow-up visit. They did this despite stating in the step-1 article, “Intolerance was defined a priori as leaving treatment before 4 weeks” (p.32) and then later, “our primary analyses classified patients with missing exit Ham-D scores as non-remitters a priori” (p.34). Based on STAR*D’s stated ‘a priori’ analytic procedures, these early dropouts were treatment failures who discontinued treatment before 4 weeks without taking the exit Hamilton Rating Scale for Depression (Ham-D/Hamilton). Instead, though, in STAR*D’s patient flowchart (figure 1, p.30), they classified these early dropouts as not being ‘eligible for analysis,’ in violation of their ‘a priori’ analytic plan. This discovery left me wondering about the apparent deception on STAR*D’s authors’ part and their willingness to fudge their stated ‘a priori’ analytic plan in ways that clearly inflated the outcomes in this NIMH-funded study.

• Third, STAR*D’s use of the QIDS-SR as the secondary measure to report Celexa’s remission rate, and the sole measure to report its response rate, struck me as flagrantly biased since the QIDS-SR was used to guide treatment in every clinic visit. Subsequently, I learned that the QIDS-SR, along with the other non-blinded clinic-visit assessments, was explicitly excluded from use as a research measure in the NIMH-approved Research Protocol, which states, “The latter are designed to collect information that guides clinicians in the implementation of the treatment protocol. Research outcomes assessments are not collected at the clinic visits. They are not collected by either clinicians or CRCs” (see STAR*D documents, STAR*D Research Protocol, pgs.47–48). STAR*D’s use of the QIDS-SR to guide care is made clear in the step-1 article, where they state, “To enhance the quality and consistency of care, physicians used the clinical decision support system that relied on the measurement of symptoms (QIDS-C and QIDS-SR), side-effects (ratings of frequency, intensity, and burden), medication adherence (self-report), and clinical judgment based on patient progress” (p.30).

In an open-label study, using a non-blinded measure both to guide care in every visit and to evaluate the effects thereof was simply unprecedented to me, given the well-known demand bias that results. Later I learned that in STAR*D this demand bias was particularly in play, since the non-blinded clinical research coordinator reviewed the patients’ QIDS-SR responses in every clinic visit and then saw the patient to administer the QIDS-C, which had the identical 16 questions and response options as the QIDS-SR (see STAR*D documents, STAR*D Clinical Procedures Manual, p.75). This same coordinator also administered STAR*D’s multistep educational program for patients and families in every visit. This ‘educational’ program was based on the neurochemical-imbalance theory of depression; it included “a glossy visual representation of the brain and neurotransmitters,” consistently emphasized that “depression is a disease, like diabetes or high blood pressure, and has not been caused by something the patient has or has not done. (Depression is an illness, not a personal weakness or character flaw.) The educator should emphasize that depression can be treated as effectively as other illnesses,” and called for “explaining the basic principles of mechanism of action” for the patient’s current antidepressant drug (see STAR*D documents, STAR*D Patient Education Manual, pgs.4–7).

Given the above, how absurd to think that the QIDS-SR was an unbiased measure of patients’ responses to treatment. It was more analogous to an interrogation tool repeated twice in every visit; eventually some patients would certainly tell the interrogator what they knew he/she wanted to hear! For many of those not responding, it is easy to see how STAR*D’s combining of interrogation with education/indoctrination made each appointment seem like a visit to a Vietnamese reeducation camp. Since this was not court-ordered treatment, little wonder that so many patients voted with their feet by discontinuing STAR*D’s ‘exemplary’ free care (see Pigott et al., 2010, p.274, documenting STAR*D’s step-by-step increasing dropout rate, from 26.6% in step-1 to 60.1% in step-4). STAR*D’s use of the QIDS-SR struck me as far more like a rescue strategy designed to inflate STAR*D’s remission and response rates from some of the 24% of patients who dropped out without taking the exit Hamilton than an honest assessment of outcomes in this study.

Finally, although STAR*D’s step-1 article reported gathering a host of outcome measures across different domains, to my knowledge only these measures’ baseline means and SDs have ever been published. Why bother collecting so many outcome measures only to never report the actual outcomes they measured? Something was clearly wrong with this study. STAR*D’s failure to report the MCS scale’s treatment outcomes was particularly upsetting, since this is what first interested me in the study. By this point, though, I was obsessed with discovering what actually took place in STAR*D and trying to correct the biased reporting of its results.

In starting this blog, I went back to the beginning over five years ago (ugh). Thankfully, Lucinda and David Bassett, Midwest’s founders, strongly supported my early efforts and psychologists John Boren, Allen Leventhal, and Greg Alter joined me in this pursuit.

Before STAR*D, I wasn’t prone to believing in conspiracy theories, but as we peeled back its layers I found myself spinning theories about how this study became so profoundly misrepresented given all of the actors involved (20+ top-tier researchers/NIMH), any one of whom could have objected and ‘blown the whistle’; but none did. Early on, my focus was on the researchers, naively believing that they’d somehow pulled one over on NIMH. This naïveté would have been shattered far sooner if only I’d read the STAR*D FAQs on the background of the study that NIMH posted just weeks prior to the release of the step-1 results (NIMH, January 2006). Here are some of the highlights:

• FAQ #2 states, “Depression is considered ‘treatment-resistant’ when at least one adequately delivered treatment does not lead a person to reach a ‘remission’ of their depressive symptoms, that is, to become symptom-free.” Six times these background FAQs supplant “remission” with “to become symptom-free” when talking about STAR*D’s criterion for successful treatment. How stunning. Before the release of any data, NIMH had evidently run its own focus groups and determined that “become symptom-free” would be far more motivating to depression sufferers, their families, and healthcare professionals regarding the benefits of STAR*D’s antidepressant care than the term “remission.” Unfortunately, in STAR*D, the criterion defining remission was scoring seven or less on the Hamilton. This is hardly being “symptom-free,” since patients could have up to seven ‘mild’ symptoms and still meet this criterion. On the Hamilton, the following items are each scored as only 1: “feels like life is not worth living,” “feels he/she has let people down,” “feels incapable, listless, less efficient,” and “has decreased sexual drive and satisfaction.” So a patient having just these four symptoms would be counted as remitted, with three ‘mild’ symptoms to spare. How is this synonymous with being “symptom-free”?

NIMH subsequently repeated this marketing-driven “symptom-free” false claim 36 times in its various press releases of STAR*D’s results. These included quotes from NIMH’s Director Thomas Insel and Madhukar Trivedi, a STAR*D lead investigator, in the press release following publication in the New England Journal of Medicine of the step-2 medication switch and augmentation studies (NIMH, March 2006). From Insel we get, “If the first treatment attempt fails, patients should not give up. By remaining in treatment, and working closely with clinicians to tailor the most appropriate next steps, many patients may find the best single or combination [drug] treatment that will enable them to become symptom-free” (paragraph 5); and from Trivedi, “Augmenting the first medication may be an effective way for people with depression to become symptom-free” (paragraph 11). The FDA would never allow a pharmaceutical company to make such blatantly false claims, yet here they are coming from the mouths of Trivedi and Insel, no less. In NIMH’s background press release and web posting, Insel and company were already laying the foundation to repeatedly make claims that they knew were false…and this from an allegedly unbiased, taxpayer-funded research institute?
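The gap between “remission” and “symptom-free” is simple arithmetic. A minimal sketch, using the four Hamilton items quoted above (the one-point scores are as described; this is an illustrative toy, not a scoring implementation of the full 17-item Ham-D):

```python
# Four Hamilton items, each scored 1 ('mild'), for a hypothetical patient
mild_symptoms = {
    "feels like life is not worth living": 1,
    "feels he/she has let people down": 1,
    "feels incapable, listless, less efficient": 1,
    "has decreased sexual drive and satisfaction": 1,
}

REMISSION_CUTOFF = 7  # STAR*D counted a Ham-D total of 7 or less as "remission"

total = sum(mild_symptoms.values())
remitted = total <= REMISSION_CUTOFF

print(f"Ham-D total: {total}")                       # 4
print(f"counted as remitted: {remitted}")            # True
print(f"mild symptoms to spare: {REMISSION_CUTOFF - total}")  # 3
```

A patient carrying all four of these symptoms, plus three more one-point symptoms, would still clear the cutoff; calling that patient “symptom-free” is what the passage above objects to.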

• FAQ #5 states, “At each participant visit, STAR*D investigators measured symptoms and side effects to determine when and how much to increase medication doses or change to other treatments. Consistently providing this quality treatment ensured that participants had the best possible chance of benefiting from the treatments”…and then, “To ensure that there would be no bias in assessing how well each treatment worked, the information that was used for measuring the outcome results of the study was collected both by an expert clinician over the phone who had no knowledge of what treatment the participants were receiving and by a novel computer-based interactive voice response system.” I would have saved myself and my coauthors tremendous effort, time, and anguish if only I’d read this five years ago. This FAQ makes clear that the QIDS-SR was not a research measure, since it was one of several measures used to guide care, and that only the blindly administered measures were to be used to report outcomes “to ensure that there would be no bias in assessing how well each treatment worked.” Evidently, such ‘a priori’ commitments were damned, though, when the results did not match NIMH’s ‘pre-specified’ communication and marketing plans. Results are easily malleable when those overseeing a study are untethered from their pre-specified outcome measures and analytic plan. Such was the case in STAR*D.

• FAQ #10 states, “The NIMH will disseminate the results of STAR*D through a defined outreach to practitioners, the media, and the public. As results become available from further detailed analyses of Levels 1 and 2, and from Levels 3 and 4, they too will be disseminated using similar strategies.” NIMH had a clear plan to market STAR*D’s results. To ensure a consistent NIMH-approved message, this plan included a STAR*D contract provision that none of the study findings could be “released, presented at meetings, or published” by the investigators without the prior “review and approval” of STAR*D’s Editorial/Communication Committee and the Government Program Officer overseeing the study (see STAR*D documents, STAR*D Contract, p.21). The bottom line is that NIMH was as fully complicit in the biased reporting of STAR*D’s results as the most egregious examples of biased reporting in pharmaceutical-industry-sponsored research. The only difference here is that STAR*D was supposedly unsoiled by the profit motive of industry-sponsored research. Unfortunately, NIMH was not unsoiled by its own deeply held biases.

 

STAR*D Bias & More

This blog will explore a variety of aspects of STAR*D that have not been covered in the peer-reviewed literature. Besides countering the false portrayal of STAR*D’s results and their implications, I hope that this blog will help coalesce pressure on NIMH, STAR*D’s investigators, and the journals that have published STAR*D’s 70+ articles to take corrective action and set the record straight.

While STAR*D’s summary results were published in November 2006, STAR*D investigators are still publishing articles using the sham QIDS-SR as the sole outcome measure. The most recent example, titled “Residual Symptoms in Depressed Outpatients Who Respond by 50% But Do Not Remit to Antidepressant Medication,” is slated for publication in April 2011 in the Journal of Clinical Psychopharmacology.

While I’ve only read this article’s abstract, I doubt that the authors disclosed that the QIDS-SR was NOT a research measure in STAR*D and was subject to significant demand biases, rendering invalid the authors’ report of the QIDS-SR findings. The authors (and NIMH?) could have chosen to use the blindly administered Hamilton to report the residual symptoms of patients who responded by 50% or more to treatment on Celexa but did not remit (i.e., “become symptom-free”). Such honesty, though, would have required giving up the ruse that 48.6% of STAR*D’s step-1 patients were treatment responders to Celexa. My estimate of Celexa’s actual response rate based on the pre-specified Hamilton is 30 to 33%. Celexa’s low treatment response and remission rates, though, are a secret that neither NIMH nor STAR*D’s investigators want professionals and the public to know. Why the secrecy and the willingness to keep publishing results that they know to be false? That’s what NIMH and STAR*D’s investigators should be forced to explain.

This blog will also explore more than just STAR*D. Through my consulting, I often get immersed in different areas of clinical research that I think are worth commenting on without the delays of peer review. For instance, I recently completed a project that required me to review the 8-year-long NIMH Collaborative Multisite Multimodal Treatment Study of Children with Attention-Deficit/Hyperactivity Disorder. This review revealed some apparent biases on its authors’ part that I don’t believe have been reported. When blogging in such areas, I will be mindful to disclose my own biases and conflicts.

 

A Final Note

I consider myself neither ‘anti-psychiatry’ nor ‘anti-medication’ but rather ‘anti-corrupt science,’ particularly when it is taxpayer funded. STAR*D’s results have not only been falsely portrayed in ways that harm those who suffer with depression; they also harm practicing psychiatrists, a topic that I will soon blog on. Over the years, I have at times seen significant benefit from the time-limited and, when necessary, intermittent use of psychotropic medications, and I have had the pleasure of knowing some incredibly competent and compassionate psychiatrists. I’ve also seen great harm, though, from these medications’ injudicious use, which is becoming far more common with the dramatic increases in the prescribing of these drugs (particularly by non-psychiatric physicians and to children), the rise of polypharmacy, and the emergence of the APA’s continuation-phase treatment guidelines calling for essentially their open-ended use.

It is past time for this madness to stop, and a critical step in this process is honesty in the conducting, reporting, and appraising of research. In this regard, Insel (2009) made a significant first step when he acknowledged that: 1) in multiple comparative-effectiveness studies of second-generation drugs for depression, schizophrenia, and bipolar disorder, these drugs have repeatedly been found to be no better than their first-generation cousins from the 1950s and ’60s despite their added costs (and, I would add, no better despite the billions upon billions spent over decades in public and private research efforts to improve said drugs); and 2) in STAR*D, despite “14 weeks of optimal treatment,” Celexa’s success rate was no different from that commonly found with placebo in controlled trials. Insel then goes on to observe that “The unfortunate reality is that current medications help too few people to get better and very few people to get well” (Insel, 2009, pgs.703–704). What a sad commentary on the 50+ years of psychotropic drug research. Lots of pharma-, APA-, and NIMH-fueled hype over these past 50+ years but essentially no progress. How pathetic.

While Insel’s belated acknowledgement is admirable, it was done in the service of NIMH’s new initiatives, one of which is for it to become more intimately involved in new drug development (New York Times, 2011). From my perspective, these new NIMH initiatives are profoundly misguided on multiple fronts, not the least of which is NIMH’s aiding and abetting the false portrayal of STAR*D’s findings in ways that are every bit as egregious as the very worst documented in pharmaceutical-industry-sponsored research. The idiocy of NIMH’s new initiatives, and the squandering of limited taxpayer research dollars to pursue them, will be covered in a future blog.

So what about Midwest? I stopped consulting for them two years ago after completing the peer-coaching study but never published the results…so here goes. The 15-week program had a 95+% completion rate, a key reason being that it was sold on a “money-back guaranteed success” basis and, to qualify for their money back, customers had to complete all 15 peer-coaching sessions. The average coaching customer improved by approximately one SD on the MCS scale during their first four weeks in the program, with additional improvements of .43 SDs by week 8, .32 SDs by week 12, and .51 SDs by week 15. Overall, Midwest’s average customer improved by over two SDs, with a final MCS score of 49.41, placing them firmly within the normal range of functioning on the MCS since this scale’s norm is 50 with a standard deviation of 10. The treatment effect size for the coaching program was 1.92.

Not too shabby. While selection bias was clearly a major factor in the peer-coaching program’s high level of success, selection bias was also in play in STAR*D, since all patients had to agree to be started on Celexa in step-1 and therefore were favorably inclined at the start of treatment to believe in a chemical cure for their depression…and over five years and 70+ peer-reviewed articles later, we’re still waiting and waiting for STAR*D to report any of these 4,041 patients’ outcomes as pre-specified. Profoundly biased? You betcha!!!
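The Midwest figures above can be checked with back-of-the-envelope arithmetic. A minimal sketch, using the approximate 2.5-SD baseline given earlier in this post (which is why the implied final score lands a bit below the reported 49.41):

```python
# Stepwise MCS gains (in SD units) reported for Midwest's coaching customers:
# weeks 1-4, then by week 8, by week 12, and by week 15
gains_sd = [1.00, 0.43, 0.32, 0.51]

MCS_NORM, MCS_SD = 50.0, 10.0          # MCS population norm and SD
BASELINE_SD_BELOW_NORM = 2.5           # approximate baseline reported above

total_gain_sd = sum(gains_sd)          # cumulative improvement in SD units
implied_final = MCS_NORM - (BASELINE_SD_BELOW_NORM - total_gain_sd) * MCS_SD

print(f"total gain: {total_gain_sd:.2f} SD")      # 2.26 SD, i.e. "over two SDs"
print(f"implied final MCS: {implied_final:.1f}")  # ~47.6, near the reported 49.41
```

The small gap between the implied ~47.6 and the reported 49.41 simply reflects that the 2.5-SD baseline figure was an approximation; both land within half an SD of the population norm of 50.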

 

References:

Boren, J., Leventhal, A., & Pigott, H.E. (2009). Just how effective are antidepressant medications? Results of a major new study. Journal of Contemporary Psychotherapy, 39 (2), 93-100.

Harris, G. (2011, January 23). Federal research center will help develop medicines. The New York Times.

Insel, T. R. (2009). Disruptive insights in psychiatry: Transforming a clinical discipline. Journal of Clinical Investigation, 119(4), 700–705. This article is available by emailing me @: [email protected]

National Institute of Mental Health. (January 2006). Questions and answers about the NIMH Sequenced Treatment Alternatives to Relieve Depression (STAR*D) study—Background.

National Institute of Mental Health. (March 2006). New strategies help depressed patients become symptom-free.

Pigott, H. E. (2011). STAR*D: A tale and trail of bias. Ethical Human Psychology and Psychiatry, 13(1), 6-28. This article is available by emailing me @: [email protected]

Pigott, H. E., Leventhal, A. M., Alter, G. S., & Boren, J. J. (2010). Efficacy and effectiveness of antidepressants: Current status of research. Psychotherapy and Psychosomatics, 79(5), 267–279.

Trivedi, M. H., Rush, A. J., Wisniewski, S. R., Nierenberg, A. A., Warden, D., Ritz, L., et al. (2006). Evaluation of outcomes with citalopram for depression using measurement-based care in STAR*D: Implications for clinical practice. The American Journal of Psychiatry, 163(1), 28–40.
