Study 329: MK, HK, SK and GSK

The letter below from Marty Keller and colleagues was sent to many media outlets, to Retraction Watch, and to professional organizations on Wednesday. Paul Basken from the Chronicle of Higher Education asked me for a response, which I sent about an hour after receiving the letter. This response is from me rather than the 329 group. This and other correspondence features, and will feature, on Study329.org.

One quick piece of housekeeping. Restoring Study 329 is not about giving paroxetine to adolescents: it’s about all drugs, for all indications, across medicine, and for all ages. It deals with the standard industry MO of hyping benefits and hiding harms. One of the best bits of coverage of this aspect of the story yesterday was in Cosmopolitan.

The Letter from Keller

Dear _______

Nine of us whose names are attached to this email (we did not have time to create electronic signatures) were authors on the study originally published in 2001 in the Journal of the American Academy of Child and Adolescent Psychiatry entitled “Efficacy of paroxetine in the treatment of adolescent major depression: a randomized controlled trial,” and have read the reanalysis of our article, which is entitled “Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence”, currently embargoed for publication in the British Medical Journal (BMJ) early this week. We are providing you with a brief summary response to several of the points in that article with which we have strong disagreement. Given the length and detail of the BMJ publication and the multitude of specific concerns we have with its approach and conclusions, we will be writing and submitting to the BMJ’s editor an in-depth letter rebutting the claims and accusations made in the article. It will take a significant amount of work to make this scholarly and thorough, and we do not have a timetable; such an analysis could not be completed in the time frame available for giving you this more comprehensive response today.

The study was planned and designed between 1991 and 1992. Subject enrollment began in 1994 and was completed in 1997, at which time analysis of the data commenced. The study authors comprised virtually all of the academic researchers studying the treatment of child depression in North America at the time. The study was designed by academic psychiatrists and adopted with very little change by GSK, which funded the study in an academic/industry partnership. The two statisticians who helped design the study are among the most esteemed in psychiatry. The goal of the study designers was to do the best study possible to advance the treatment of depression in youth, not primarily to serve as a drug registration trial. Some design issues would be handled differently today; best-practice methodologies have changed over the 24 years since the inception of our study.

In the interval from when we sat down to plan the study to when we approached the data analysis phase, but prior to the blind being broken, the academic authors, not the sponsor, added several additional measures of depression as secondary outcomes. We did so because the field of pediatric-age depression had reached a consensus that the Hamilton Depression Rating Scale (our primary outcome measure) had significant limitations in assessing mood disturbance in younger patients. Accordingly, taking this into consideration, and in advance of breaking the blind, we added secondary outcome measures agreed upon by all authors of the paper. We found statistically significant indications of efficacy in these measures. This was clearly reported in our article, as were the negative findings.

In the “BMJ-Restoring Study 329 …” reanalysis, the following statement is used to justify non-examination of a range of secondary outcome measures:

Both before and after breaking the blind, however, the sponsors made changes to the secondary outcomes as previously detailed. We could not find any document that provided any scientific rationale for these post hoc changes and the outcomes are therefore not reported in this paper.

This is not correct. The secondary outcomes were decided by the authors prior to the blind being broken. We believe now, as we did then, that the inclusion of these measures in the study and in our analysis was entirely appropriate and was clearly and fully reported in our paper. While secondary outcome measures may be irrelevant for purposes of governmental approval of a pharmaceutical indication, they were, and to this day are, frequently and appropriately included in study reports even in those cases when the primary measures do not reach statistical significance. The authors of “Restoring Study 329” state “there were no discrepancies between any of our analyses and those contained in the CSR [clinical study report]”. In other words, the disagreement on treatment outcomes rests entirely on the arbitrary dismissal of our secondary outcome measures.

We also have areas of significant disagreement with the “Restoring Study 329” analysis of side effects (which the authors label “harms”). Their reanalysis uses the FDA MedDRA approach to side effect data, which was not available when our study was done. We agree that this instrument is a meaningful advance over the approach we used at the time, which was based on the FDA’s then-current COSTART approach. That one can do better reanalyzing adverse event data using refinements in approach that have accrued in the 15 years since a study’s publication is unsurprising, and not a valid critique of our study as performed and presented.

A second area of disagreement (concerning the side effect data) is with their statement, “We have not undertaken statistical tests for harms.” With this decision, the authors of “Restoring Study 329” are saying that we need very high and rigorous statistical standards for declaring a treatment to be beneficial, but that for declaring a treatment to be harmful statistics cannot help us, and whatever an individual reader thinks looks like a harm in a raw tabulation is a harm. Statistics of course offers several approaches to the question of when there is a meaningful difference in the side effect rates between different groups. There are pros and cons to the use of P values, but alternatives such as confidence intervals are available.

“Restoring Study 329” asserts that this paper was ghostwritten, citing an early publication by one of the coauthors of that article. There was absolutely nothing about the process involved in the drafting, revision, or completion of our paper that constitutes “ghostwriting”. This study was initiated by academic investigators, undertaken as an academic/industry partnership, and the resulting report was authored mainly by the academic investigators with industry collaboration.

Finally, the “Restoring Study 329” authors discuss an initiative to correct publications called “restoring invisible and abandoned trials” (RIAT) (BMJ 2013;346:f4223). “Restoring Study 329” states “We reanalyzed the data from Study 329 according to the RIAT recommendations” but gives no reference for a specific methodology for RIAT reanalysis. The RIAT approach may have general “recommendations”, but we find no evidence that there is a consensus on precisely how such a RIAT analysis makes the myriad decisions inherent in any reanalysis, nor do we think there is any consensus in the field that would allow the authors of this reanalysis, or any other potential reanalysis, to definitively say they got it right.

In summary, to describe our trial as “misreported” is pejorative and wrong, both from consideration of best research practices at the time and in terms of a retrospective from the standpoint of current best practices.

Martin B. Keller, M.D.
Boris Birmaher, M.D.
Gregory N. Clarke, Ph.D.
Graham J. Emslie, M.D.
Harold Koplewicz, M.D.
Stan Kutcher, M.D.
Neal Ryan, M.D.
William H. Sack, M.D.
Michael Strober, Ph.D.

[Figure: Boxed harms]

Response

In the case of a study designed to advance the treatment of depression in adolescents, it seems strange to have picked imipramine at 200-300 mg per day as a comparator, unusual to have left the continuation phase unpublished, odd to have neglected to analyse the taper phase, dangerous to have downplayed the data on suicide risks and the profile of psychiatric adverse events more generally, and unfortunate to have failed to update the record in response to attempts to offer a more representative version of the study to those who write guidelines or otherwise shape treatment.

As regards the efficacy elements, the correspondence we had with GSK, which will be available on Study329.org as of Sept 16 and on the BMJ website, indicates clearly that we made many efforts to establish the basis for introducing secondary endpoints not present in the protocol. GSK have been unwilling or unable to provide evidence on this issue, even though the protocol states that no changes will be permitted that are not discussed with SmithKline. We would be more than willing to post any material that Dr Keller and colleagues can provide.

Whatever about such material, it is of note that when submitting Study 329 to the FDA in 2002, GSK described it as a negative study, and the FDA concurred that it was negative. This is of interest in the light of Dr Keller’s hint that it was GSK’s interest in submitting this study to regulators that led to a corruption of the process.

Several issues arise as regards harms. First, we would love to see the ADECS coding dictionary, if any of the original investigators have one. Does anyone know whether ADECS requires suicidal events to be coded as emotional lability, or was there another option?

Second, can the investigators explain why headaches were moved from classification under Body as a Whole in the Clinical Study Report to sit alongside emotional lability under a Nervous System heading in the 2001 paper?
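
The coding question can be made concrete. What follows is a minimal sketch, for illustration only: the term mappings and the verbatim reports are invented stand-ins for dictionaries like COSTART, ADECS, or MedDRA rather than reproductions of them, and none of this is Study 329 data. It simply shows how the same raw reports yield different adverse event tables depending on the coding scheme applied.

```python
# Illustration only: these mappings are invented stand-ins, NOT the real
# COSTART, ADECS, or MedDRA dictionaries, and the events are not Study 329 data.
from collections import Counter

# Hypothetical verbatim adverse event reports from one trial arm.
verbatim_events = [
    "headache", "headache", "suicidal ideation",
    "tearfulness", "headache", "suicidal ideation",
]

# Two hypothetical coding schemes assigning the same verbatim terms
# to different (body system, preferred term) pairs.
scheme_a = {  # COSTART-like
    "headache": ("Body as a Whole", "headache"),
    "suicidal ideation": ("Nervous System", "emotional lability"),
    "tearfulness": ("Nervous System", "emotional lability"),
}
scheme_b = {  # MedDRA-like
    "headache": ("Nervous System", "headache"),
    "suicidal ideation": ("Psychiatric", "suicidal ideation"),
    "tearfulness": ("Psychiatric", "affect lability"),
}

def tabulate(events, scheme):
    """Count coded (body system, preferred term) pairs for a list of reports."""
    return Counter(scheme[event] for event in events)

print("Scheme A:", tabulate(verbatim_events, scheme_a))
print("Scheme B:", tabulate(verbatim_events, scheme_b))
# Under scheme A the two suicidal events disappear into "emotional lability";
# under scheme B they are counted as such. Same data, different table.
```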

It may be something of a purist view, but significance testing was originally linked to primary endpoints. Harms are never the primary endpoint of a trial, and no RCT is designed to detect harms adequately. It is appropriate to hold a company or doctors who may be aiming to make money out of vulnerable people to a high standard when it comes to efficacy. But for those seeking to advance the treatment of patients with any medical condition, it is not appropriate to deny the likely existence of harms on the basis of a failure to reach a significance threshold that the very process of conducting an RCT means cannot be met, as investigators’ attention is systematically diverted elsewhere.
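
As a concrete illustration of the confidence interval alternative mentioned above, here is a minimal sketch, with invented counts rather than Study 329 data, of putting a Wald interval on the difference in adverse event rates between two arms. With rare events the interval is wide, which is precisely why a failed significance test cannot be read as evidence of no harm.

```python
# Illustration only: the counts below are hypothetical, not Study 329 data.
from math import sqrt

def risk_difference_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Wald 95% confidence interval for the difference in event
    proportions between arm A and arm B (A minus B)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, (diff - z * se, diff + z * se)

# Hypothetical arms: 5 events among 93 patients on drug, 1 among 87 on placebo.
diff, (lo, hi) = risk_difference_ci(5, 93, 1, 87)
print(f"risk difference = {diff:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
# Output: risk difference = 0.042, 95% CI (-0.009, 0.093). The interval
# crosses zero even though the raw imbalance is fivefold: absence of
# statistical significance is not evidence of absence of harm.
```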

As regards RIAT methods, a key method is to stick to the protocol. A second safeguard is to audit every step taken, and to this end we have attached a 61-page audit record (Appendix 1) to this paper. An even more important method is to make the data fully available, which it will be on Study329.org.

As regards ghostwriting, I personally am happy to stick with the designation of this study as ghostwritten. For those unversed in these issues, journal editors, medical writing companies, and academic authors cling to a fig leaf: if the medical writer’s name is mentioned somewhere, s/he is not a ghost. But for many, the presence on the authorship line of names that have never had access to the data, and who cannot stand over the claims made other than by assertion, is what’s ghostly.

Having made all these points, there is a point of agreement to note. Dr Keller and colleagues state that:

“nor do we think there is any consensus in the field that would allow the authors of this reanalysis or any other potential reanalysis to definitively say they got it right”.

We agree. For us, this is the main point behind the article. This is why we need access to the data. It is only with collaborative efforts based on full access to the data that we can arrive at a best possible interpretation, and even this will be provisional rather than definitive. Is there anything that would hold the authors of the second interpretation of these data (Keller and colleagues) back from joining with us, the authors of the third interpretation, in asking that the data of all trials, for all treatments, across all indications, be made fully available? Such a call would be consistent with the empirical method, which was as applicable in 1991 as it is now.

David Healy
Holding Response on Behalf of RIAT 329

***


David Healy, MD
David Healy is a founder of Data Based Medicine and RxISK.org and has authored over 240 peer-reviewed articles, 300 other pieces, and 25 books. His main areas of research are the adverse effects of treatment, clinical trials in psychopharmacology, the history of psychopharmacology, and the impact of both trials and psychotropic drugs on our culture.

COMMENTS

  1. Perhaps Dr. Benbow would like a copy of the ‘real’ results of Study 329:

    Dr. Benbow: “We have been asked by the regulatory authorities to provide all our information related to suicides and I can tell you the data that we provide to them clearly shows no link between Seroxat and an increased risk of suicide – no link.”
    A BBC reporter asked Dr. Benbow whether he was "absolutely confident" that Dr. Healy is wrong on this issue and will be shown to be wrong.
    Dr. Benbow replied: “Yes, absolutely. Not only that but Doctor Healy has made the same claims about a range of other medicines. He made the same claims about Prozac… [repeats]…he made the same claims about a range of other SSRIs. On every occasion he has been found to be wrong.”

