Inspired by Erick Turner, I wrote these two rapid reviews based on FDA reports:
I produced these reviews not to claim perfection but to familiarise myself with the type of content and to see what a report might look like. After I'd finished them, I sent the links to Erick, who very generously supplied some really useful feedback, which I reproduce below (with his permission). I consider myself incredibly lucky to have had Erick's considerate teaching on this matter!
Background: whether in the product labeling or in journal articles, the mechanisms always sound very “scientific”, and my hunch is that clinicians are overly impressed by that, to the point that they perceive the drug to be effective before they’ve even been presented with any clinical trial evidence. Additionally, things are oftentimes mentioned that may have no relevance to whether the drug works. Finally, despite the fact that the words may be foreign to the average clinician, the drug in question may be a me-too drug.
The two graphs you have there have no error bars, and there are no P values so one doesn’t know whether the differences are statistically significant.
BTW, even negative studies have somewhat similar-looking graphs. There’s always an impressive reduction from baseline, but for placebo as well as active drug.
What such change-from-baseline graphs don’t show is how far the “average” patient might be from a score that might be meaningful. I have, on occasion, redone graphs so that the y-axis shows the absolute score.
I’m not advocating for this, as this would involve extra work, and my complaint here hasn’t been talked about in the peer-reviewed literature (yet), as far as I know. Just mentioning it as a further limitation of such graphs.
By contrast, the graph that we looked at together as we talked focused on the end-of-study values and showed the 95% confidence intervals around the point estimates. So that allowed you to see whether the results were statistically significant and how they compared for the different groups.
More importantly, they provided a sense of CLINICAL significance. As I’m sure you well know, there can be statistical significance without clinical significance.
This becomes an issue when you consider that the FDA’s threshold for approval (2 or more positive studies) rests primarily on statistical, not clinical, significance; the magnitude of the effect may be quite small.
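The gap between the two kinds of significance can be seen with a quick back-of-the-envelope calculation. The numbers below are invented for illustration: a very large trial can make a clinically trivial drug–placebo difference highly “significant”.

```python
import math

# Hypothetical very large two-arm trial: a 0.5-point drug-placebo difference
# on a rating scale with SD 10, i.e. Cohen's d = 0.05 (clinically trivial)
n_per_arm = 10_000
mean_diff = 0.5
sd = 10.0

se = sd * math.sqrt(2 / n_per_arm)         # standard error of the difference
z = mean_diff / se
p_two_sided = math.erfc(z / math.sqrt(2))  # normal-approximation p-value

d = mean_diff / sd                         # standardized effect size
print(f"d = {d:.2f}, z = {z:.2f}, p = {p_two_sided:.4f}")
# statistically significant (p < .001), yet only 0.05 SD of separation
```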
So I would recommend either:
1. Directly presenting something that shows both types of significance. That could be (a) the raw unstandardized change from baseline, like the graph we looked at together, or (b) a standardized effect size, of the type I showed in my NEJM article and others have used, either Hedges’s g or Cohen’s d.
2. Showing enough statistics in a summary table to allow researchers doing meta-analyses to undertake (1). An example would be Table 3 on page 13 of 99 of the statistical review (the page preceding the graphs you showed). The stats have been streamlined (or stripped down) into Table 9 on page 21. What’s been omitted are the sample sizes and baseline scores, which could frustrate some researchers (and possibly some clinicians, who often like to know whether the study was big or small). Yet another table to consider would be Table 19 on page 54/99 of the stat review. However, it includes a “key secondary efficacy endpoint”, and I would suggest adhering to a primary-only policy.
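To make option 1(b) concrete, here is a minimal sketch of how Cohen’s d and Hedges’s g can be computed from the kind of summary statistics an FDA review table provides (means, SDs, and sample sizes per arm). The numbers below are hypothetical, not taken from any actual review.

```python
import math

def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference: (m1 - m2) / pooled SD."""
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

def hedges_g(m1, s1, n1, m2, s2, n2):
    """Cohen's d with the small-sample bias correction factor J."""
    d = cohens_d(m1, s1, n1, m2, s2, n2)
    j = 1 - 3 / (4 * (n1 + n2 - 2) - 1)  # approximate correction
    return j * d

# Hypothetical change-from-baseline summaries: drug arm vs. placebo arm
d = cohens_d(-12.0, 10.0, 100, -8.0, 10.0, 100)
g = hedges_g(-12.0, 10.0, 100, -8.0, 10.0, 100)
print(d, g)  # drug separates from placebo by roughly 0.4 SD
```

With samples this large, g barely differs from d; the correction matters mainly for small trials.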
Interesting point about the mortality for the drug for DM.
One possibly related comment, probably obvious, is that when we talk about the “primary outcome”, we’re usually focused only on the efficacy side of the risk-benefit ratio. The FDA looks separately at the two issues of efficacy and safety, and a problem with either one of them can derail approval. (In the DM drug case, it appears that the distinction is more blurred, in contrast to the world of psychotropic drugs.)
A lot of drug SRs and MAs focus only on efficacy — that’s all I’ve done in my papers, for example.
But when a clinician is deciding whether to add a drug to his/her arsenal, he/she needs to know about safety, as well.
All the safety data you want should also be within the FDA review; you’re certainly not going to find more safety issues in the published literature. It’s just a matter of whether you want to go after that as well, so you just have to decide on your scope.
When I asked his permission to reproduce his reply, he suggested I include the following table to better illustrate the data and to show that the data can be used for meta-analysis purposes.
3 thoughts on “Example rapid review: Feedback from Erick Turner”
Some more feedback, if I may…
I do think the meta-analysis offers “added value” beyond what one finds in the FDA review and essentially all journal articles reporting on individual clinical trials.
Some comments, largely based on how I have conducted meta-analyses. Please see methods from my meta-analysis of FDA and journal data on antipsychotics at http://journals.plos.org/plosmedicine/article?id=10.1371%2Fjournal.pmed.1001189#s2
1- I would first look for evidence of a dose-response relationship. Such a relationship is very rare among psychotropic drugs, esp. if you look within the dose range that is FDA-approved as effective (see below).
2- To be fair, I would meta-analyze only those doses that are FDA-approved. The sponsor may have tested dosages that ultimately proved ineffective, but they shouldn’t be penalized for exploring for the lowest effective dose. (Having said that, information on all dosages should be disclosed in the journal article and not omitted for fear of “confusing” clinicians.) The approved dose range can be looked up in the product label. There are many sources, but one I use a lot, and that is free, is DailyMed. For this drug, that section of the label can be found at http://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=2d301358-6291-4ec1-bd87-37b4ad9bd850 . There you can see that the “recommended target dose is 2 mg to 4 mg once daily”. Based on this, the meta-analysis would exclude the FDA results on the 0.25 mg and 1 mg/day doses. If one is worried about the bias introduced by excluding those dosages, one could always do a sensitivity analysis the other way, including both approved and unapproved dosages.
3- The scaling of the graph axis is, IMHO, much too broad. It extends from -100 to +100, but the effect sizes are less than 10, so only about 1/20th of the graph is potentially informative.
4- Displayed are unstandardized, as opposed to standardized, effect sizes. The unstandardized approach works when you have just 1 scale (e.g. the PANSS) as the primary. However, in our study of 8 antipsychotics (link above), although the majority of studies employed the PANSS, a number of studies used the BPRS (Brief Psychiatric Rating Scale). With the standardized ES, the difference in the primary outcome between the two groups (drug vs. placebo) is divided by the pooled standard deviation. The result thus expresses the separation between drug and placebo in terms of the number of standard deviations. A significant advantage of using the standardized ES is that it allows you to combine data from trials using different scales, which is kosher as long as they measure the same construct (psychosis in this case). It is not uncommon for the same construct to have two or more well-accepted scales. (With depression, we have the MADRS and the HAMD, the latter having at least 5 variations employing different numbers of items, ranging from 6 to 24.) With an unstandardized ES, when you encounter different studies using different scales, you are obliged to segregate them into separate meta-analyses, like the proverbial apples and oranges. But with the standardized ES, you can combine them, giving you a single “bottom line” answer, which, it might be argued, is a (or the) major reason one undertakes meta-analysis. Another advantage of standardized ESs is that they allow you to compare (but not combine data!) across different types of interventions, allowing you to see which intervention types have consistently strong effects and which have trouble demonstrating superiority to an inactive control.
5- Continuing from #3 above: if you do undertake a standardized meta-analysis, then the graph scale will have to cover an even narrower interval, probably between 0 and 0.8 (1.0 at most). Remember, we’re talking about standard deviations, not numbers on a rating scale.
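A minimal sketch of the mechanics described in #4: each study’s mean difference is divided by its own pooled SD, giving unitless effect sizes that can then be combined across scales (e.g. PANSS and BPRS trials) with inverse-variance weights. The summary statistics below are invented for illustration, and a real analysis might prefer a random-effects model.

```python
import math

def smd_with_variance(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference and its approximate sampling variance."""
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / sp
    var = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    return d, var

# Hypothetical drug-vs-placebo change scores from trials on different scales
studies = [
    smd_with_variance(-22.0, 20.0, 120, -14.0, 20.0, 118),  # "PANSS" trial
    smd_with_variance(-9.0, 8.0, 60, -6.0, 8.0, 62),        # "BPRS" trial
]

# Fixed-effect (inverse-variance) pooling into one bottom-line estimate
weights = [1 / var for _, var in studies]
pooled = sum(w * d for w, (d, _) in zip(weights, studies)) / sum(weights)
se = math.sqrt(1 / sum(weights))
print(f"pooled SMD = {pooled:.2f} "
      f"(95% CI {pooled - 1.96 * se:.2f} to {pooled + 1.96 * se:.2f})")
```

Because both studies are expressed in SD units, the pooled estimate is a single answer despite the different rating scales.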
Hope this feedback helps.