When researchers at Pfizer first began a Phase 2 trial of an acute stroke therapy
in 2000, they decided to take a novel approach. The study—called the ASTIN
trial—would determine the drug’s optimal dose not with three or four
different dosing arms, as trials often have, but with 15. Data from the trial would be
captured continuously and used to make changes in real time to how the trial was run. As
new patients joined, they would be randomized to a particular arm based on those
real-time results—a process which required an intensive level of relatively
novel statistics.
Before the drug entered Phase 3, the data showed that it was ineffective, and the
trial ended in 2001. But processing a massive amount of real-time data as the trial went
on, rather than waiting to conduct the analysis after it ended, saved Pfizer about $3.5
million by the company’s estimates. “Pfizer dropped [the therapy]
like a hot potato,” says Donald Berry, a biostatistician at the University of
Texas M.D. Anderson Cancer Center in Houston.
ASTIN’s methodology, termed adaptive trial design, goes against the
sacred rule of the randomized double-blind, placebo-controlled trial: In order to avoid
influencing the trial’s outcome, no one should know who is getting which
treatment.
But in an increasingly common approach, a trial can be altered in various ways
while it’s still in progress, perhaps by tweaking the sample size, dropping
ineffective dosing arms, or even changing the endpoint. Such modifications are based on
a peek at interim data—which of necessity means unblinding the data before the
trial’s completion.
“The reasons to do it are pretty clear,” says Janet Wittes,
president of the clinical trial design firm Statistics Collaborative, Inc., in
Washington, DC. Adaptive trials “hold the potential and the promise for doing
trials faster and getting answers faster.” Plus, assessing assumptions you
made in planning a trial as it proceeds can strengthen a trial’s scientific
merit, says Scott Evans, a biostatistician at Harvard University.
“The randomized trial moved us up to a very high level in terms of science,”
says Berry, a vocal champion of adaptive design. “We’re trying to
preserve that level but move to even higher levels in terms of
efficiency.” Clinical trials, especially late-stage ones,
can cost tens to hundreds of millions of dollars; done right, an adaptive design can
shave 20–50% off that sum.
In the past few years, pharmaceutical companies have adopted the approach in full
force. Many—such as Wyeth, Eli Lilly, and Novartis—have dedicated
divisions for adaptive trial design. For biotechs with few resources, the possibility of
a faster, cheaper clinical trial may be even more of a draw.
But as with all propositions that sound too good to be true, there’s a
downside. Too often, critics caution, companies wrongly assume the approach is an easy
fix to common clinical trial woes when, in fact, the changes can ultimately cost time
and money, not save them. “There’s been a lot of overselling,
overmarketing, if you will,” says Bob O’Neil, who heads adaptive
design efforts for drugs and biologics at the US Food and Drug Administration (FDA).
“This is not a panacea for all situations,” he stresses.
“It is not standard fare.”
There’s also a bigger concern, which has both US and European
regulators struggling to delineate when the approach is appropriate and when it
isn’t. Trials are traditionally blinded in order to prevent
investigators’ or subjects’ knowledge from influencing the outcome.
“If you’re looking at trial data before it’s completed,
there’s always the chance that you’re jeopardizing the
trial’s integrity,” Evans says.
By design
Adaptive design is a slippery term, and experts argue about its definition. Some
adaptations are so straightforward that they raise no concerns whatsoever, Wittes
explains. Stopping a trial early, for example, because the treatment works either much
better or much worse than expected, is nothing new. Starting with a few concurrent
dosing arms and, based on interim results, winnowing them down to the best one or two is
also generally not controversial, she says, though the FDA would need to ensure that
protocols were in place in such cases to prevent anyone but a data monitoring board from
seeing the interim data.
Other adaptations, though, walk a finer line, and whether or not
they’re okayed depends on the reasons for implementing them as much as on the
adaptations themselves. Suppose, for example, that interim results show that the
variance in the data is larger than you’d like. “Variance is a
nuisance parameter,” says Wittes. “I don’t think anyone
has any trouble” with the idea of increasing the trial’s size in
this case. But say the treatment’s effect appears to be smaller than expected,
contradicting the prediction for efficacy that the company filed with the FDA before
commencing the trial. In that case, by increasing the trial size, she believes,
“you’re really changing your hypothesis,” which is based
on a prediction of how well the drug works.
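To see why a larger-than-expected variance calls for more patients, consider the textbook sample-size formula for comparing two means: n per arm = 2 (z_alpha + z_beta)^2 sigma^2 / delta^2. The sketch below is illustrative only. The two-arm design, the normally distributed endpoint, the function name, and every number in it are assumptions, not details from any trial described here.

```python
import math
from statistics import NormalDist


def n_per_arm(sigma, delta, alpha=0.05, power=0.9):
    """Patients per arm for a two-sided, two-sample z-test on means.

    sigma: assumed standard deviation of the endpoint (the nuisance parameter)
    delta: treatment effect the trial is designed to detect (the hypothesis)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical planning assumption: SD of 10, effect of 5 units.
planned = n_per_arm(sigma=10.0, delta=5.0)

# Hypothetical interim finding: the SD looks more like 12; delta is unchanged.
revised = n_per_arm(sigma=12.0, delta=5.0)

print(planned, revised)
```

Under these assumed inputs the requirement rises from 85 to 122 patients per arm while the hypothesized effect stays fixed. Shrinking delta would also raise the number, but that changes the hypothesis itself, which is the distinction Wittes draws.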
Indeed, the FDA itself does not yet have a clear understanding of what this type
of trial entails. “We want to put some definitions in place for what we mean
as an adaptive design,” says O’Neil. To this end, the agency
assembled a working group in 2004, but has yet to release its promised guidance document
for companies to follow. Until then, the agency’s take on a particular
company’s proposal may depend on who sits on its FDA panel. The European
Medicines Evaluation Agency (EMEA) came out with a guidance document in October 2007,
and is somewhat less conservative on some elements of the approach than the FDA. One
example, explains Berry, is the way the two agencies treat seamless Phase 2/Phase 3
trials—while EMEA generally allows data from Phase 2 patients to be included
in the Phase 3 trial, FDA is more cautious.
Even straightforward adaptations don’t come without challenges. In an
adaptive dose-ranging trial such as the one the now-defunct Cervelo Pharmaceuticals ran
to test a drug for neuropathic pain, looking at interim data and modifying the doses was
“quite a logistical nightmare,” says Marc de Somer, the company’s chief medical
officer. Also, interim data must be evaluated by an independent data monitoring board,
made up of people uninvolved in the study or the company. “It’s a
tough pill for companies to swallow, putting decisions about a trial into the hands of
totally independent bodies,” notes Bruce Turnbull, a biostatistician at
Cornell University.
Still, the use of adaptive trials is definitely on the rise, though different
experts offer varying estimates. A survey by Cytel—a biostatistical software
company in Cambridge, Mass., which pioneered adaptive trial design—of trials
conducted between 2003 and last year identified 59 companies that used the approach.
“When we hit 2007, there was [something] like a step change—now
people were really ready to try them,” says Judith Quinlan, a biostatistician
at Cytel. Today, Berry guesses that more than one in 10 trials have some sort of interim
data monitoring, while Stuart Pocock, a medical statistician at University College
London, says the share of truly adaptive trials is significantly less than one in 10.
(The approach is much more common in medical device trials.) Pocock notes that
“nobody has the overall picture,” since trial details are generally
kept confidential until it’s time to file with a regulatory body. According to
O’Neil, about 40 or 50 drug trials using the approach have so far submitted
filings for FDA approval.
Pitfalls and promise
Regardless of the adaptation, unblinding the data for an interim peek invariably
brings up two problems for regulators. The first is statistical. Every
statistical test carries some chance of flagging an effect that isn’t real, so
running a test repeatedly on accumulating data increases the chances that one test along
the way will mistakenly show an effect where there actually is none, and that companies will
submit that false-positive data to the FDA. “If you’re always
tweaking [the trial] to get the best result possible,” says Turnbull,
“then you will get the best result possible.” There are statistical
maneuvers to counteract that possibility, but they are far from straightforward, he
explains.
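The inflation Turnbull describes can be seen in a small simulation. This is a rough sketch under stated assumptions, not a description of how any regulator or company evaluates a trial: outcomes are standard normal noise with no true effect, a z-test is run at each of several equally spaced looks, and the trial is counted as a false positive if any look crosses the threshold. The function name and parameters are hypothetical. The value 2.413 is the published Pocock boundary for five looks at an overall two-sided 5% level, shown as one example of the corrective maneuvers mentioned above.

```python
import math
import random


def false_positive_rate(n_looks, n_per_look=20, n_trials=4000,
                        z_crit=1.96, seed=1):
    """Fraction of no-effect trials declared 'significant' at any look."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        total, n = 0.0, 0
        for _ in range(n_looks):
            # New patients arrive; outcomes are pure noise (true effect is zero).
            total += sum(rng.gauss(0.0, 1.0) for _ in range(n_per_look))
            n += n_per_look
            z = total / math.sqrt(n)  # z-statistic on all data so far
            if abs(z) > z_crit:
                hits += 1             # stop and (wrongly) claim an effect
                break
    return hits / n_trials


single_look = false_positive_rate(n_looks=1)
five_looks = false_positive_rate(n_looks=5)
# Same five looks, but with a stricter per-look threshold (Pocock boundary).
five_looks_pocock = false_positive_rate(n_looks=5, z_crit=2.413)

print(single_look, five_looks, five_looks_pocock)
```

With one look the false-positive rate should sit near the nominal 5%. With five unadjusted looks it is noticeably higher, and the stricter threshold brings it back toward 5% at the price of a harder test at every look.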
The second concern is operational: When an adaptation to a trial takes place,
will investigators and patients put two and two together to deduce clues about the
therapy’s efficacy? “Let’s say we’re going to
increase sample size, or drop certain arms,” Evans says. “Well, that
adaptation could send a message to people involved in the trial that the effect
isn’t what you’d expect it to be. If that then changes
people’s actions—whether it be investigators or
patients—then you’ve introduced a source of bias.”
By and large, adaptations in early-stage trials are not problematic. In fact,
says O’Neil, “we think they’ve been
underexplored” in that context. But regulators need much more convincing in
Phase 3, or in Phase 2 trials that propose to morph directly into Phase 3 without the
6–9 months of analysis between the two steps. Proposals that include such
seamless Phase 2/3 trials undergo intense regulatory scrutiny, which can offset whatever
time you’ve gained from the adaptation.
And at any stage, the approach requires “a lot more prospective
planning, a lot more complexity in design, and perhaps even more risk, in terms of will
[the trial] turn out as you had planned,” says O’Neil. Every
possible eventuality in the trial that may result in an adaptation must be thought
through and documented in advance—a simulation exercise requiring an enormous
amount of statistical resources that small companies generally don’t have
in-house. “There are a lot of amateurs running around saying we should do this
without understanding the statistics behind it,” Turnbull notes.
Some say that the pressure for efficiency pushes some companies to misuse the approach. One
strategy Wittes has encountered is undersizing trials in order to bait investors, she
says. A company that can’t afford to run a full trial might plan an adaptive
trial that stipulates an “outrageously large” effect for their
treatment and contains a built-in plan to increase subject numbers if this noncredible
effect size isn’t met. “Then at some planned interim they look at
the data and they say, ‘Oh, the observed effect size is smaller than
anticipated.’” Still, that limited data can be used to lure
investors to fund the larger trial. “What I worry about is that in part
[adaptive design] has caught on because of business pressures,” Evans says.
For Cervelo’s de Somer, conducting an adaptive trial meant hiring Cytel
to help plan the trial and determine what, if any, adaptations were appropriate, at the
cost of $200,000 for a $10 million trial. “I think it’s an
investment of a few months and a few hundred thousand dollars for a trial that costs at
least 50 to 100 times more,” de Somer says. “Beyond the money,
it’s probably worth it from an ethical and a regulatory
perspective.”
Indeed, if done right, the approach has one benefit that’s hard to
deny: It forces trial organizers to plan their studies carefully, whether or not they
decide to make adaptations. “I think the by-product [of attempts to implement
adaptive trials] is an appreciation of much more prospective planning in trials than we
had even five years ago,” says O’Neil. Quinlan agrees.
“The process itself is really the benefit of adaptive design,” she
says. “Whether or not you choose to have an adaptive design, if you go through
the process of evaluation of comparing it with a traditional trial, you’ll end
up with a better trial.”
This article is an excellent overview, but fails to mention that many of the adaptive designs being used today are based on Bayesian methods. Bayesian methods treat all the parameters in a model as random variables. This makes many statisticians, including myself, a bit uncomfortable. We're used to thinking of the efficacy of a drug as being an unknown constant.
The Bayesian paradigm offers several advantages, however. It handles very complex situations like hierarchical models very gracefully. It allows you to explicitly model certain types of uncertainty, such as uncertainty about which statistical model is appropriate for your data. Finally, it produces simple estimates about the future results of the trial based on the data already collected.
It is this last point that makes Bayesian methods so useful in adaptive designs. At any point in a clinical trial, you can forecast the future results and make appropriate changes. It would be nice to crunch out a single p-value and a single confidence interval based on a single look at the end of a trial, but that's not going to provide you with the flexibility you need in today's complex world. Stephen Senn has a great quote about how a statistician is the only person who doesn't believe in America, because it wasn't in Christopher Columbus's original research plan.
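One simple form this forecasting can take is a predictive probability of success. The sketch below is only an illustration under stated assumptions: a single-arm trial with a yes/no response, a uniform Beta(1, 1) prior on the response rate, and "success" defined as reaching a minimum number of responders by the end. The function names and all numbers are hypothetical, and this is not the method used in ASTIN or any other trial mentioned in the article.

```python
import math


def log_beta(a, b):
    """Log of the Beta function, via log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def predictive_prob_success(successes, n_so_far, n_total, min_successes,
                            prior_a=1.0, prior_b=1.0):
    """Probability that the finished trial reaches min_successes responders,
    given the interim data and a Beta(prior_a, prior_b) prior."""
    a = prior_a + successes                # posterior Beta parameters
    b = prior_b + (n_so_far - successes)
    m = n_total - n_so_far                 # patients still to be enrolled
    prob = 0.0
    for k in range(m + 1):                 # k = future responders
        if successes + k >= min_successes:
            # Beta-binomial probability of exactly k responders among m
            log_pmf = (math.log(math.comb(m, k))
                       + log_beta(a + k, b + m - k)
                       - log_beta(a, b))
            prob += math.exp(log_pmf)
    return prob


# Hypothetical interim looks at 20 of 40 patients; 20 responders needed overall.
promising = predictive_prob_success(12, 20, 40, 20)
poor = predictive_prob_success(4, 20, 40, 20)

print(promising, poor)
```

A pre-specified rule might then stop the trial, or drop an arm, when this probability falls below some threshold fixed in the protocol. The threshold is a design choice that would need to be justified in advance.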
Every time I work on Bayesian models, I get a bad headache. It requires a different way of thinking, and it doesn't come naturally to me. As much as I would like to complain about it, though, Bayesian methods are something we need to embrace.
Steve Simon, P.Mean Consulting
The old chestnut, "plans are useless, but planning is indispensable" (often attributed to Dwight D. Eisenhower), seems to apply here.
It may well be impossible or impractical to plan out all of the contingencies from the start, but it makes a lot of sense to go through the effort of planning so that you better understand how to adapt over the course of the trial, rather than fudging it as you go along.