# Statistical significance in clinical trials

An editorial from 2011 in the JNCI (“Demystify statistical significance–time to move on from the P value to Bayesian analysis”, J J Lee, 103:2, 2011) caught my attention. It was written in response to an article in the same issue by Ocana and Tannock (“When are ‘positive’ clinical trials in oncology truly positive,” 103: 16, 2011). In Lee’s editorial, he notes that Ocana and Tannock make four points: “(1) statistical significance does not equate to clinical significance; (2) the P value alone is not sufficient to conclude that a drug works; (3) a prespecified magnitude of clinical benefit…is required to gauge whether a drug works or not; and (4) the absolute difference is more relevant than the relative difference in measuring a drug’s efficacy.”

Lee uses this as a starting point to highlight why a Bayesian approach is more appropriate for clinical trials. First he lays out the familiar description that “the Bayesian approach calculates the probability of the parameter(s) given the data, whereas the frequentist approach computes the probability of the data given the parameter(s).” He then gives two examples in which the observed response rate is (a) 0.4 (4 of 10) and (b) 0.27 (27 of 100), which give essentially the same P value (0.033) in a one-sided binomial test. The response rates being compared are 0.2 (null) and 0.4 (target). However, if you calculate the posterior probabilities, you find that even though the P values are nearly identical, the posterior probability that the response rate exceeds 0.4 is 0.432 in the first case but only 0.003 in the second. It is a nice demonstration of the limitations of the frequentist approach.
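The two quantities Lee contrasts can be computed directly. This is a minimal sketch, not Lee’s own calculation: it assumes a null response rate of 0.2 for the exact one-sided binomial test and a uniform Beta(1, 1) prior for the posterior, and the function names are my own. Because the editorial’s exact null and prior may differ, the numbers it prints need not match the 0.033 and 0.432 quoted above; what it does show is how the posterior probability of exceeding 0.4 depends on both the observed rate and the amount of data.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_sided_p_value(x: int, n: int, p0: float) -> float:
    """Exact one-sided binomial P value: P(X >= x) when X ~ Binomial(n, p0)."""
    return sum(binom_pmf(k, n, p0) for k in range(x, n + 1))


def posterior_prob_above(x: int, n: int, threshold: float, a: int = 1, b: int = 1) -> float:
    """P(rate > threshold | x responses in n) under a Beta(a, b) prior with integer a, b.

    The posterior is Beta(a + x, b + n - x). For integer parameters,
    P(Beta(alpha, beta) > t) = P(Binomial(alpha + beta - 1, t) <= alpha - 1).
    """
    alpha, beta = a + x, b + n - x
    m = alpha + beta - 1
    return sum(binom_pmf(k, m, threshold) for k in range(alpha))


if __name__ == "__main__":
    NULL_RATE = 0.2  # assumed null response rate
    TARGET = 0.4     # posterior tail threshold from the text
    for x, n in [(4, 10), (27, 100)]:
        p = one_sided_p_value(x, n, NULL_RATE)
        post = posterior_prob_above(x, n, TARGET)
        print(f"{x}/{n}: one-sided P = {p:.3f}, P(rate > {TARGET} | data) = {post:.3f}")
```

With integer prior parameters the Beta tail reduces to a binomial sum, so the sketch needs only `math.comb`; for a non-integer prior you would use a regularized incomplete beta function instead (for example SciPy’s `scipy.stats.beta.sf`).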

However, it is Ocana and Tannock’s article that is a bit more relevant to the points I have been making with my decision modeling. In my paper, “When is better best: a multiobjective approach”, we made the point that in doing comparisons, it is important to state the criteria that will be used to decide superiority. We didn’t explicitly say that the required magnitude of the difference for each of those criteria should also be stated, mainly because we were pessimistic that we could even get the variables named. However, O&T’s point is on the mark.

Their first point is the one that I have focused on more explicitly in my modeling, and it reflects my interest in using Bayesian networks to define the magnitude of improvement needed to make a study or new technology worthwhile. Again, we are on the same page when it comes to deciding (up front, not after the fact) what the important variables are and the magnitudes required to make the clinical tradeoffs worth the effort.
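Stating criteria and their required magnitudes up front can be made concrete in a few lines. In this sketch the criterion names, margins, and rates are made up, and the decision rule (beat the comparator by at least the prespecified margin on some criterion, and not be worse by that margin or more on any) is an assumed example of one reasonable rule, not necessarily the rule used in our paper or by O&T.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    margin: float                 # prespecified clinically meaningful difference
    higher_is_better: bool = True


def improvement(c: Criterion, new: float, old: float) -> float:
    """Signed improvement of new over old; positive means new is better."""
    return (new - old) if c.higher_is_better else (old - new)


def is_superior(criteria: list, new: dict, old: dict) -> bool:
    """Margin-based comparison fixed before looking at the data (hypothetical rule).

    New must improve on old by at least the margin on some criterion and
    must not be worse by the margin or more on any criterion.
    """
    gains = [(c, improvement(c, new[c.name], old[c.name])) for c in criteria]
    meaningfully_better = any(g >= c.margin for c, g in gains)
    meaningfully_worse = any(g <= -c.margin for c, g in gains)
    return meaningfully_better and not meaningfully_worse


if __name__ == "__main__":
    # Made-up criteria and rates, for illustration only.
    criteria = [
        Criterion("tumor_control", margin=0.05),
        Criterion("severe_toxicity", margin=0.03, higher_is_better=False),
    ]
    standard = {"tumor_control": 0.60, "severe_toxicity": 0.20}
    candidate = {"tumor_control": 0.66, "severe_toxicity": 0.21}
    print(is_superior(criteria, candidate, standard))
```

The point of fixing `margin` in advance is that a difference which is statistically detectable but smaller than the margin never counts as a win.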

# Optimization article (Med Phys, 40: 021715, 2013)

As indicated on the “optimization” page, my interest in Bayesian networks stems from my work in multiobjective optimization (MOO). A recent article in Medical Physics (Rivera et al.) described a single-objective optimization method, reduced order constrained optimization, which is the latest attempt to avoid MOO by relying on past plans to obtain suitable parameters for the objective functions. It is better than most in that it uses Latin hypercube sampling to sample the objective parameters, which allows for some variation. That is, if any previous plan had a low value for a particular organ, there is a chance that it will be included in the optimization. However, the method likely still suffers from the same problem as others in this category: you are unlikely to get a plan that is better than the average of your previous efforts. Built into these approaches is the feeling that the solutions are good enough. The method then goes on to use hard constraints set to the traditional values that everyone uses; these are, almost by definition, averages and do not account for a particular individual’s possibility of receiving a significantly lower dose to some organ. This problem is accentuated by the fact that the objective function and optimization algorithm do not reward doses below the constraints.
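For readers unfamiliar with Latin hypercube sampling, here is a minimal sketch: each parameter’s range is cut into as many equal strata as there are samples, every stratum is used exactly once, and strata are paired randomly across parameters. It also includes a one-sided quadratic penalty of the kind that gives no reward for doses below a limit. The parameter ranges, function names, and the quadratic form are assumptions for illustration; this is not the implementation from Rivera et al.

```python
import random


def latin_hypercube(n_samples, bounds, rng=None):
    """Draw n_samples points in the box given by bounds = [(lo, hi), ...].

    Each dimension is split into n_samples equal strata; one uniform draw is
    taken inside each stratum, and the strata are shuffled independently per
    dimension so they are paired at random across dimensions.
    """
    rng = rng or random.Random()
    columns = []
    for lo, hi in bounds:
        width = (hi - lo) / n_samples
        column = [lo + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[j] for col in columns) for j in range(n_samples)]


def one_sided_penalty(dose, limit):
    """Quadratic penalty that is zero at or below the limit (assumed form)."""
    excess = max(0.0, dose - limit)
    return excess * excess


if __name__ == "__main__":
    # Hypothetical ranges (Gy) for two organ dose-limit parameters.
    bounds = [(20.0, 30.0), (40.0, 45.0)]
    for point in latin_hypercube(5, bounds, rng=random.Random(1)):
        print(tuple(round(v, 2) for v in point))
    print(one_sided_penalty(15.0, 20.0), one_sided_penalty(25.0, 20.0))
```

Because the penalty is flat (zero) below the limit, an optimizer minimizing it has no incentive to push an organ’s dose any further below the constraint, which is the behavior criticized above.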

That said, my biggest objection is that they still rely on weighting factors to find a single solution, thereby ignoring a key component of clinical decision making. If you repeat the process, letting the planner evaluate the plans and change objectives and weightings to account for what they have seen, then you are more or less back where you would be with conventional inverse planning.
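The difference between a weighted sum and a multiobjective view can be shown in a few lines. The sketch assumes each plan is summarized by a tuple of objectives where lower is better; the two-objective plans and the weights are made up. It filters the non-dominated (Pareto) plans and, separately, shows that a weighted sum returns a single plan whose identity depends entirely on the weights chosen. It is a generic illustration, not the method of Holdsworth et al. or Rivera et al.

```python
def dominates(a, b):
    """True if plan a is no worse than b on every objective and strictly
    better on at least one (lower is better for all objectives)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(plans):
    """Keep plans that no other plan dominates, preserving input order."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


def weighted_sum_choice(plans, weights):
    """Return the single plan minimizing the weighted sum of objectives."""
    return min(plans, key=lambda p: sum(w * x for w, x in zip(weights, p)))


if __name__ == "__main__":
    # Hypothetical plans: (target coverage deficit, organ mean dose in Gy).
    plans = [(1.0, 30.0), (2.0, 22.0), (3.0, 18.0), (2.5, 25.0)]
    print("Pareto front:", pareto_front(plans))
    for weights in [(10.0, 1.0), (1.0, 1.0)]:
        print(weights, "->", weighted_sum_choice(plans, weights))
```

With weights (10, 1) the weighted sum picks (1.0, 30.0); with (1, 1) it picks (3.0, 18.0). Both are on the Pareto front, but each weighting hides the other trade-offs from the planner.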

They have the wisdom to cite our paper (Holdsworth et al., Med Phys, 39: 2261, 2012) but then dismiss it because it can take so long for head and neck cases. However, their method also takes several hours and requires some user interaction, whereas ours runs entirely “off-line”.

So in summary, an interesting article but one that signals an unwillingness to leave the comfort of old habits.