An editorial from 2011 in the JNCI (“Demystify statistical significance–time to move on from the P value to Bayesian analysis,” J J Lee, 103:2, 2011) caught my attention. It was written in response to an article in the same issue by Ocana and Tannock (“When are ‘positive’ clinical trials in oncology truly positive,” 103:16, 2011). In his editorial, Lee notes that Ocana and Tannock make four points: “(1) statistical significance does not equate to clinical significance; (2) the P value alone is not sufficient to conclude that a drug works; (3) a prespecified magnitude of clinical benefit…is required to gauge whether a drug works or not; and (4) the absolute difference is more relevant than the relative difference in measuring a drug’s efficacy.”

Lee uses this as a starting point to argue that a Bayesian approach is more appropriate for clinical trials. First he lays out the familiar description that “the Bayesian approach calculates the probability of the parameter(s) given the data, whereas the frequentist approach computes the probability of the data given the parameter(s).” He then gives two examples in which the observed success rate is (a) 0.4 (4 of 10) and (b) 0.27 (27 of 100), which both give a P value of 0.033 in a one-sided binomial test. The hypothesized success rates being tested are 0.2 vs 0.4. However, if you calculate the posterior probabilities, you find that even though the P values are nearly identical, the first trial gives a probability of 0.432 that the parameter is greater than 0.4, whereas in the second trial that probability is 0.003. It is a nice demonstration of the limitations of the frequentist approach.
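For readers who want to try this kind of comparison themselves, here is a minimal Python sketch of the two calculations: a one-sided binomial P value and a posterior probability that the success rate exceeds a threshold. The function names are my own, and the uniform Beta(1, 1) prior is an assumption on my part, since the prior behind the editorial's figures is not restated here. The values it prints depend on the counts, null rate, and prior you supply, so they need not reproduce the numbers quoted above.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p0: float) -> float:
    """One-sided P value: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


def posterior_prob_greater(k: int, n: int, threshold: float) -> float:
    """P(theta > threshold | k successes in n trials).

    ASSUMPTION: a uniform Beta(1, 1) prior, so the posterior is
    Beta(k + 1, n - k + 1). For integer parameters the Beta upper tail
    equals a binomial lower tail:
        P(theta > t) = P(Y <= k),  Y ~ Binomial(n + 1, t).
    """
    m = n + 1
    return sum(
        comb(m, i) * threshold**i * (1 - threshold) ** (m - i)
        for i in range(k + 1)
    )


if __name__ == "__main__":
    null_rate = 0.2   # rate under the null hypothesis
    target = 0.4      # rate the posterior is compared against
    for k, n in [(4, 10), (27, 100)]:
        p_value = binom_upper_tail(k, n, null_rate)
        post = posterior_prob_greater(k, n, target)
        print(f"{k}/{n}: one-sided P = {p_value:.3f}, "
              f"P(theta > {target} | data) = {post:.3f}")
```

The point of putting the two functions side by side is that the P value only asks how surprising the data are under the null rate, while the posterior asks directly how plausible a clinically meaningful rate is given the data.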

However, it is Ocana and Tannock’s article that is more relevant to the points I have been making with my decision modeling. In my paper, “When is better best: a multiobjective approach,” we made the point that in doing comparisons, it is important to state the criteria that will be used to decide superiority. We didn’t explicitly say that the magnitude of the difference for each of those criteria should also be stated, mainly because we were pessimistic that we could even get the variables named. However, O&T’s point is on the mark.

Their first point is the one that I have focused on more explicitly with my modeling, and it reflects my interest in using Bayesian networks to define the magnitude of improvement needed to make a study or new technology worthwhile. Again, we are on the same page when it comes to deciding (up front, not after the fact) what the important variables are, and what magnitudes are required to make the clinical tradeoffs worth the effort.