A look at whether medical physicists are Bayesians, through the example of maintenance of certification (MOC).
Abstract: Though few will admit it, many physicists are Bayesians in at least some situations. This post discusses how the world looks through a Bayesian eye, using a concrete example: the Maintenance of Certification (MOC) by the American Board of Radiology. It is shown that a priori acceptance of the value of MOC relies on a Bayesian attitude towards the meaning of probabilities. Applying Bayesian statistics shows that a reasonable prior greatly reduces any possible gain in information from going through the MOC, and provides some numbers on the possible error rate of the MOC. It is hoped that this concrete example will result in a greater understanding of the Bayesian approach in the medical physics environment.
For several decades, a debate has raged regarding the nature of probabilities. On one side of the debate are “frequentists”. They hold that probabilities are obtained by repeated identical observations, with the probability of any given outcome being the ratio of the number of events with that outcome to the total number of events. The classical example is the probability of observing a “heads” or “tails” when flipping a coin. On the other side of the debate are “Bayesians” (more on that name in a bit). They hold that probabilities can also represent the degree of belief in the relative frequency of a given outcome. There are many paths by which one can reach this point, but a few are common. The influence of prior knowledge on one’s belief that a certain event will happen is certainly one ingredient. Another path is the recognition that probabilities are often useful even when it is impossible to reproduce the situation precisely enough for multiple measurements to be made, as is often the case in medicine.
For those of us in the medical field, randomized controlled trials (RCTs) are our effort to achieve the frequentist goal of measuring outcomes in identical situations. However, we are usually more interested in discovering the differences in probabilities for different situations, namely when an element of a therapeutic procedure has been changed. The frequentist approach lies behind the statistical tests that are used to determine whether our observations warrant the conclusion that the therapeutic modification has resulted in a true difference or not. In other words, the frequentist view is one that seeks to determine whether the data observed are consistent with a given hypothesis. This is to be contrasted with the Bayesian view, in which one seeks to determine the probability of a certain hypothesis given the data.
All of this still leaves us with the question: Why do we care whether medical physicists are Bayesians or frequentists? One good reason has been in the news recently, namely, personalized medicine. How will we ever obtain the required numbers of patients if everything is personal? Even if we take “personal” to mean harboring one or several (nearly) identical genes, recent developments are demonstrating that biological processes are nearly always the result of a large set of genes. In addition, the role of epigenetic factors reduces the homogeneity in any group selected for their genetic homogeneity.
In general, medical physicists tend to be a bit under-educated with respect to probabilities and statistics, especially in a medical environment. A very good reference for the Bayesian statistical approach is “Bayesian Approaches to Clinical Trials and Health-Care Evaluation” by D J Spiegelhalter et al. This post is a brief attempt to highlight some of the issues, but should be considered a very faint ghost of a complete discussion. To make it more concrete, I have looked at a specific situation.
There is a fascinating article in the latest edition of Physics Today (February, 2014):
Bacterial Decision Theory by Jané Kondev
I was initially attracted to the article by its provocative title. Like a bee to honey, I could not resist looking at it. There are a number of reasons I enjoyed it so much; here’s a synopsis followed by a slightly more digressive discussion:
- It is interesting in its own right as an account of how genes are expressed.
- It provides a good example of how a Bayesian model (my interpretation–not the author’s) can be expanded from simple observed probabilities to include predictions from sophisticated and mathematical models.
- It highlights the importance of modeling when doing statistical analyses.
- It provides some background for thinking about the processes that might result in differential response of cancer cells (or normal cells, for that matter) to radiation and/or chemo.
In a quick synopsis, the article describes in great detail how E. coli cells can switch between using glucose and lactose for energy. The system operates as a function of several variables: presence/absence of lactose and glucose and the presence/absence of two molecules, namely Lac-repressor and CRP. The CRP increases the likelihood that RNA polymerase will bind to the lac promoter portion of the DNA; the Lac-repressor sits on the promoter portion, thereby inhibiting the RNA polymerase binding. Lac-repressor tends to be present when lactose is absent; CRP is present when glucose is absent. Binding of RNA polymerase for making the protein that digests lactose follows the basic rules:
- lactose +, glucose +, no RNA polymerase
- lactose -, glucose +, no RNA polymerase
- lactose -, glucose -, no RNA polymerase
- lactose +, glucose -, RNA polymerase can bind
That makes a nice 2×2 matrix that is easily encoded into a Bayesian network (BN). However, these are chemical reactions, not logical theses, so statistical mechanics is actually a better operational model. A little calculation might give you probabilities that are a little different from 0/1 depending on concentrations. A little more work gets you a full-blown model based on the free energies and entropies of the two states (bound/not bound). In the BN world, you can now have a much more sophisticated, quantitative, continuous model without really modifying the BN in any substantial way.
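As a rough illustration of the step from the 0/1 rules to a statistical-mechanics description, here is a minimal sketch in Python. It is not taken from the article: the function names and the single parameter `delta_g_kt` (the free energy of the bound state minus that of the unbound state, in units of kT) are assumptions made for illustration, and any dependence on concentrations is simply folded into that one number.

```python
import math


def promoter_logic(lactose: bool, glucose: bool) -> bool:
    """Truth table from the list above: RNA polymerase can bind only
    when lactose is present and glucose is absent."""
    return lactose and not glucose


def binding_probability(delta_g_kt: float) -> float:
    """Two-state (bound / not bound) Boltzmann occupancy.

    delta_g_kt is a hypothetical parameter: the free energy of the bound
    state minus that of the unbound state, in units of kT.
    P(bound) = exp(-G_b) / (exp(-G_b) + exp(-G_u)) = 1 / (1 + exp(delta_g_kt)).
    """
    return 1.0 / (1.0 + math.exp(delta_g_kt))


if __name__ == "__main__":
    for lactose in (True, False):
        for glucose in (True, False):
            print(lactose, glucose, promoter_logic(lactose, glucose))
    # Illustrative free-energy differences only, not measured values.
    for delta in (-4.0, 0.0, 4.0):
        print(delta, binding_probability(delta))
```

A strongly negative `delta_g_kt` pushes the probability toward 1 and a strongly positive one toward 0, which recovers the logical table as a limiting case; intermediate values give the probabilities "a little different from 0/1" mentioned above.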
The author also does a nice job of describing how having such a model can really help form the experimental setup required and the statistical analyses that should be performed to test this model. This mirrors the discussion on the use of modeling strategies in performing appropriate statistical tests in the first chapter of “Regression Modeling Strategies” by F E Harrell (Springer, 2001).
Finally, the article also describes how other aspects of the cellular mechanism, such as transporter activity across the cell membrane, can lead to positive feedback and a switching between states. As another example of bacterial free will, the article describes how E. coli can switch phenotypes from antibiotic-sensitive to resistant and vice versa. These interactions between cellular activity and the environment bring to mind some of the issues with differences between tumor cell responses to radiation. The simple LQ model, while pretty good, might be greatly improved by including something like the antibiotic resistance mechanism described. Genetic instability might explain part of the heterogeneity of response, but so may environmental factors and cellular feedback processes.
An article in Nature takes another (and long-needed) look at the P value, that near-religious symbol of scientific correctness:
Here is one paragraph to give you a taste:
“One result is an abundance of confusion about what the P value means. Consider Motyl’s study about political extremists. Most scientists would look at his original P value of 0.01 and say that there was just a 1% chance of his result being a false alarm. But they would be wrong. The P value cannot say this: all it can do is summarize the data assuming a specific null hypothesis. It cannot work backwards and make statements about the underlying reality. That requires another piece of information: the odds that a real effect was there in the first place. To ignore this would be like waking up with a headache and concluding that you have a rare brain tumour — possible, but so unlikely that it requires a lot more evidence to supersede an everyday explanation such as an allergic reaction. The more implausible the hypothesis — telepathy, aliens, homeopathy — the greater the chance that an exciting finding is a false alarm, no matter what the P value is.”
“Many statisticians also advocate replacing the P value with methods that take advantage of Bayes’ rule: an eighteenth-century theorem that describes how to think about probability as the plausibility of an outcome, rather than as the potential frequency of that outcome. This entails a certain subjectivity — something that the statistical pioneers were trying to avoid. But the Bayesian framework makes it comparatively easy for observers to incorporate what they know about the world into their conclusions, and to calculate how probabilities change as new evidence arises.”
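The point about needing "the odds that a real effect was there in the first place" can be made concrete with Bayes' rule. The sketch below is my own illustration, not a calculation from the Nature article: it treats the significance threshold `alpha` as the false-positive rate and assumes a value for the statistical power, both of which are simplifying assumptions, and the function name is hypothetical.

```python
def false_alarm_probability(prior_real: float, alpha: float, power: float) -> float:
    """Probability that a 'significant' finding is a false alarm.

    prior_real: assumed prior probability that a real effect exists
    alpha:      assumed probability of a significant result when there is no effect
    power:      assumed probability of a significant result when the effect is real
    """
    false_positives = alpha * (1.0 - prior_real)
    true_positives = power * prior_real
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    # Illustrative priors only: a toss-up, a long shot, and a very implausible hypothesis.
    for prior in (0.5, 0.05, 0.001):
        print(prior, false_alarm_probability(prior, alpha=0.05, power=0.8))
```

Holding `alpha` and `power` fixed, the false-alarm probability rises as the prior falls, which is the headache-versus-brain-tumour argument in numerical form.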
As a great example of the inter-relatedness of science and math, I would like to lay out a chain of connections I made the other day. One cool point is that the last link in the chain is a memory of an experience I had when in graduate school (over 30 years ago). It started when I was researching a new project that was discussed with some colleagues. Each item in the following list indicates a different, but related, concept.
- How to compare the cost effectiveness of the treatment of lung cancer with x-rays or protons
- Markov models are one approach
- In reading a little more about Markov models, I found an intriguing reference to a connection between Markov models and Monte Carlo methods
- A quick Google search on the above-mentioned concepts kept pointing me to Markov Chain Monte Carlo (MCMC)
- Having seen the topic of MCMC in books and articles, but never having had the time or interest to follow up and understand it, I looked for articles on this
- Found a great article that caught my attention in the first page or two with some anecdotal history of Monte Carlo methods in physics: The Evolution of Markov Chain Monte Carlo Methods, Matthew Richey, The American Mathematical Monthly, Vol. 117, No. 5 (May 2010), pp. 383-413
- The article explains the Metropolis algorithm, which was first used in nuclear physics but was quickly applied to topics like Ising glasses. The Metropolis algorithm was a way of searching a very large state space very efficiently by sampling the most likely states most often. It judged how likely a state was from the energy of an ensemble of particles, as in thermodynamics.
- From there, the Metropolis algorithm was found to be useful in problems of combinatorial optimization, where, again, many states had to be searched to find the optimal solution.
- One of the consequences was the development of the simulated annealing algorithm. This is one personal connection, since I remember very well the day I sat in the library at PSI (Switzerland) and read Steve Webb’s paper on simulated annealing for IMRT. This was pretty early in the development of this algorithm and I give Steve a lot of credit for understanding and applying it so well and so quickly.
- One of the classic combinatorial optimization problems is the traveling salesman problem.
- At the time that this algorithm was being applied to optimization, solid state circuits were starting their incredible rise in number of components and density. A big problem was how best to add new ones. There is a cost associated with the distance between related components so the traveling salesman problem was relevant.
- A physicist who had studied spin glasses–Scott Kirkpatrick–went to work for IBM and started working on the circuit board problem. He recognized that the objectives that needed to be met were very similar to the equation for the energy of spin glasses, the configuration of which was being solved using the Metropolis algorithm.
- Final step: I have always remembered–and often cited as an example of cross-field intellectual fertilization–a talk I heard as a graduate student at the University of Wisconsin in which a physicist described the exact scenario I just recounted above.
So–loop closed. I am glad (a) to find out who it was, (b) that the point I always took away from it was correct, and (c) to learn more about the actual problem.
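For readers who have not seen it, the Metropolis acceptance rule and its use in simulated annealing can be sketched on a toy traveling salesman problem. This is a minimal illustration under my own assumptions (segment-reversal moves, a geometric cooling schedule, made-up parameter values and function names); it is not the implementation used by Kirkpatrick, by Webb, or in the Richey article.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of a closed tour visiting the cities in the given order."""
    return sum(
        math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def anneal_tour(cities, steps=20000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing for a small traveling salesman problem.

    The tour length plays the role of the energy. Moves that shorten the
    tour are always accepted; moves that lengthen it by delta are accepted
    with probability exp(-delta / temperature) (the Metropolis rule).
    """
    rng = random.Random(seed)
    n = len(cities)
    tour = list(range(n))
    current = tour_length(tour, cities)
    best_tour, best = tour[:], current
    for step in range(steps):
        # Geometric cooling from t_start toward t_end (an assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / steps)
        # Propose a new tour by reversing a randomly chosen segment.
        i, j = sorted(rng.sample(range(n), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        candidate_length = tour_length(candidate, cities)
        delta = candidate_length - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            tour, current = candidate, candidate_length
            if current < best:
                best_tour, best = tour[:], current
    return best_tour, best


if __name__ == "__main__":
    rng = random.Random(42)
    cities = [(rng.random(), rng.random()) for _ in range(30)]
    print("initial length:", tour_length(list(range(len(cities))), cities))
    print("annealed length:", anneal_tour(cities)[1])
```

At high temperature almost any move is accepted, so the search wanders widely; as the temperature drops, uphill moves become rare and the tour settles into a short (though not necessarily optimal) configuration.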
P.S. The Bayesian connection of this story will be told in a forthcoming post.
An editorial from 2011 in the JNCI (“Demystify statistical significance–time to move on from the P value to Bayesian analysis”, J J Lee, 103:2, 2011) caught my attention. It was written in response to an article in the same issue by Ocana and Tannock (“When are ‘positive’ clinical trials in oncology truly positive,” 103: 16, 2011). In Lee’s editorial, he notes that Ocana and Tannock make four points: “(1) statistical significance does not equate to clinical significance; (2) the P value alone is not sufficient to conclude that a drug works; (3) a prespecified magnitude of clinical benefit…is required to gauge whether a drug works or not; and (4) the absolute difference is more relevant than the relative difference in measuring a drug’s efficacy.”
Lee uses this as a starting point to highlight how a Bayesian approach is more appropriate for clinical trials. First he lays out the familiar description that “the Bayesian approach calculates the probability of the parameter(s) given the data, whereas the frequentist approach computes the probability of the data given the parameter(s).” He then goes on to give two examples in which the observed probability of success is (a) 0.4 (4 of 10) and (b) 0.27 (27 of 100), which give the same P value of 0.033 in a binomial one-sided test. The success probabilities being tested are 0.2 vs 0.4. However, if you calculate the posterior probabilities, you find that even though the P values were nearly identical, in the first test the probability that the parameter is greater than 0.4 is 0.432, whereas in the second test that probability is 0.003. It is a nice demonstration of the limitations of the frequentist approach.
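To see the two kinds of calculation side by side, here is a minimal sketch using only the Python standard library. It is not Lee's calculation: the function names are hypothetical, and the uniform Beta(1, 1) prior is my assumption, since the prior used in the editorial is not reproduced in this post. The outputs therefore need not match the figures quoted above.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def one_sided_p_value(successes: int, n: int, p_null: float) -> float:
    """Frequentist side: P(X >= successes) if the true success rate is p_null."""
    return 1.0 - binom_cdf(successes - 1, n, p_null)


def posterior_prob_greater(successes: int, n: int, threshold: float) -> float:
    """Bayesian side: P(rate > threshold | data) under a uniform prior (assumed).

    The posterior is Beta(successes + 1, n - successes + 1). For integer
    parameters its upper tail equals a binomial CDF:
    P(rate > threshold) = P(Y <= successes), Y ~ Binomial(n + 1, threshold).
    """
    return binom_cdf(successes, n + 1, threshold)


if __name__ == "__main__":
    # Counts as quoted in the text above.
    for successes, n in ((4, 10), (27, 100)):
        print(
            successes,
            n,
            one_sided_p_value(successes, n, p_null=0.2),
            posterior_prob_greater(successes, n, threshold=0.4),
        )
```

The first function answers "how surprising are these data if the rate is 0.2?", while the second answers "how probable is it that the rate exceeds 0.4, given these data?", which is the distinction Lee draws.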
However, it is Ocana and Tannock’s article that is a bit more relevant to the points that I have been making with my decision modeling. In my paper, “When is better best: a multiobjective approach”, we made the point that in doing comparisons, it is important to state the criteria that will be used to decide superiority. We didn’t explicitly state that the magnitude of the difference for each of those criteria should also be stated, mainly because we were pessimistic that we could even get the variable to be named. However, O&T’s point is on the mark.
Their first point is the one that I have focused on more explicitly with my modeling and reflects my interest in using Bayesian networks to define the magnitude of the improvement that is needed to make a study/new technology worthwhile. Again, we are on the same page when it comes to deciding (up-front, not after the fact) what the important variables are, and the magnitudes required to make the clinical tradeoffs worth the effort.