Bacterial decision making (really!)

There is a fascinating article in the latest edition of Physics Today (February, 2014):

Bacterial Decision Theory by Jané Kondev

I was initially attracted to the article by its provocative title.  Like a bee to honey, I could not resist looking at it.  There are a number of reasons I enjoyed it so much; here’s a synopsis followed by a slightly more digressive discussion:

  • It is interesting for its own sake as an account of how genes are expressed.
  • It provides a good example of how a Bayesian model (my interpretation–not the author’s) can be expanded from simple observed probabilities to include predictions from sophisticated and mathematical models.
  • It highlights the importance of modeling when doing statistical analyses.
  • It provides some background for thinking about the processes that might result in differential response of cancer cells (or normal cells, for that matter) to radiation and/or chemo.

In a quick synopsis, the article describes in great detail how E. coli cells switch between using glucose and lactose for energy.  The system operates as a function of several variables: presence/absence of lactose and glucose and the presence/absence of two molecules, namely Lac-repressor and CRP.  The CRP increases the likelihood that RNA polymerase will bind to the lac promoter portion of the DNA; the Lac-repressor sits on the promoter portion, thereby inhibiting the RNA polymerase binding.  Lac-repressor tends to be present when lactose is absent; CRP is present when glucose is absent.  Binding of RNA polymerase for making the protein that digests lactose follows the basic rules:

  • lactose +, glucose +, no RNA polymerase
  • lactose -, glucose +, no RNA polymerase
  • lactose -, glucose -, no RNA polymerase
  • lactose +, glucose -, RNA polymerase can bind

That makes a nice 2×2 matrix that is easily encoded into a Bayesian network (BN).  However, these are chemical reactions, not logical theses, so statistical mechanics is actually a better operational model.  A little calculation might give you probabilities that are a little different from 0/1 depending on concentrations.  A little more work gets you a full-blown model based on the free energies and entropies of the two states (bound/not bound).  In the BN world, you can now have a much more sophisticated, quantitative, continuous model without really modifying the BN in any substantial way.
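To make the step from a 0/1 truth table to concentration-dependent probabilities a little more concrete, here is a minimal sketch of a statistical-mechanics occupancy calculation.  The function name, the choice of promoter states, and every numerical weight are illustrative assumptions of mine, not values taken from the article.

```python
def polymerase_binding_probability(p, r, c, omega):
    """Probability that RNA polymerase occupies the promoter.

    p, r and c are dimensionless statistical weights (a concentration
    times a Boltzmann factor) for polymerase, Lac-repressor and CRP.
    omega is a cooperativity factor for CRP and polymerase bound together.
    All four inputs are hypothetical and chosen only for illustration.
    """
    bound = p + c * p * omega   # polymerase alone, or polymerase helped by CRP
    unbound = 1.0 + r + c       # empty promoter, repressor only, CRP only
    return bound / (bound + unbound)


# Assumed weights: the repressor weight is large when lactose is absent,
# and the CRP weight is large when glucose is absent.
scenarios = {
    ("lactose +", "glucose +"): {"r": 0.0, "c": 0.0},
    ("lactose -", "glucose +"): {"r": 1000.0, "c": 0.0},
    ("lactose -", "glucose -"): {"r": 1000.0, "c": 10.0},
    ("lactose +", "glucose -"): {"r": 0.0, "c": 10.0},
}

probabilities = {
    state: polymerase_binding_probability(p=0.01, omega=50.0, **weights)
    for state, weights in scenarios.items()
}

for state, prob in probabilities.items():
    print(state, f"{prob:.4f}")
```

With these made-up weights the four cases keep the same ordering as the truth table, but each entry is now a probability between 0 and 1 that shifts smoothly as the weights (that is, the concentrations and binding energies) change.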

The author also does a nice job of describing how having such a model can really help form the experimental setup required and the statistical analyses that should be performed to test this model.  This mirrors the discussion on the use of modeling strategies in performing appropriate statistical tests in the first chapter of “Regression Modeling Strategies” by F E Harrell (Springer, 2001).

Finally, the article also describes how other aspects of the cellular mechanism, such as transporter activity across the cell membrane, can lead to positive feedback and a switching between states.  As another example of bacterial free will, the article describes how E. coli can switch phenotypes between antibiotic-sensitive and antibiotic-resistant.  These interactions between cellular activity and the environment bring to mind some of the issues with differences between tumor cell responses to radiation.  The simple LQ model, while pretty good, might be greatly improved by including something like the antibiotic resistance mechanism described.  Genetic instability might explain part of the heterogeneity of response, but so may environmental factors and cellular feedback processes.

Utility of Fisher’s p-value

An article in Nature takes another (and long-needed) look at this near-religious symbol of scientific correctness:

Here is one paragraph to give you a taste:

“One result is an abundance of confusion about what the P value means. Consider Motyl’s study about political extremists. Most scientists would look at his original P value of 0.01 and say that there was just a 1% chance of his result being a false alarm. But they would be wrong. The P value cannot say this: all it can do is summarize the data assuming a specific null hypothesis. It cannot work backwards and make statements about the underlying reality. That requires another piece of information: the odds that a real effect was there in the first place. To ignore this would be like waking up with a headache and concluding that you have a rare brain tumour — possible, but so unlikely that it requires a lot more evidence to supersede an everyday explanation such as an allergic reaction. The more implausible the hypothesis — telepathy, aliens, homeopathy — the greater the chance that an exciting finding is a false alarm, no matter what the P value is.”

and another:

“Many statisticians also advocate replacing the P value with methods that take advantage of Bayes’ rule: an eighteenth-century theorem that describes how to think about probability as the plausibility of an outcome, rather than as the potential frequency of that outcome. This entails a certain subjectivity — something that the statistical pioneers were trying to avoid. But the Bayesian framework makes it comparatively easy for observers to incorporate what they know about the world into their conclusions, and to calculate how probabilities change as new evidence arises.”
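The point in the first quoted paragraph, that the chance of a false alarm depends on the prior odds of a real effect, can be sketched with Bayes' rule.  This is my own simplified illustration, not a calculation from the Nature article: it treats "significant" as a yes/no event at a threshold alpha, and the function name, the default alpha, and the assumed power are all hypothetical choices.

```python
def false_alarm_probability(prior_real, alpha=0.05, power=0.8):
    """P(no real effect | significant result) via Bayes' rule.

    prior_real: assumed prior probability that the effect is real.
    alpha:      chance of a significant result when there is no effect.
    power:      chance of a significant result when the effect is real.
    """
    true_positive = power * prior_real
    false_positive = alpha * (1.0 - prior_real)
    return false_positive / (true_positive + false_positive)


# The same significance threshold, three different (assumed) prior plausibilities.
for prior in (0.5, 0.1, 0.01):
    print(prior, round(false_alarm_probability(prior), 3))
```

The less plausible the hypothesis is to begin with, the larger the share of "significant" findings that are false alarms, even though alpha never changes.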

An interesting chain of connections

As a great example of the inter-relatedness of science and math, I would like to lay out a chain of connections I made the other day.  One cool point is that the last link in the chain is a memory of an experience I had when in graduate school (over 30 years ago).  It started when I was researching a new project that was discussed with some colleagues.  Each item in the following list indicates a different, but related, concept.

  • How to compare the cost effectiveness of the treatment of lung cancer with x-rays or protons;
  • Markov models are one approach
  • In reading a little more about Markov models, I found an intriguing reference to a connection between Markov models and Monte Carlo methods
  • A quick Google search on the above-mentioned concepts kept pointing me to Markov Chain Monte Carlo (MCMC)
  • Having seen the topic of MCMC in books and articles, but never having had the time or interest to follow up and understand it, I looked for articles on this
  • Found a great article that caught my attention in the first page or two with some anecdotal history of Monte Carlo methods in physics: The Evolution of Markov Chain Monte Carlo Methods, Matthew Richey, The American Mathematical Monthly, Vol. 117, No. 5 (May 2010), pp. 383-413
  • The article explains the Metropolis algorithm, which was first used in nuclear physics, but quickly applied to topics like the Ising model.  The Metropolis algorithm was a way of searching a very large state space very efficiently by sampling the most likely states most often.  The likelihood of each state was set by the energy of an ensemble of particles, as in thermodynamics.
  • From there, the Metropolis algorithm was found to be useful in problems of combinatorial optimization, where, again, many states had to be searched to find the optimal solution.
  • One of the consequences was the development of the simulated annealing algorithm.  This is one personal connection since I remember very well the day I sat in the library at PSI (Switzerland) and read Steve Webb’s paper on simulated annealing for IMRT.  This was pretty early in the development of this algorithm and I give Steve a lot of credit for understanding and applying it so well and so quickly.
  • One of the classic combinatorial optimization problems is the traveling salesman problem.
  • At the time that this algorithm was being applied to optimization, solid state circuits were starting their incredible rise in number of components and density.  A big problem was how best to place new components.  There is a cost associated with the distance between related components so the traveling salesman problem was relevant.
  • A physicist who had studied spin glasses–Scott Kirkpatrick–went to work for IBM and started working on the circuit board problem.  He recognized that the objectives that needed to be met were very similar to the equation for the energy of spin glasses, the configuration of which was being solved using the Metropolis algorithm.
  • Final step: I have always remembered–and often cited as an example of cross-field intellectual fertilization–a talk I heard as a graduate student at the University of Wisconsin in which a physicist described the exact scenario I just recounted above.

So–loop closed.  I am glad (a) to find out who it was, (b) that the point I always took away from it was correct, and (c) to learn more about the actual problem.
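For anyone who wants to see the middle links of the chain in action, here is a minimal sketch of simulated annealing applied to a small traveling salesman problem, using the Metropolis acceptance rule with tour length playing the role of energy.  The function names, the cooling schedule, and the segment-reversal move are my own illustrative choices, not the specific method of any of the papers mentioned above.

```python
import math
import random


def tour_length(tour, points):
    """Total length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def anneal_tour(points, t_start=1.0, t_end=1e-3, cooling=0.99,
                steps_per_temp=50, seed=0):
    """Search for a short tour by simulated annealing (a heuristic sketch)."""
    rng = random.Random(seed)
    n = len(points)
    tour = list(range(n))
    rng.shuffle(tour)
    current = tour_length(tour, points)
    best, best_len = tour[:], current
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            # Propose a new state by reversing a randomly chosen segment.
            i, j = sorted(rng.sample(range(n), 2))
            candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            cand_len = tour_length(candidate, points)
            delta = cand_len - current
            # Metropolis rule: always accept improvements; accept a worse
            # tour with probability exp(-delta / t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                tour, current = candidate, cand_len
                if current < best_len:
                    best, best_len = tour[:], current
        t *= cooling  # lower the "temperature" gradually
    return best, best_len


# Hypothetical example: eight cities on a unit circle.
cities = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
          for k in range(8)]
best_tour, best_length = anneal_tour(cities)
print(best_tour, round(best_length, 3))
```

At high temperature almost any move is accepted, so the search roams widely; as the temperature drops, it settles into low-"energy" (short) tours.  The result is not guaranteed to be optimal.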

P.S. The Bayesian connection of this story will be told in a forthcoming post.

Models versus data

From the geographer Strabo:

“And whenever we have not been able to learn by the evidence of sense, there reason points the way.”

He was speaking about knowing the edges of the inhabitable world, but it fits just as well whichever world we happen to be interested in.

Virtual trials

Physicists are fond of conducting “virtual trials” by which is meant that they select a number of random or representative cases, compute treatment plans for them using two different methods and then compare the results.  Usually these are done to show the differences (or lack thereof) between two different methods of radiation delivery, or sometimes, of optimization. 

In general, this is a reasonable and cost effective means of coming to some conclusion about the appropriate uses of new technology. However, as they are most often conducted, these trials do little to answer any relevant questions.  In general, they meet few, if any, of the criteria for a clinical trial.  Instead, it seems as though physicists have defined their own standards for a virtual trial.  What are these standards? How do they compare with the norms in clinical medicine?

Clinical trials are grouped into four phases, ranging from determination of the intervention’s safety, to its efficacy in a controlled group, to its efficacy in the population at large.  Do our physics-oriented virtual trials fall into any of these categories?  At one end of the spectrum, physicists are concerned with safety, namely a Phase I trial.  They wish to avoid initiating a technology or procedure that will lead to patient harm.  At the other end, one could argue (though physicists never do) that they are also conducting Phase IV-like trials since the cases are selected with little regard for the biological and physiological variables that can mediate the response to the intervention.  Most often, cases are selected because they are dosimetrically “interesting” or, on the other hand, “tractable”.  The latter characteristic underpins the continued popularity of virtual trials of prostate cancer with its two significant organs-at-risk.  Once those cases are dealt with and the technology has been shown to handle the simple situations, then interesting cases are selected based on their dosimetric complexity.

Does this way of viewing the issue lead to any worthwhile considerations?  In the sense that clinical trials are now the gold standard for progress in medical practice, the answer is yes.  If we, as physicists, wish to lead the field forward by definitively answering questions, then we need to meet the same standards as others in the field.  So what are the characteristics of clinical trials that translate directly to a medical physics approach?

First, the endpoint of the trial must be described and justified at the beginning.  Too often, physicists merely pile up metrics at the end of the project, calculate statistical significance of differences, and then make some pronouncement based thereon.  Clinical trials do not have the luxury of waiting until the end of the trial to define their endpoints for several reasons, chief among them the ethics of human research.  Physicists are free of that limitation, but then suffer the possibility of being accused of cherry-picking the results.  More importantly, however, the failure to declare and justify the endpoints at the beginning vitiates the impact of the results since others are less likely to be convinced by this method of conducting the trial.  Providing a convincing rationale at the beginning of the work puts the results on a firm footing and helps structure the entire virtual trial.

Elucidation of a clear set of metrics by which to judge the trial’s efficacy must be done in conjunction with the relevant clinicians.  It is sometimes the case in current comparisons that dose metrics are tested for statistical significance with the somewhat absurd result that dose differences of less than 1 Gy are reported as significant.  Statistically, maybe (although it can certainly be argued that any set of cases in which the dosimetric parameters are so close in value, yet differ with statistical significance, exhibits a homogeneity that hardly reflects clinical practice); clinically, no. [e.g. DC Weber, et al, Int J Radiat Oncol Biol Phys, 75(5): 1578-86, 2009]  It is important to determine up front what is going to be conclusive evidence of improvement for the application being studied.  It is at this point that determination of the phase of the trial is important.  Evaluating safety is likely to result in a different set of trial metrics than would be used in a Phase III trial.
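The gap between statistical and clinical significance can be shown with a small sketch.  Everything here is hypothetical: the ten pairs of organ-at-risk doses are invented, the 2 Gy threshold stands in for whatever margin the clinicians declare up front, and the critical value 2.262 is the two-sided 5% Student t value for 9 degrees of freedom.

```python
import math
import statistics


def paired_comparison(dose_a, dose_b, clinical_threshold_gy, t_critical):
    """Paired t statistic plus a check against a pre-declared clinical margin.

    Assumes the paired differences are not all identical (non-zero spread).
    """
    diffs = [a - b for a, b in zip(dose_a, dose_b)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    t_stat = mean_diff / std_err
    return {
        "mean_diff_gy": mean_diff,
        "t_statistic": t_stat,
        "statistically_significant": abs(t_stat) > t_critical,
        "clinically_relevant": abs(mean_diff) >= clinical_threshold_gy,
    }


# Invented doses (Gy) to one organ-at-risk for ten cases under two techniques.
technique_a = [20.5, 18.2, 22.1, 19.7, 21.0, 20.3, 18.9, 21.6, 19.4, 20.8]
technique_b = [20.0, 17.8, 21.5, 19.2, 20.6, 19.7, 18.4, 21.2, 18.8, 20.3]

result = paired_comparison(technique_a, technique_b,
                           clinical_threshold_gy=2.0, t_critical=2.262)
print(result)
```

Because the invented differences are small but very consistent from case to case, the test flags them as statistically significant even though the mean difference is far below the declared clinical margin.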

Rigor in methods is also an important component of a virtual trial.  Given the complexities of modern treatment plans, optimization algorithms are often used in virtual trials.  However, the algorithms in the current generation of treatment planning software are very operator dependent.  In other cases, such as comparisons of protons and x-rays, different planning systems and dose calculation algorithms must be used.  Great care must be taken in designing methods that provide a fair comparison for plans.  In the case of optimization algorithms, user options must be constrained.  When different planning systems are used, some effort at judging their relative differences (outside the parameters of the trial) must be made.

There is an additional burden that the use of optimization places on virtual trials that is not usually a part of clinical trials.  That is, clinical trials do not usually look at the correspondence between normal tissue outcomes (complications) and tumor response.  In some cases, there may be reason to believe that tumor response is related to or coupled with normal tissue response and hence justifies the reporting of the correlation, but this is rarely done.  In inverse planning, the algorithm searches for some ideal solution and when it cannot find one that meets all the objectives, finds a plan that incorporates a trade-off between the competing (tumor vs normal tissue) objectives.  For this reason, it is imperative that virtual trials incorporating inverse planning report results for each individual, not just aggregate measures such as averages of single metrics.
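A small sketch shows why averages alone can mislead when each plan embodies a trade-off.  The four cases and their numbers are invented for illustration, and the labels and function name are my own.

```python
from statistics import mean


def report_tradeoffs(cases):
    """Summarize plan-B-minus-plan-A changes, both in aggregate and per case.

    Each case is (case_id, d_target, d_oar) where d_target is the change in
    target coverage (positive is better) and d_oar is the change in
    organ-at-risk dose (positive is worse).
    """
    per_case = []
    for case_id, d_target, d_oar in cases:
        if d_target > 0 and d_oar < 0:
            label = "better on both"
        elif d_target < 0 and d_oar > 0:
            label = "worse on both"
        elif d_target * d_oar > 0:
            label = "trade-off"
        else:
            label = "no change in at least one metric"
        per_case.append((case_id, d_target, d_oar, label))
    return {
        "mean_d_target": mean(c[1] for c in cases),
        "mean_d_oar": mean(c[2] for c in cases),
        "per_case": per_case,
    }


# Invented data: coverage change (%) and organ-at-risk dose change (Gy).
hypothetical_cases = [
    ("case 1", 3.0, 4.0),
    ("case 2", -3.0, -4.0),
    ("case 3", 2.0, 3.0),
    ("case 4", -2.0, -3.0),
]

summary = report_tradeoffs(hypothetical_cases)
print(summary["mean_d_target"], summary["mean_d_oar"])
for row in summary["per_case"]:
    print(row)
```

In this made-up data set both averages are zero, which would suggest the two methods are equivalent, yet every individual case involves a sizable trade-off between target and normal tissue.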

Finally, to make the connection between clinical trials and virtual trials, it is interesting to consider Phase III and IV trials.  One definition is: “Phase IV studies are conducted after the intervention has been marketed. These studies are designed to monitor effectiveness of the approved intervention in the general population and to collect information about any adverse effects associated with widespread use.” [Gates Foundation]  If we replace the word “marketed” with the words “clinically implemented”, then we have a good description of the introduction of new technologies and methods into clinical use including the performance of a virtual trial at the beginning of the process.  To those who argue that such trials are likely to not achieve statistical significance because of a lack of sufficient numbers of patients, one may ask whether there is justification in spending the money to purchase the new technology.

For those institutions that conduct virtual trials and, based at least partly on the results, take the next step of clinical use, it would be very worthwhile to collect data and report back on the correspondence between the trial and the clinical outcomes.  In many cases, e.g. IMRT and VMAT, the differences are so small that it is certainly not unethical to randomize patients between the two and measure the differences (if any) in outcomes, thereby conducting a Phase III trial.  For those who are convinced that using the old technology is no longer justified, reporting on the outcomes and comparing them to the historical results and the conclusions of the virtual trial would certainly be of great value.

In conclusion, it behooves the medical physics community to meet the standards that we and society in general (particularly given the Patient Protection and Affordable Care Act) expect of medical research.  These changes will enhance the usefulness of medical physics research, provide comfort to the public knowing that careful measures are being taken to ensure the safe and efficacious introduction of new therapies, and hopefully also lead to the more rational use of our health care dollars.

Statistical significance in clinical trials

An editorial from 2011 in the JNCI (“Demystify statistical significance–time to move on from the P value to Bayesian analysis”, J J Lee, 103:2, 2011) caught my attention.  It was written in response to an article in the same issue by Ocana and Tannock (“When are ‘positive’ clinical trials in oncology truly positive,” 103: 16, 2011).  In Lee’s editorial, he notes that Ocana and Tannock make four points:  “(1) statistical significance does not equate to clinical significance; (2) the P value alone is not sufficient to conclude that a drug works; (3) a prespecified magnitude of clinical benefit…is required to gauge whether a drug works or not; and (4) the absolute difference is more relevant than the relative difference in measuring a drug’s efficacy.”

Lee uses this as a starting point to highlight how a Bayesian approach is more appropriate for clinical trials.  First he lays out the familiar description that “the Bayesian approach calculates the probability of the parameter(s) given the data, whereas the frequentist approach computes the probability of the data given the parameter(s).”  He then goes on to give two examples in which the observed proportion of successes is (a) 0.4 (4 of 10) and (b) 0.27 (27 of 100), which give the same P value of 0.033 in a one-sided binomial test.  The success rates being tested are 0.2 vs 0.4.  However, if you calculate the posterior probabilities, you find that even though the P values were nearly identical, in the first test the probability that the parameter is greater than 0.4 is 0.432, whereas in the second test that probability is 0.003.  It is a nice demonstration of the limitations of the frequentist approach.
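For readers who want to try this kind of comparison themselves, here is a minimal sketch that computes both quantities for a binomial outcome.  The uniform Beta(1, 1) prior is my assumption, since the prior behind the editorial's figures is not given here, so the numbers this produces need not match the ones quoted above.  The function names are also my own.

```python
from math import comb


def binomial_upper_tail(successes, n, rate):
    """One-sided P value: P(X >= successes) for X ~ Binomial(n, rate)."""
    return sum(
        comb(n, k) * rate**k * (1 - rate) ** (n - k)
        for k in range(successes, n + 1)
    )


def posterior_prob_exceeds(successes, n, threshold):
    """P(true rate > threshold | data), assuming a uniform Beta(1, 1) prior.

    The posterior is Beta(successes + 1, n - successes + 1).  For integer
    parameters its upper tail equals P(Y <= successes) for
    Y ~ Binomial(n + 1, threshold), which avoids needing a Beta function.
    """
    m = n + 1
    return sum(
        comb(m, k) * threshold**k * (1 - threshold) ** (m - k)
        for k in range(successes + 1)
    )


# The two data sets described above: 4 of 10 and 27 of 100.
for successes, n in ((4, 10), (27, 100)):
    p_value = binomial_upper_tail(successes, n, 0.2)
    posterior = posterior_prob_exceeds(successes, n, 0.4)
    print(f"{successes}/{n}: one-sided P vs 0.2 = {p_value:.3f}, "
          f"P(rate > 0.4 | data) = {posterior:.3f}")
```

Whatever prior is chosen, the qualitative lesson is the same: the larger trial pins the rate down well below 0.4, while the small trial leaves that possibility wide open.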

However, it is Ocana and Tannock’s article that is a bit more relevant to the points that I have been making with my decision modeling.  In my paper, “When is better best: a multiobjective approach”, we made the point that in doing comparisons, it is important to state the criteria that will be used to decide superiority.  We didn’t explicitly state that the magnitude of the difference for each of those criteria should also be stated, mainly because we were pessimistic that we could even get the variables named.  However, O&T’s point is on the mark.

Their first point is the one that I have focused on more explicitly with my modeling and reflects my interest in using Bayesian networks to define the magnitude of the improvement that is needed to make a study/new technology worthwhile.  Again, we are on the same page when it comes to deciding (up-front, not after the fact) what the important variables are, and the magnitudes required to make the clinical tradeoffs worth the effort.