Uncertainty and decision making

Contemplating a project of some colleagues regarding decision making under the uncertainty of where a tumor will be with respect to the radiation field during breathing led me to wonder about the whole range of uncertainties that should be considered. Traditionally in radiation oncology we are concerned whether the tumor will always be in the radiation field when you set up the patient on a daily basis for weeks.  When the tumor position is affected by respiration or bowel gas, then it is even harder to know this.  Recently, on-board imaging has helped us to understand (and sometimes manage) the motion.  Increasing the size of the radiation field (a.k.a. using a PTV) is one approach to reducing uncertainty.

But what about other  uncertainties?  Take any cubic millimeter.  What is the number of tumor cells?  Classic radiation biology uses Poisson statistics to calculate the uncertainty in radiation’s ability to sterilize the tumor.  So uncertainty exists and can be accounted for.  What  about the more recent realization that tumors do not contain a single clonogen but, rather, many genetically different  cells? Hopefully, genetic characterization would give us some insight as to how the differences affect the cells’ radiation sensitivity.  Epigentic factors, too, play a role in establishing a phenotypic radiation response.  However, even in this optimistic case where we have some mechanistic understanding, we can only alter the probabilities.  So here we have an understanding that uncertainty exists, and in some cases we may be able to characterize it, but at this point even accurate estimates of the probabilities are hard to come by.  Near the margins of the tumor, we talk about the clinical tumor volume which consists of “microscopic disease”, by which we mean possible tumor cells that we have no solid knowledge regarding their existence.  What we know comes from surgical/biopsy specimens or from clinical outcomes with regard to treating such a region in other patients.  Here our uncertainty is complete with regard to the particular patient and our only knowledge comes from population averages.

Much of radiation oncology (and medicine in general) is devoted to reducing the uncertainty by techniques such as recursive partitioning analysis and classification algorithms, e.g. support vector machines and logistic regression.  Concepts such as stage, grade, TNM classification are all ways of predicting  outcomes as  a function of therapies, thereby reducing our uncertainty.  Such musing leads us to consider the confluence of medical decision making and uncertainty.  On one side, we can say that the minimum uncertainty is when we know for sure that the treatment will effect a cure or will surely fail.  Then we have a probability of 1.0 or 0.0 and, hence, no uncertainty.  The most uncertainty we have is when there is a 50% chance of cure.  Surely it is better in the decision making realm to have no uncertainty.  However in the real world–that is, the world of the patient and doctor–a 50% chance of cure is better than 0%.  So we can conclude that uncertainty in these types of decisions is not necessarily a bad thing.  Therefore, we are left to continue our quest for better strategies for making decisions under uncertainty.  The question of the day is: do we want to continue understanding the biology to the point that we know exactly what will happen to a person when we know that in some fraction of the cases we will be depriving the patient of hope?

Advertisements

Standard Gamble is now becoming real life

An article in the NY Times  highlights an interesting development in which a theoretical construct is becoming an actual reality.  The article describes how immunotherapy is being tested in patients with melanoma.  Several drugs seem to result in some improvement in survival and combinations of drugs result in more drastic improvements.  However, these combinations can be lethal (in one test 3/46 died from the drugs), as well as causing myriad side-effects.

In decision theory, there is a concept of “utility” which is basically a quantitative measure of how much one values something.  In economic terms, this value is relatively easy to assess since you are usually dealing with either actual money or something with monetary value.  In health care, it is not so straightforward.  Would you rather live 10 years with pain or 5 years without?  One way of trying to assess these utilities when the outcome is uncertain (as it always is in medicine) is called the Standard Gamble.  Imagine trying to assess how someone values a given health state, for example living with “dry mouth” which  may limit how well you can swallow and eat certain foods.  In this test, the person is given a choice: (a) live the rest of your life with dry mouth, or (b) take a pill which has a probability, P, of curing you but also a probability, 1-P, of killing you instantly.  Let’s start off with P = 90%.  That is, if you take the pill, there is a 90% chance you will be cured of dry mouth for the rest of your life.  Do you take the pill with its 10% chance of dying or do you live the rest of your life with your condition?  The test consists of varying the probability, P, until the person cannot choose between them.  That value of P is then called the “utility for the condition of dry mouth.”

We now have the situation where patients (and physicians) can choose between a situation where the outcome is pretty well-known or trying a new therapy which has the promise of making things much better but can also kill you.  Now the situation is not exactly the same as the Standard Gamble since the new therapy brings with it some new complications even if they are not fatal.  But it does make one think of whether we can use data from these real life situations to study how utilities are measured.

Virtual trials

Physicists are fond of conducting “virtual trials” by which is meant that they select a number of random or representative cases, compute treatment plans for them using two different methods and then compare the results.  Usually these are done to show the differences (or lack thereof) between two different methods of radiation delivery, or sometimes, of optimization. 

In general, this is a reasonable and cost effective means of coming to some conclusion about the appropriate uses of new technology. However, as they are most often conducted, these trials do little to answer any relevant questions.  In general, they meet few, if any, of the criteria for a clinical trial.  Instead, it seems as though physicists have defined their own standards for a virtual trial.  What are these standards? How do they compare with the norms in clinical medicine?

Clinical trials are grouped into four stages, ranging from determination of the intervention’s safety, to its efficacy in a controlled group, to its efficacy in the population at large.  Do our physics-oriented virtual trials call into any of these categories.  At one end of the spectrum, physicists are concerned with safety, namely a Phase I trial.  They wish to avoid initiating a technology or procedure that will lead to patient harm.  At the other end, one could argue (thought physicists never do) that they are also conducting Phase IV-like trials since the cases are selected with little regard for the biological and physiological variables that can mediate the response to the intervention.  Most often, cases are selected because they are dosimetrically “interesting” or, on the other hand, “tractable”.  The latter characteristic underpins the continued popularity of virtual trials of prostate cancer with its two significant organs-at-risk.  Once those cases are dealt with and the technology has been shown to handle the simple situations, then interesting cases are selected based on their dosimetric complexity. 

Does this way of viewing the issue lead to any worthwhile considerations?  In the sense that clinical trials are now the gold standard for progress in medical practice, the answer is yes.  If we, as physicists, wish to lead the field forward by definitively answering questions, then we need to meet the same standards as other in the field.  So what are the characteristics of clinical trials that translate directly to a medical physics approach?

First, the endpoint of the trial must be described and justified at the beginning.  Too often, physicists merely pile up metrics at the end of the project, calculate statistical significance of differences, and then make some pronouncement based thereon.  Clinical trials do not have the luxury of waiting until the end of the trial to define their endpoints for several reasons, chief among them the ethics of human research.  Physicists are free of that limitation, but then suffer the possibility of being accused of cherry-picking the results.  More importantly, however, the failure to declare and justify the endpoints at the beginning vitiates the impact of the results since others are less likely to be convinced by this method of conducting the trial.  Providing a convincing rationale at the beginning of the work puts the results on a firm footing and helps structure the entire virtual trial.

Elucidation of a clear set of metrics by which to judge the trial’s efficacy must be done in conjunction with the relevant clinicians.   It is sometimes the case in current comparisons that dose metrics are tested for statistical significance with the somewhat absurd results that dose differences of less than 1 Gy are reported as significant.  Statistically, maybe (although it can certainly be argued that any set of cases in which dosimetric parameters that are so close in value, yet statistically significant, exhibit a homogeneity that hardly reflects clinical practice); clinically, no. [e.g. DC Weber, et al, Int J Radiat Oncol Biol, Phys, 75(5): 1578-86, 2009]  It is important to determine up front what is going to be conclusive evidence of improvement for the application being studied.  It is at this point that determination of the phase of the trial is important.  Evaluating safety is likely to result in a different set of trial metrics than would be used in a Phase III trial.

Rigor in methods is also an important component  of a virtual trial.  GIven the complexities of modern treatment plans, optimization algorithms are often used in virtual trials.  However, the algorithms in the current generation of treatment planning software is very operator dependent. In other cases, such as comparisons of protons and x-rays, different planning systems and dose calculation algorithms must be used. Great care must be taken  in designing methods that provide a fair comparison for plans.  In the case of optimization algorithms, user options must be constrained.  When different planning systems are used, some effort at judging their relative differences (outside the parameters of the trial) must be made. 

There is an additional burden that the use of optimization places on virtual trials that is not usually a part of clinical trials.  That is, clinical trials do not usually look at the correspondence between normal tissue outcomes (complications) in conjuction with tumor response.  In some cases, there may be reason to believe that tumor response is related to or coupled with normal tissue response and hence justifies the reporting of the correlation, but this is rarely done.  In inverse planning, the algorithm searches for some ideal solution and when it cannot find one that meets all the objectives, finds a plan that incorporates a trade-off between the competing (tumor vs normal tissue) objectives.  For this reason, it is imperative that virtual trials incorporating inverse planning report results for each individual, not just aggregate measures such as averages of single metrics. 

Finally, to make the connection between clinical trials and virtual trials, it is interesting to consider Phase III and IV trials.  One definition is: “Phase IV studies are conducted after the intervention has been marketed. These studies are designed to monitor effectiveness of the approved intervention in the general population and to collect information about any adverse effects associated with widespread use.” [Gates Foundation]  If we replace the word “marketed” with the words “clinically implemented”, then we have a good description of the introduction of new technologies and methods into clinical use including the performance of a virtual trial at the beginning of the process.  To those who argue that such trials are likely to not achieve statistical significance because of a lack of sufficient numbers of patients, one may ask whether there is justification in spending the money to purchase the new technology.

For those institutions that conduct virtual trials and based (at least partly) on the results take the next step of clinical use, it would be very worthwhile for them to collect data and report back on the correspondence between the trial and the clinical outcomes.  In many cases, e.g. IMRT and VMAT, the differences are so small that it is certainly not unethical to randomize patients between the two and measure the differences (if any) in outcomes, thereby conducting a Phase III trial.  For those who are so convinced that using the old technology is not justified, then certainly reporting on the outcomes and comparing them to the historical results and the conclusions of the virtual trial would be of great value.

In conclusion, it behooves the medical physics community to meet the standards that we and society in general (particularly given the Patient Protection and Affordable Care Act) expect of medical research.  These changes will enhance the usefulness of medical physics research, provide comfort to the public knowing that careful measures are being taken to insure the safe and efficacious introduction of new therapies, and hopefully also lead to the more rational use of our health care dollars.

Statistical significance in clinical trials

An editorial from 2011 in the JNCI (“Demystify statistical significance–time to move on from the P value  to Bayesian analysis“, J J Lee, 103:2, 2011) caught my attention.  It was written in response to an article in the same issue by Ocana and Tannock (“When are ‘positive’ clinical trials in oncology truly positive,” 103: 16, 2011).  In Lee’s editorial, he notes that Ocana and Tannock make four points:  “(1) statistical significance does not equate to clinical significance; (2) the P value alone is not sufficient to conclude that a drug  works; (3) a prespecified magnitude of clinical benefit…is required to gauge whether a drug works or not; and (4) the absolute difference is more relevant than the relative difference in measuring a drug’s efficacy.”

Lee uses this as a starting point to highlight how a Bayesian approach is more appropriate for clinical trials.  First he lays out the familiar description that “the Bayesian approach calculates the probability of the parameter(s) given the data, whereas the frequentist approach computes the probability of the data given the parameter(s).”  He then goes on to give two examples in which the probability of success is (a) .4 (4 of 10) and (b) p=0.27 (27 of 100), which give the same P value of 0.033 in a binomial one-sided  test.  The prior probabilities being tested are 0.2 vs 0.4.  However, if you calculate the posterior probabilities, you find that even though the P values were nearly identical, the first test results in a proability that the parameter is greater than 0.4 is 0.432, whereas in the second test, the probability is 0.003.  It is a nice demonstration of the limitations of the frequentist approach.

However, it is Ocana and  Tannock’s article that is a bit more relevant to the points that I have been making with my decision modeling.  In my paper, “When is better best: a multiobjective approach“, we made the point that in doing comparisons, it is important to state the criteria that will be used to decide superiority.   We didn’t explicitly state that the magnitude of the difference for each of those criteria should also be stated, mainly because we were pessimistic that we could even get the variable to be named.  However, O&T’s point is on the mark.

Their first point is the one that I have focused on more explicitly with my modeling and reflects  my interest in using Bayesian networks to define the magnitude of the improvement that is needed to make a study/new technology worthwhile.  Again, we are on the same page when it comes to deciding (up-front, not after the fact) what the important variables are, and the magnitudes required to make the clinical tradeoffs worth the effort.