Physicists are fond of conducting “virtual trials”, by which is meant that they select a number of random or representative cases, compute treatment plans for them using two different methods, and then compare the results. Usually these are done to show the differences (or lack thereof) between two methods of radiation delivery or, sometimes, of optimization.
In principle, this is a reasonable and cost-effective means of coming to some conclusion about the appropriate uses of new technology. However, as they are most often conducted, these trials do little to answer any relevant questions. They meet few, if any, of the criteria for a clinical trial. Instead, it seems as though physicists have defined their own standards for a virtual trial. What are these standards? How do they compare with the norms in clinical medicine?
Clinical trials are grouped into four phases, ranging from determination of the intervention’s safety, to its efficacy in a controlled group, to its efficacy in the population at large. Do our physics-oriented virtual trials fall into any of these categories? At one end of the spectrum, physicists are concerned with safety, as in a Phase I trial. They wish to avoid initiating a technology or procedure that will lead to patient harm. At the other end, one could argue (though physicists never do) that they are also conducting Phase IV-like trials, since the cases are selected with little regard for the biological and physiological variables that can mediate the response to the intervention. Most often, cases are selected because they are dosimetrically “interesting” or, on the other hand, “tractable”. The latter characteristic underpins the continued popularity of virtual trials of prostate cancer, with its two significant organs-at-risk. Once those cases are dealt with and the technology has been shown to handle the simple situations, interesting cases are selected based on their dosimetric complexity.
Does this way of viewing the issue lead to any worthwhile considerations? In the sense that clinical trials are now the gold standard for progress in medical practice, the answer is yes. If we, as physicists, wish to lead the field forward by definitively answering questions, then we need to meet the same standards as others in the field. So what are the characteristics of clinical trials that translate directly to a medical physics approach?
First, the endpoint of the trial must be described and justified at the beginning. Too often, physicists merely pile up metrics at the end of the project, calculate statistical significance of differences, and then make some pronouncement based thereon. Clinical trials do not have the luxury of waiting until the end of the trial to define their endpoints for several reasons, chief among them the ethics of human research. Physicists are free of that limitation, but then suffer the possibility of being accused of cherry-picking the results. More importantly, however, the failure to declare and justify the endpoints at the beginning vitiates the impact of the results since others are less likely to be convinced by this method of conducting the trial. Providing a convincing rationale at the beginning of the work puts the results on a firm footing and helps structure the entire virtual trial.
Elucidation of a clear set of metrics by which to judge the trial’s efficacy must be done in conjunction with the relevant clinicians. It is sometimes the case in current comparisons that dose metrics are tested for statistical significance, with the somewhat absurd result that dose differences of less than 1 Gy are reported as significant. Statistically, maybe (although it can certainly be argued that any set of cases in which dosimetric parameters are so close in value, yet differ with statistical significance, exhibits a homogeneity that hardly reflects clinical practice); clinically, no. [e.g. DC Weber, et al, Int J Radiat Oncol Biol Phys, 75(5): 1578-86, 2009] It is important to determine up front what is going to be conclusive evidence of improvement for the application being studied. It is at this point that determination of the phase of the trial is important. Evaluating safety is likely to result in a different set of trial metrics than would be used in a Phase III trial.
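The distinction between statistical and clinical significance can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the dose values are invented, the names `assess` and `clinical_threshold_gy` are hypothetical, the 2 Gy threshold is a placeholder for a value that would be agreed with clinicians before the trial, and an exact sign-flip permutation test is used only because it needs nothing beyond the Python standard library.

```python
from itertools import product
from statistics import mean


def paired_permutation_p(diffs):
    """Exact two-sided sign-flip permutation p-value for paired differences."""
    observed = abs(mean(diffs))
    count = 0
    total = 0
    # Under the null hypothesis each paired difference is equally likely
    # to have either sign, so enumerate every sign assignment.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = mean(s * d for s, d in zip(signs, diffs))
        if abs(flipped) >= observed - 1e-12:
            count += 1
    return count / total


def assess(dose_a, dose_b, clinical_threshold_gy, alpha=0.05):
    """Report statistical and clinical significance as separate questions."""
    diffs = [a - b for a, b in zip(dose_a, dose_b)]
    p_value = paired_permutation_p(diffs)
    mean_diff = mean(diffs)
    return {
        "mean_diff_gy": mean_diff,
        "p_value": p_value,
        "statistically_significant": p_value < alpha,
        # The threshold is declared before the trial, not after.
        "clinically_meaningful": abs(mean_diff) >= clinical_threshold_gy,
    }


# Invented mean organ-at-risk doses (Gy) for ten cases planned two ways.
method_a = [30.5, 28.2, 33.1, 29.8, 31.4, 27.9, 32.6, 30.1, 29.3, 31.8]
method_b = [29.9, 27.6, 32.4, 29.3, 30.8, 27.4, 31.9, 29.5, 28.8, 31.1]

# Hypothetical pre-declared threshold of 2 Gy.
result = assess(method_a, method_b, clinical_threshold_gy=2.0)
print(result)
```

With these invented numbers every case differs in the same direction by well under 1 Gy, so the test reports significance while the pre-declared clinical threshold is not met, which is exactly the situation described above.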
Rigor in methods is also an important component of a virtual trial. Given the complexities of modern treatment plans, optimization algorithms are often used in virtual trials. However, the algorithms in the current generation of treatment planning software are very operator-dependent. In other cases, such as comparisons of protons and x-rays, different planning systems and dose calculation algorithms must be used. Great care must be taken in designing methods that provide a fair comparison for plans. In the case of optimization algorithms, user options must be constrained. When different planning systems are used, some effort at judging their relative differences (outside the parameters of the trial) must be made.
There is an additional burden that the use of optimization places on virtual trials that is not usually a part of clinical trials. That is, clinical trials do not usually look at the correspondence between normal tissue outcomes (complications) and tumor response. In some cases, there may be reason to believe that tumor response is related to or coupled with normal tissue response, which would justify reporting the correlation, but this is rarely done. In inverse planning, the algorithm searches for some ideal solution and, when it cannot find one that meets all the objectives, finds a plan that incorporates a trade-off between the competing (tumor vs normal tissue) objectives. For this reason, it is imperative that virtual trials incorporating inverse planning report results for each individual case, not just aggregate measures such as averages of single metrics.
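A small sketch shows why averages of single metrics can hide the trade-offs made by an optimizer. The case data, the metric names (`target_d95`, `oar_mean`) and the function `per_case_report` are all hypothetical and chosen only for illustration; a real trial would use the endpoints declared at the outset.

```python
from statistics import mean

# Invented results for four cases: (method A, method B) values in Gy.
cases = [
    {"id": "case01", "target_d95": (70.0, 72.0), "oar_mean": (20.0, 24.0)},
    {"id": "case02", "target_d95": (70.0, 68.0), "oar_mean": (20.0, 16.0)},
    {"id": "case03", "target_d95": (70.0, 71.0), "oar_mean": (20.0, 19.0)},
    {"id": "case04", "target_d95": (70.0, 69.0), "oar_mean": (20.0, 21.0)},
]


def per_case_report(cases):
    """Return one row per case with both deltas (B minus A) and a label."""
    rows = []
    for case in cases:
        d_target = case["target_d95"][1] - case["target_d95"][0]
        d_oar = case["oar_mean"][1] - case["oar_mean"][0]
        target_better = d_target > 0   # higher target coverage is better
        oar_better = d_oar < 0         # lower organ-at-risk dose is better
        if target_better and oar_better:
            label = "both improved"
        elif not target_better and not oar_better:
            label = "neither improved"
        else:
            label = "trade-off"
        rows.append({"id": case["id"], "d_target": d_target,
                     "d_oar": d_oar, "label": label})
    return rows


rows = per_case_report(cases)
mean_d_target = mean(r["d_target"] for r in rows)
mean_d_oar = mean(r["d_oar"] for r in rows)

for r in rows:
    print(r)
print("mean target delta:", mean_d_target, "mean OAR delta:", mean_d_oar)
```

In this invented data set both averages are zero, suggesting the two methods are equivalent, yet no individual case is unchanged and two of the four involve a trade-off between target coverage and normal tissue dose.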
Finally, to make the connection between clinical trials and virtual trials, it is interesting to consider Phase III and IV trials. One definition is: “Phase IV studies are conducted after the intervention has been marketed. These studies are designed to monitor effectiveness of the approved intervention in the general population and to collect information about any adverse effects associated with widespread use.” [Gates Foundation] If we replace the word “marketed” with the words “clinically implemented”, then we have a good description of the introduction of new technologies and methods into clinical use, including the performance of a virtual trial at the beginning of the process. To those who argue that such trials are unlikely to achieve statistical significance because of a lack of sufficient numbers of patients, one may ask whether there is justification in spending the money to purchase the new technology.
For those institutions that conduct virtual trials and, based (at least partly) on the results, take the next step of clinical use, it would be very worthwhile to collect data and report back on the correspondence between the trial and the clinical outcomes. In many cases, e.g. IMRT and VMAT, the differences are so small that it is certainly not unethical to randomize patients between the two and measure the differences (if any) in outcomes, thereby conducting a Phase III trial. For those who are convinced that using the old technology is not justified, reporting on the outcomes and comparing them to the historical results and the conclusions of the virtual trial would certainly be of great value.
In conclusion, it behooves the medical physics community to meet the standards that we and society in general (particularly given the Patient Protection and Affordable Care Act) expect of medical research. These changes will enhance the usefulness of medical physics research, provide comfort to the public knowing that careful measures are being taken to ensure the safe and efficacious introduction of new therapies, and hopefully also lead to the more rational use of our health care dollars.