Uncertainty in Kaplan-Meier curves

This post is indirectly related to things Bayesian.  One of the nodes in a Bayes net I have been working on is the tumor control probability (TCP) for oropharyngeal cancer.  Where do we get the TCP values?  One place is from Kaplan-Meier (KM) curves.  If you construct a KM curve for each of several doses and then focus on a particular time point, you can read off the TCP value for each dose at that time.  Now KM curves come with a 95% confidence interval, which is the estimate +/- 1.96*sqrt(Variance).

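Well, what is the variance?  According to the textbook, it depends only on the survival probability at time t and on the counts at the earlier failure times.  In the usual (Greenwood) form it is the square of the survival probability times a sum, over the prior failure times, of d/(n*(n-d)), where n is the number at risk and d is the number of failures at that time; each term is the conditional odds of failure weighted by 1/number_at_risk.  In other words, it does not take into account the biological variability.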
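To make the variance concrete, here is a minimal sketch of a KM estimate with a Greenwood-style variance, written in plain Python.  It assumes the textbook formula is the standard Greenwood one; the function name kaplan_meier and the small data set are made up for illustration and are not taken from any real study.

```python
import math

def kaplan_meier(times, events):
    """Return (t, S(t), variance) at each distinct failure time.

    times  : follow-up time for each patient
    events : 1 if a failure was observed at that time, 0 if censored
    The variance is S(t)^2 * sum of d / (n * (n - d)) over failure
    times up to t (Greenwood's formula).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    gw_sum = 0.0  # running Greenwood sum
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0  # failures at time t
        c = 0  # censored at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d > 0:
            surv *= 1.0 - d / n_at_risk
            # If everyone at risk fails, S(t) is 0 and the Greenwood
            # term is undefined; we skip it and report variance 0.
            if n_at_risk > d:
                gw_sum += d / (n_at_risk * (n_at_risk - d))
            out.append((t, surv, surv ** 2 * gw_sum))
        n_at_risk -= d + c
    return out

# Hypothetical follow-up times (months) and failure indicators.
times = [2, 3, 3, 5, 7, 8, 10, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]

for t, s, var in kaplan_meier(times, events):
    half_width = 1.96 * math.sqrt(var)
    print(f"t={t:>2}  S={s:.3f}  95% CI = {s - half_width:.3f} to {s + half_width:.3f}")
```

Note that the only inputs are the times and the failure indicators.  Nothing about stage, or any other patient characteristic, enters the calculation.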

Take the example of two experiments.  The first (the “naive”) experiment does not stratify patients by cancer stage, i.e. you draw from a patient pool that includes all patients with that type of cancer regardless of whether it is Stage I or IV.  Pick some number “n” of patients and perform the KM analysis.  Note that there is no measurement uncertainty: at any given time you know how many patients you have and how many fail.  The variance is a measure of dispersion based only on the number at risk.  The second (“stratified”) experiment only chooses patients with one particular stage, e.g. II.  Choose the same number “n” of patients.  It is not unlikely that both experiments will give you the same KM curve, since the naive experiment is an average effect over all stages.

Now if you were to repeat these two experiments a number of times in order to measure the variance at different times, you should get different variances.  In the naive experiment, the distribution of stages within any selected sample will vary somewhat, with a resultant difference in survival.  In the stratified experiment, the population distribution will be narrower.  (If you don’t think that is true, then figure out how they defined “stage” to begin with.)  In this thought experiment, there are two very different variances based on the biology.  Now, since we are measuring a mean which, it might be argued, is normally distributed, the variance of the mean survival is relatively narrow, but there should still be a difference, mathematically if not clinically significant.
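The thought experiment can be tried numerically.  The sketch below is only one way to set it up, and every number in it is an assumption: failure times are exponential, the four stage hazards are invented, stages are equally likely in the naive pool, and there is no censoring (so the KM estimate at a time t0 reduces to the fraction of patients still surviving).  The stratified hazard is chosen so that both experiments have the same true survival at t0.

```python
import math
import random
import statistics

def surviving_fraction(failure_times, t0):
    """KM estimate at t0 when nothing is censored."""
    return sum(ft > t0 for ft in failure_times) / len(failure_times)

def simulate(hazards, n, t0, repeats, rng):
    """Mean and variance of the estimate at t0 over repeated experiments.

    Each patient's hazard is drawn uniformly from `hazards`, so a
    single-element list gives the stratified experiment.
    """
    estimates = []
    for _ in range(repeats):
        sample = [rng.expovariate(rng.choice(hazards)) for _ in range(n)]
        estimates.append(surviving_fraction(sample, t0))
    return statistics.mean(estimates), statistics.variance(estimates)

rng = random.Random(0)
t0, n, repeats = 2.0, 50, 2000

# Hypothetical hazards (per year) for Stage I to IV.
stage_hazards = [0.05, 0.15, 0.35, 0.80]

# True survival at t0 in the mixed pool, and the single hazard that
# gives the same survival at t0 for the stratified experiment.
s_mix = sum(math.exp(-h * t0) for h in stage_hazards) / len(stage_hazards)
matched_hazard = -math.log(s_mix) / t0

naive = simulate(stage_hazards, n, t0, repeats, rng)
stratified = simulate([matched_hazard], n, t0, repeats, rng)

# Reference value: variance of a binomial proportion with the same S and n.
binomial_var = s_mix * (1.0 - s_mix) / n

print(f"true S(t0)            = {s_mix:.4f}")
print(f"naive:      mean, var = {naive[0]:.4f}, {naive[1]:.5f}")
print(f"stratified: mean, var = {stratified[0]:.4f}, {stratified[1]:.5f}")
print(f"binomial S(1-S)/n     = {binomial_var:.5f}")
```

The binomial value is printed as a reference point because, with independent draws and no censoring, each patient's status at t0 is a single yes/no outcome.  Changing the stage mix, the hazards, or the way patients are drawn (for example, fixing a different stage mix per repeat) is the natural next step for probing where the two variances do and do not differ.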

My conclusion is that either the KM confidence interval doesn’t contain the whole story or I don’t understand the statistics as well as I think I do.  In any case, working through this has helped sharpen my thinking about the confidence limits on TCP curves.