# Kaplan-Meier estimator

### From Ganfyd

Statistical tool used to describe the time to the occurrence of an event among a set of subjects, i.e. a method of survival analysis.^{[1]}^{[2]}^{[3]} Its most common application is estimating the **survival function**, i.e. describing survival data, for instance in cancer patients, where death is the event being examined. It applies equally to any event, e.g. relapse of a disease or failure of an implanted device.

The data are used to estimate the probability of remaining free of the event over time. The result is most commonly displayed as a graphical plot, with time on the x-axis and, on the y-axis, the estimated cumulative proportion of subjects who have not yet experienced the event of interest. The plot consists of steps: the curve drops at each time point at which an event occurs, whereas censoring of an individual produces no drop but removes that individual from the number at risk from then on.
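For reference, the plotted estimate is the product-limit estimate. If $t_1 < t_2 < \dots$ are the distinct times at which events occur, $d_i$ is the number of events at time $t_i$ and $n_i$ is the number of subjects still at risk (not yet having had the event and not yet censored) just before $t_i$, the estimated probability of remaining event-free beyond time $t$ is:

$$
\hat{S}(t) = \prod_{i:\, t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)
$$

Between event times the estimate stays constant, which produces the steps of the plot; a censored subject contributes no factor of its own but is no longer counted in $n_i$ at later event times. The cumulative proportion experiencing the event is $1 - \hat{S}(t)$.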

The method is suited to data in which the length of follow-up varies between subjects; other methods, such as the t-test, are only appropriate when follow-up is complete and of a fixed length. Variable follow-up is handled by allowing **censoring** of data where follow-up of a subject is lost or terminated for reasons unrelated to the outcome measure, e.g. patient withdrawal from the study. Censored observations are indicated on the plot with a vertical tick.
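How censoring enters the calculation can be seen by computing the estimate by hand. The sketch below is written in R, as in the R section, using only base functions; the data and the function name `km_estimate` are hypothetical, chosen for illustration. At each distinct event time it multiplies the running estimate by (1 - events / number at risk); a censored subject causes no drop but is removed from the number at risk.

```r
# Hand-computed Kaplan-Meier (product-limit) estimate using base R only.
# Hypothetical data: 6 subjects; status 1 = event, 0 = censored.
time   <- c(2, 3, 3, 5, 7, 9)
status <- c(1, 1, 0, 1, 0, 1)

km_estimate <- function(time, status) {
  # Distinct times at which at least one event occurred
  event_times <- sort(unique(time[status == 1]))
  # Number still at risk just before each event time
  # (a subject censored at the same time as an event is counted as at risk)
  n_risk  <- sapply(event_times, function(tt) sum(time >= tt))
  # Number of events at each event time
  n_event <- sapply(event_times, function(tt) sum(time == tt & status == 1))
  # Product-limit: multiply (1 - d_i / n_i) over successive event times
  surv <- cumprod(1 - n_event / n_risk)
  data.frame(time = event_times, n.risk = n_risk, n.event = n_event, surv = surv)
}

km <- km_estimate(time, status)
print(km)
```

With these data the estimate falls to 5/6, 2/3, 4/9 and finally 0 at times 2, 3, 5 and 9. The subject censored at time 7 produces no step but leaves only one subject at risk at time 9. The subject censored at time 3 is still counted as at risk for the event at time 3, the usual convention when an event and a censoring coincide. In practice the `survfit` function of the **survival** package performs this calculation.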

While it is possible to plot a single curve, it is more common to plot two different groups on the same axes, e.g. patients treated with adjuvant chemotherapy versus a control group, permitting both a graphical comparison and a formal comparison using statistical methods such as the Logrank test.

The graph usually starts at 1 (i.e. all patients alive) and decreases in steps over time. However, in some situations, for instance in a study of the time taken to acquire a particular skill, it is more appropriate to plot the cumulative proportion of subjects who have experienced the event, i.e. (1 - cumulative survival).


## Assumptions

The calculations assume that:^{[4]}

- Censored subjects have the same prognosis as those who remain under follow-up (i.e. censoring is non-informative). This may not hold, as there could be a biologically significant reason why some patients are lost to follow-up.
- Subjects recruited early and late in a study are similar. The longer a study runs, the less likely this is to hold true. For instance, in cancer, the case mix may change, and earlier detection may mean that later recruits have biologically less aggressive tumours.
- The event occurs at the time point recorded. This is clear-cut for death, but not necessarily for events such as cancer recurrence, where the recorded time depends on when the recurrence is detected.

## How to do it?

### SPSS (Statistical Package for the Social Sciences)

- To compare 2 groups, a minimum of 3 columns is needed:
  - **Time** to event or, if the event did not occur, the length of follow-up.
  - **Status**, i.e. whether the event occurred. This can be coded as 1/0, Y/N, etc.; the values can be defined in the 'data' mode.
  - **Factor**, i.e. what differs between the two groups, e.g. adjuvant chemotherapy vs no chemotherapy (again coded as 1/0 or Y/N).

- Select from the menu:
`Analyze -> Survival -> Kaplan-Meier`

- Transfer the data columns to the appropriate boxes with the arrows.
- Define the event for the `Status` variable, e.g. for survival analysis with death coded as 1, specify '1'.
- Specify options if required (e.g. to request the Logrank test).
- Press `OK`.

### R

R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS, and can be downloaded from a mirror of R-Project.org. R is derived from S, a statistical language originally developed at Bell Laboratories.

Use the **survival** package. Then load the data as a data frame with columns **time**, **status** and **x**, where **status** is 1 if the event occurred and 0 if the observation was censored, and **x** is the differing factor (e.g. chemotherapy vs no chemotherapy).
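As an illustration of this layout, the **survival** package ships a small example dataset, `aml`, with exactly these three columns (`status` coded 1 = event, 0 = censored; `x` the treatment group). A minimal sketch for inspecting it; the file name in the commented-out line is hypothetical:

```r
library(survival)

# Inspect the example data: one row per subject
str(aml)
head(aml)

# Check the status coding before analysis (1 = event, 0 = censored)
table(aml$status)
stopifnot(all(aml$status %in% c(0, 1)))

# Your own data with the same columns might be read in like this
# ("mydata.csv" is a hypothetical file name):
# mydata <- read.csv("mydata.csv")
```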

The functions accept many parameters, but for a basic plot the default settings suffice. For detailed information, see the R manual.^{[5]}^{[6]}

- Load library. Type:
`library(survival)`

- The `Surv` function combines the time and status data into a survival object; when it is printed, censored times are shown with a `+` suffix. Usage:^{[7]}
`Surv(mydata$time, mydata$status)`

- The `survfit` function then takes the object produced by `Surv` and calculates the Kaplan-Meier estimates of cumulative survival.^{[8]}
- Plot the result using the `plot` function.
- Combining these into one line:
`plot(survfit(Surv(mydata$time, mydata$status) ~ mydata$x))`
or, equivalently, `plot(survfit(Surv(time, status) ~ x, data = mydata))`

- For the cumulative probability of the event, i.e. (1 - cumulative survival), use:
`plot(survfit(Surv(time, status) ~ x, data = mydata), fun="event")`

- For the Logrank test, use the `survdiff` function, e.g. `survdiff(Surv(time, status) ~ x, data = mydata)`.
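The steps above can be tried end to end on the `aml` dataset, which is distributed with the **survival** package and already has columns `time`, `status` (1 = event, 0 = censored) and `x` (the group). A minimal sketch, assuming the survival package is installed; `mydata <- aml` simply stands in for your own data frame:

```r
library(survival)

# 'aml' ships with the survival package; it stands in here for your own data
mydata <- aml

# Survival object: censored times are printed with a '+' suffix
s <- Surv(mydata$time, mydata$status)
print(s)

# Kaplan-Meier estimates, one curve per level of x
fit <- survfit(Surv(time, status) ~ x, data = mydata)
print(summary(fit))

# Survival curves, then the cumulative proportion experiencing the event
plot(fit)
plot(fit, fun = "event")

# Logrank test comparing the groups defined by x
lr <- survdiff(Surv(time, status) ~ x, data = mydata)
print(lr)
```

`summary(fit)` lists the number at risk, the number of events and the survival estimate at each event time, which is a useful check on what the plot shows.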

The type of line and colour can be changed using:

`plot(survfit(Surv(time, status) ~ x, data = mydata), lty=c(1,2), col=c("red", "blue"))`
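When two curves are drawn in different styles, a legend makes the plot readable. A minimal sketch using base R's `legend()` and, as an assumption, the `aml` example data from the **survival** package; the axis labels are illustrative. The `lty` and `col` values are applied to the curves in the order of the strata, i.e. `names(fit$strata)`, so the same vectors are reused for the legend:

```r
library(survival)

# Example data shipped with the survival package (columns time, status, x)
fit <- survfit(Surv(time, status) ~ x, data = aml)

line_types <- c(1, 2)
line_cols  <- c("red", "blue")

# Curves are drawn in strata order, so lty/col follow names(fit$strata)
plot(fit, lty = line_types, col = line_cols,
     xlab = "Time", ylab = "Proportion event-free")

# Legend labels taken from the strata names, e.g. "x=Maintained"
legend("topright", legend = names(fit$strata),
       lty = line_types, col = line_cols)
```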

## References

1. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. Journal of the American Statistical Association. 1958;53:457-81.
2. Altman DG, Bland JM. Time to event (survival) data. BMJ. 1998 Aug 15;317(7156):468-9. (Via PubMed Central)
3. Bland JM, Altman DG. Survival probabilities (the Kaplan-Meier method). BMJ. 1998 Dec 5;317(7172):1572. (Via PubMed Central)
4. Bland JM, Altman DG. Survival probabilities (the Kaplan-Meier method). BMJ. 1998 Dec 5;317(7172):1572. (Via PubMed Central)
5. R manual entry on **Surv**.
6. R manual entry on **survfit**.
7. R manual entry on **Surv**.
8. R manual entry on **survfit**.