Kaplan–Meier Curve

Soumit Kar
5 min read · Sep 9, 2020


The Kaplan–Meier estimator, also known as the product-limit estimator, is a non-parametric statistic used to estimate the survival function from lifetime data. The visual representation of this function is called the Kaplan–Meier curve.

It shows the probability of an event (e.g. survival) at a given time interval (e.g. week, month, year). It works on time-to-event data, where the question for each subject is whether he/she had the event or not.

Let’s consider a practical example of cancer patients.

Complete data is not available for all patients:

— loss to follow-up.

— the study ends before the patient experiences the event.

Event :

The outcome of interest (e.g. death versus survival).

Survival(S) :

Mathematically,

S = (A − D) / A

A = number of newly diagnosed patients under observation.

D = number of deaths in a specific period of time.
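As a quick illustration of the formula, here is a minimal Python sketch. The function name `survival_proportion` and the numbers 100 and 20 are made up for illustration and are not taken from the cancer data set.

```python
def survival_proportion(a, d):
    """Return S = (A - D) / A.

    a: number of newly diagnosed patients under observation (A)
    d: number of deaths in the period of interest (D)
    """
    if a <= 0 or not 0 <= d <= a:
        raise ValueError("need a > 0 and 0 <= d <= a")
    return (a - d) / a


# Hypothetical numbers: 100 patients observed, 20 deaths in the period.
print(survival_proportion(100, 20))
```

Note that this simple proportion assumes every patient is followed for the whole period; it has no way to deal with censored patients.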

Censoring :

Subjects are said to be censored if

— they are lost to follow-up.

— they drop out of the study.

— the study ends before they die.

— simply put, some information required for the calculation is not available.

Right Censoring :

Right censoring is the most common type. It means that we are not certain what happened to people after some point in time. This happens when some people cannot be followed for the entire study because they were lost to follow-up, withdrew from the study, or died from a cause other than the event of interest.

Left Censoring :

Left censoring is when we are not certain what happened to people before some point in time. The most common example is when people already have the disease of interest when the study starts.

Interval Censoring :

Interval censoring is when we know that something happened in an interval (i.e. not before the start of the study and not after its end), but do not know exactly when in the interval it happened. For example, we know that the patient was well at the start of the study and was diagnosed with the disease at the end of the study, so when did the disease actually begin? All we know is the interval.

Let’s work through an example with real data.

The Kaplan–Meier curve recalculates the survival probability in each time interval in which a patient has an event.

Here there are 10 patients in the analysis. After arranging the data in ascending order of time, the first interval is (0–3) months, and we calculate the conditional probability of surviving each interval.

— 1 patient has an event, so (10 − 1)/10 = 0.9, i.e. 90% of patients survive the first 3 months of the disease.

— A censored patient is not counted as an event, and is excluded from the next interval because he/she gives information only for (0–3) months, not for (3–6) months. In the next (3–6) months there are 2 events. Patient E is no longer counted because he/she has already died, so 8 patients remain at risk and the probability of surviving (3–6) months is (8 − 2)/8 = 0.75.

So the cumulative probability of surviving (0–6) months is 0.9 × 0.75 = 0.675.
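The product-limit calculation can be sketched in a few lines of Python. This is only an illustration: the function name `kaplan_meier` and the `(at_risk, events)` input format are assumptions made here, and the two input pairs are the numbers from the two intervals in this example.

```python
def kaplan_meier(intervals):
    """Return the cumulative survival probability after each interval.

    intervals: list of (at_risk, events) pairs, one per interval.
    at_risk must already exclude patients who died or were censored
    in earlier intervals.
    """
    survival = 1.0
    curve = []
    for at_risk, events in intervals:
        # Conditional probability of surviving this interval.
        conditional = (at_risk - events) / at_risk
        # Product-limit step: multiply by everything that came before.
        survival *= conditional
        curve.append(survival)
    return curve


# (0-3) months: 10 at risk, 1 event; (3-6) months: 8 at risk, 2 events.
curve = kaplan_meier([(10, 1), (8, 2)])
print([round(s, 3) for s in curve])  # [0.9, 0.675]
```

Each value in the result is one step of the Kaplan–Meier curve.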

Kaplan-Meier Curve :

Censored patients are marked with red vertical lines.

Median survival :

Find the time at which survival drops to 50%. Here the median survival is 11 years. In the graph below, the curve is not actually a smooth curve; it is a series of small downward steps.

Also, the 2-year survival probability is 0.83.
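Reading the median off a step curve can be sketched as follows. The function name `median_survival` and the `steps` values are hypothetical. Only the 0.83 at 2 years and the median of 11 years come from the graph; the other points are invented to complete the example.

```python
def median_survival(steps):
    """Return the first time at which survival is at or below 50%.

    steps: list of (time, survival_probability) pairs in ascending time
    order, one pair per downward step of the Kaplan-Meier curve.
    Returns None if the curve never reaches 50%.
    """
    for time, survival in steps:
        if survival <= 0.5:
            return time
    return None


# Hypothetical step points (time in years, survival probability).
steps = [(2, 0.83), (5, 0.70), (8, 0.58), (11, 0.48), (14, 0.35)]
print(median_survival(steps))  # 11
```

If the curve never drops to 50%, the median survival has not been reached and the function returns `None`.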

Log Rank Test :

The log-rank statistic is one of the most commonly used methods to test whether two survival curves are significantly different. This method is also known as the Mantel log-rank statistic or the Cox–Mantel log-rank statistic. The log-rank test compares the number of observed deaths in each group with the number of deaths that would be expected based on the number of deaths in the combined groups.

It compares events between two groups. It gives the same weight to events at every time point, so it only tells us whether the curves differ; it does not estimate how large the difference is or in which group events occur earlier.
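To make the observed-versus-expected idea concrete, here is a minimal sketch of a two-group log-rank test using only the Python standard library. The function name `logrank_test`, the `(time, event)` input format and the two small groups are assumptions made for illustration, not data from this article. For real analyses an established statistics package is the safer choice.

```python
import math


def logrank_test(group_a, group_b):
    """Two-group log-rank test.

    group_a, group_b: lists of (time, event) pairs,
    with event = 1 for a death and 0 for a censored patient.
    Returns (chi_square, p_value) with 1 degree of freedom.
    """
    everyone = group_a + group_b
    event_times = sorted({t for t, e in everyone if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t.
        n_a = sum(1 for time, _ in group_a if time >= t)
        n_b = sum(1 for time, _ in group_b if time >= t)
        # Deaths occurring exactly at time t.
        d_a = sum(1 for time, e in group_a if time == t and e == 1)
        d_b = sum(1 for time, e in group_b if time == t and e == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        # Deaths expected in group A if both groups had the same risk.
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical data: (time in months, event flag).
drug_a = [(2, 1), (3, 1), (4, 1), (5, 0)]
drug_b = [(6, 1), (8, 0), (9, 1), (10, 0)]
chi_square, p_value = logrank_test(drug_a, drug_b)
print(chi_square, p_value)
```

A small p-value suggests the two curves differ, but it says nothing about the size of the difference; that is what the hazard ratio is for.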

Hazard Ratio :

The hazard ratio compares the rate at which events occur in a treatment group with the rate at which they occur in a control group. It’s used to see if patients receiving a treatment progress faster (or slower) than those not receiving treatment.

Hazard ratios can be used to:

— Show the relative risk of a complication (like developing a side effect from a drug) in the treatment group vs. the control group.
— Show whether a treatment shortens an illness duration.
— Show which individuals are more likely to experience an event first.

Interpreting Hazard ratio

HR < 1: the treatment reduces the risk of the event

HR > 1: the treatment increases the risk of the event

HR = 1: no impact

For example, with HR = 0.33, at any time during follow-up patients taking drug A were 0.33 times as likely to die as patients in the comparison group, i.e. they had a 67% lower risk of death.
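The conversion from a hazard ratio to a percentage change in risk can be sketched like this. The function names `risk_change_percent` and `interpret_hazard_ratio` are made up for illustration.

```python
def risk_change_percent(hazard_ratio):
    """Percentage change in risk implied by a hazard ratio.

    Negative values mean a lower risk in the treatment group.
    """
    return (hazard_ratio - 1.0) * 100.0


def interpret_hazard_ratio(hazard_ratio):
    """Describe in words which way the treatment moves the risk."""
    if hazard_ratio < 1:
        return "treatment reduces risk of event"
    if hazard_ratio > 1:
        return "treatment increases risk of event"
    return "no impact"


print(round(risk_change_percent(0.33)))  # -67
print(interpret_hazard_ratio(0.33))      # treatment reduces risk of event
```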

Confidence Interval (CI): the range of values that is likely to include the true population value. It is used to measure the precision of the study’s estimate (in this case, the precision of the hazard ratio).

The narrower the confidence interval, the more precise the estimate. (Precision is affected by the study’s sample size.) If the confidence interval includes 1, then the hazard ratio is not statistically significant.

The relative risk ratio tells you that the risk of death is ‘x’ times higher with drug A than with drug B over the entire period of the study (i.e. it’s cumulative).
The hazard ratio tells you that the risk of death is ‘x’ times higher with drug A than with drug B at any particular point in time.

Forest Plot :

A forest plot displays the hazard ratio of each subgroup as a dot and its confidence interval as a horizontal bar, like error bars. Stacked vertically, a collection of such lines resembles a bunch of trees, which is why it is called a forest plot.

In the 50–59 age group the hazard ratio is 0.9 but the CI is (0.73–1.12). This means that in the population the treatment could reduce the risk by as much as 27% or increase it by as much as 12%, so the result is not statistically significant.

So, for a hazard ratio or a relative risk, if the CI includes the value 1, the result is not statistically significant, and we cannot say whether the treatment group does better or worse.
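The rule about the confidence interval can be written as a one-line check. This is a small sketch; the function name `is_significant` is an assumption, and the interval tested is the 50–59 age group from the forest plot.

```python
def is_significant(ci_low, ci_high, null_value=1.0):
    """A ratio is statistically significant only if its confidence
    interval does not contain the no-effect value (1 for ratios)."""
    return not (ci_low <= null_value <= ci_high)


# Age 50-59: HR = 0.9, CI = (0.73, 1.12). The interval contains 1.
print(is_significant(0.73, 1.12))  # False
```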
