Mediation analysis with duration dataΒΆ

This notebook demonstrates mediation analysis when the mediator and outcome are duration variables, modeled using proportional hazards regression. These examples are based on simulated data.

[ ]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.mediation import Mediation

Make the notebook reproducible.

[ ]:
np.random.seed(3424)

Specify a sample size.

[ ]:
n = 1000

Generate an exposure variable.

[ ]:
exp = np.random.normal(size=n)

Generate a mediator variable.

[ ]:
def gen_mediator():
    mn = np.exp(exp)
    mtime0 = -mn * np.log(np.random.uniform(size=n))
    ctime = -2 * mn * np.log(np.random.uniform(size=n))
    mstatus = (ctime >= mtime0).astype(int)
    mtime = np.where(mtime0 <= ctime, mtime0, ctime)
    return mtime0, mtime, mstatus

Generate an outcome variable.

[ ]:
def gen_outcome(otype, mtime0):
    if otype == "full":
        lp = 0.5 * mtime0
    elif otype == "no":
        lp = exp
    else:
        lp = exp + mtime0
    mn = np.exp(-lp)
    ytime0 = -mn * np.log(np.random.uniform(size=n))
    ctime = -2 * mn * np.log(np.random.uniform(size=n))
    ystatus = (ctime >= ytime0).astype(int)
    ytime = np.where(ytime0 <= ctime, ytime0, ctime)
    return ytime, ystatus

Build a dataframe containing all the relevant variables.

[ ]:
def build_df(ytime, ystatus, mtime0, mtime, mstatus):
    df = pd.DataFrame(
        {
            "ytime": ytime,
            "ystatus": ystatus,
            "mtime": mtime,
            "mstatus": mstatus,
            "exp": exp,
        }
    )
    return df

Run the full simulation and analysis, under a particular population structure of mediation.

[ ]:
def run(otype):

    mtime0, mtime, mstatus = gen_mediator()
    ytime, ystatus = gen_outcome(otype, mtime0)
    df = build_df(ytime, ystatus, mtime0, mtime, mstatus)

    outcome_model = sm.PHReg.from_formula(
        "ytime ~ exp + mtime", status="ystatus", data=df
    )
    mediator_model = sm.PHReg.from_formula("mtime ~ exp", status="mstatus", data=df)

    med = Mediation(
        outcome_model,
        mediator_model,
        "exp",
        "mtime",
        outcome_predict_kwargs={"pred_only": True},
    )
    med_result = med.fit(n_rep=20)
    print(med_result.summary())

Run the example with full mediation

[ ]:
run("full")

Run the example with partial mediation

[ ]:
run("partial")

Run the example with no mediation

[ ]:
run("no")