View Full Survival Analysis Steps on GitHub
Cox proportional hazards models will be employed to investigate the association between exposure categories and survival outcomes. In this study, we will conduct survival analysis to evaluate the relationships between baseline inflammatory and metabolic exposure categories and the risks of developing non-alcoholic fatty liver disease (NAFLD) and cirrhosis. Survival analysis is a statistical approach designed to analyze time-to-event data, where the survival time \(T\) (time until the occurrence of the event of interest) and the censoring time \(C\) (time at which an observation is no longer tracked) are observed. The observed survival time, denoted as:
\[Y = min(T, C)\]
The censoring indicator \(\delta\) is defined as: \[ \delta = \begin{cases} 1 & \text{if } T \leq C \\ 0 & \text{if } T > C \end{cases} \] Here, \(\delta = 1\) indicates the event was observed, and \(\delta = 0\) signifies that censoring occurred.
By modeling survival times and event occurrences, we aim to quantify the impact of different exposure categories on the hazard of developing NAFLD and cirrhosis, adjusting for relevant covariates.
We applied the Cox proportional hazards model to evaluate how various covariates influence the hazard rate \(h(t)\), which represents the instantaneous risk of the event occurring at time \(t\). The hazard function is defined as:
\[h(t) = h_0(t) \exp(x_1\beta_1 + x_2\beta_2 + \dots + x_k\beta_k)\]
Where:
\(h(t)\): Hazard function for time t,
\(h_0(t)\): Baseline hazard function,
\(x_i\): Covariates (e.g., age, sex, inflammatory markers, socioeconomic factors),
\(\beta_i\): Regression coefficients for covariates.
The baseline characteristics are stratified by each exposure category to identify differences in fundamental features across exposure groups and to provide a reference for constructing the Cox proportional hazards model (see Appendix for more detailed tables and explanations).
Baseline_expo_lmr_cat_results <- read_csv("./csv/Baseline_expo_lmr_cat.csv")
Baseline_expo_lmr_cat_cleaned <- Baseline_expo_lmr_cat_results %>%
select(-test) %>%
rename(
` ` = ...1,
Level1 = `1`,
Level2 = `2`,
Level3 = `3`
)
knitr::kable(
Baseline_expo_lmr_cat_cleaned,
digits = 2,
caption = "Baseline Characteristics by LMR Exposure Categories",
align = "c"
) %>%
kable_styling(full_width = FALSE, position = "center")
Overall | Level1 | Level2 | Level3 | p | |
---|---|---|---|---|---|
n | 413146 | 105227 | 204634 | 103279 | NA |
sex = 2 (%) | 190849 (46.2) | 69149 (65.7) | 93068 (45.5) | 28629 (27.7) | <0.001 |
age (mean (SD)) | 56.54 (8.09) | 57.70 (8.07) | 56.32 (8.10) | 55.79 (7.96) | <0.001 |
income (%) | NA | NA | NA | NA | <0.001 |
1 | 40844 (9.9) | 9912 (9.4) | 20134 (9.9) | 10798 (10.5) | NA |
2 | 17370 (4.2) | 3874 (3.7) | 8290 (4.1) | 5206 (5.1) | NA |
3 | 79972 (19.4) | 21680 (20.7) | 37894 (18.6) | 20398 (19.8) | NA |
4 | 89874 (21.8) | 23863 (22.8) | 44107 (21.6) | 21902 (21.3) | NA |
5 | 92455 (22.4) | 23555 (22.5) | 46634 (22.8) | 22263 (21.6) | NA |
6 | 72283 (17.5) | 17494 (16.7) | 37205 (18.2) | 17583 (17.1) | NA |
7 | 19249 (4.7) | 4513 (4.3) | 9872 (4.8) | 4864 (4.7) | NA |
townsend (mean (SD)) | -1.33 (3.08) | -1.39 (3.04) | -1.39 (3.04) | -1.14 (3.18) | <0.001 |
total_met (mean (SD)) | 1619.08 (2015.70) | 1645.29 (2084.82) | 1625.33 (2008.34) | 1578.85 (1955.34) | <0.001 |
diet_quality (mean (SD)) | 3.29 (1.57) | 3.12 (1.57) | 3.31 (1.57) | 3.42 (1.56) | <0.001 |
sleep_hour (mean (SD)) | 7.15 (1.10) | 7.17 (1.12) | 7.15 (1.09) | 7.14 (1.13) | <0.001 |
smoke_cat (%) | NA | NA | NA | NA | <0.001 |
1 | 32255 (7.8) | 6743 (6.4) | 15030 (7.3) | 10481 (10.1) | NA |
2 | 11334 (2.7) | 2843 (2.7) | 5572 (2.7) | 2919 (2.8) | NA |
3 | 96062 (23.3) | 27859 (26.5) | 47078 (23.0) | 21122 (20.5) | NA |
4 | 47431 (11.5) | 12141 (11.5) | 23827 (11.6) | 11463 (11.1) | NA |
5 | 60211 (14.6) | 15267 (14.5) | 30436 (14.9) | 14507 (14.0) | NA |
6 | 165853 (40.1) | 40374 (38.4) | 82691 (40.4) | 42787 (41.4) | NA |
total_alcohol (mean (SD)) | 12.55 (15.17) | 15.16 (17.37) | 12.50 (14.77) | 9.98 (12.92) | <0.001 |
non_hdl (mean (SD)) | 359.49 (78.74) | 345.15 (77.61) | 361.10 (77.88) | 371.09 (79.37) | <0.001 |
tg (mean (SD)) | 92.17 (22.17) | 92.24 (22.59) | 91.90 (21.60) | 92.66 (22.85) | <0.001 |
bp_cat (mean (SD)) | 49.88 (28.78) | 47.57 (28.26) | 50.05 (28.71) | 51.91 (29.25) | <0.001 |
This table provides the baseline characteristics of participants categorized by Lymphocyte-to-Monocyte Ratio (LMR) levels (Level 1, Level 2, Level 3). It includes demographic (e.g., age, sex), socioeconomic (e.g., income, Townsend index), lifestyle (e.g., smoking, alcohol intake), and clinical markers (e.g., triglycerides, diet quality), highlighting significant differences across categories with p-values for most variables. These differences emphasize potential confounding variables that need adjustment in subsequent statistical models, such as Cox proportional hazards analysis.
\[ h(t) = h_0(t) \exp\left( \beta_{\text{expo}} \times x_{expo} + \beta_{\text{age}} \times x_{age} + \beta_{\text{sex}} \times x_{sex} \right) \]
\[ \begin{aligned} h(t) = h_0(t) \exp\Big( & \beta_{\text{expo}} \times x_{expo} + \beta_{\text{age}} \times x_{age} + \beta_{\text{sex}} \times x_{sex} + \beta_{\text{income}} \times x_{income} \\ & + \beta_{\text{townsend}} \times x_{townsend} + \beta_{\text{total_met}} \times x_{\text{total_met}} + \beta_{\text{diet_quality}} \times x_{\text{diet_quality}} \\ & + \beta_{\text{sleep_hour}} \times x_{\text{sleep_hour}} + \beta_{\text{smoke_cat}} \times x_{\text{smoke_cat}} + \beta_{\text{total_alcohol}} \times x_{\text{total_alcohol}} \\ & + \beta_{\text{no_hdl}} \times x_{\text{no_hdl}} + \beta_{\text{tg}} \times x_{\text{tg}} + \beta_{\text{bp_cat}} \times x_{\text{bp_cat}} \Big) \end{aligned} \]
After fitting the Cox proportional hazards models, hazard ratios will be calculated for each exposure category, with the lowest exposure category serving as the reference. Statistical significance will be assessed using p-values, and dose-response relationships will be evaluated through p-for-trend tests. Results will be reported with 95% confidence intervals to convey precision. Survival durations will be computed based on either censored times or event times, with imputation applied where necessary to ensure complete datasets for analysis.
Figure 1 demonstrates the association between lymphocyte-to-monocyte ratio (LMR) exposure categories and the risk of developing non-alcoholic fatty liver disease (NAFLD). The high exposure group (expo_lmr_cat=3, green line) exhibits the highest cumulative hazard throughout the follow-up period, followed by the intermediate (expo_lmr_cat=2, blue line) and low exposure (expo_lmr_cat=1, yellow line) groups. These differences are statistically significant (p < 0.0001), indicating that higher LMR levels are associated with an increased risk of NAFLD. This suggests that elevated LMR, potentially indicative of an immune imbalance, may contribute to the progression of NAFLD.
Expanding on the findings for NAFLD, Figure 2 evaluates the relationship between LMR exposure categories and the risk of developing cirrhosis. The high exposure group (expo_lmr_cat=3, green line) again demonstrates the highest cumulative hazard, followed by the intermediate (expo_lmr_cat=2, blue line) and low exposure (expo_lmr_cat=1, yellow line) groups. Although the separation of the curves becomes more evident after approximately 5,000 days, the differences are not statistically significant (p = 0.059). These findings suggest that higher LMR levels may also be associated with an increased risk of cirrhosis, though the evidence is weaker compared to NAFLD.
Building on the role of inflammatory markers, Figure 3 illustrates the relationship between systemic immune-inflammation index (SII) exposure categories and the risk of developing NAFLD. The low exposure group (expo_sii_cat=1, yellow line) exhibits the highest cumulative hazard, followed by the intermediate group (expo_sii_cat=2, blue line), while the high exposure group (expo_sii_cat=3, green line) shows the lowest cumulative hazard. These differences are statistically significant (p < 0.0001), indicating that higher SII levels are associated with a reduced risk of NAFLD. This finding highlights the complex role of systemic inflammation, where individuals with lower SII levels may have an elevated risk of NAFLD.
Expanding the analysis to cirrhosis, Figure 4 examines the association between SII exposure categories and cirrhosis risk. Similar to the trend observed in NAFLD, the low exposure group (expo_sii_cat=1, yellow line) shows the highest cumulative hazard, followed by the intermediate group (expo_sii_cat=2, blue line), while the high exposure group (expo_sii_cat=3, green line) consistently demonstrates the lowest cumulative hazard. These differences are statistically significant (p < 0.0001), suggesting that higher SII levels are associated with a reduced risk of cirrhosis. This pattern indicates that systemic inflammation, as reflected by lower SII levels, may play a role in cirrhosis development.
Further exploring inflammatory markers, Figure 5 illustrates the relationship between Neutrophil-Percentage-to-Albumin Ratio (NPAR) exposure categories and NAFLD risk. The low exposure group (expo_npar_cat=1, yellow line) exhibits the highest cumulative hazard, followed by the intermediate group (expo_npar_cat=2, blue line), while the high exposure group (expo_npar_cat=3, green line) demonstrates the lowest cumulative hazard. These differences are statistically significant (p < 0.0001), indicating that higher NPAR levels are associated with a reduced risk of NAFLD. These findings emphasize the potential protective role of higher NPAR levels against NAFLD progression.
Finally, Figure 6 explores the relationship between NPAR exposure categories and the risk of cirrhosis. The low exposure group (expo_npar_cat=1, yellow line) shows the highest cumulative hazard, followed by the intermediate group (expo_npar_cat=2, blue line), while the high exposure group (expo_npar_cat=3, green line) consistently exhibits the lowest cumulative hazard. These differences are statistically significant (p < 0.0001), suggesting that higher NPAR levels are associated with a reduced risk of cirrhosis. This further supports the protective role of higher NPAR levels in mitigating cirrhosis risk, potentially through modulation of systemic inflammation.
nafld_results <- read_csv("./csv/Main_cox_nafld.csv")
nafld_results_cleaned <- nafld_results %>%
select(-1, -lower_conf, -upper_conf) %>%
rename(
Model = Model,
`Exposure Category` = Category,
`Hazard Ratio (HR)` = HR,
`P-value` = P,
`P for Trend` = `P for trend`,
`Confidence Interval` = CI
) %>%
slice(-c(n(), n() - 2))
nafld_results_cleaned %>%
knitr::kable(
digits = 4,
caption = "Cox Proportional Hazards Results for NAFLD Outcome",
align = "c"
) %>%
kable_styling(full_width = FALSE, position = "center")
Model | Exposure Category | Hazard Ratio (HR) | Confidence Interval | P-value | P for Trend |
---|---|---|---|---|---|
Model 1 | expo_lmr_cat2 | 1.068 | 1.022, 1.115 | 0.0033 | 0.0033 |
Model 1 | expo_lmr_cat3 | 1.105 | 1.049, 1.164 | 0.0002 | NA |
Model 2 | expo_lmr_cat2 | 1.133 | 1.059, 1.213 | 0.0003 | 0.0003 |
Model 2 | expo_lmr_cat3 | 1.122 | 1.033, 1.22 | 0.0065 | NA |
Model 1 | expo_sii_cat2 | 0.824 | 0.789, 0.861 | 0.0000 | 0.0000 |
Model 1 | expo_sii_cat3 | 0.744 | 0.708, 0.782 | 0.0000 | NA |
Model 2 | expo_sii_cat2 | 0.851 | 0.795, 0.91 | 0.0000 | 0.0000 |
Model 2 | expo_sii_cat3 | 0.712 | 0.658, 0.769 | 0.0000 | NA |
Model 1 | expo_npar_cat2 | 0.873 | 0.835, 0.913 | 0.0000 | 0.0000 |
Model 1 | expo_npar_cat3 | 0.810 | 0.77, 0.851 | 0.0000 | NA |
Model 2 | expo_npar_cat2 | 0.864 | 0.806, 0.926 | 0.0000 | 0.0000 |
Model 2 | expo_npar_cat3 | 0.766 | 0.708, 0.829 | 0.0000 | NA |
Model 1 | expo_nps_cat2 | 0.840 | 0.795, 0.889 | 0.0000 | 0.0000 |
Model 2 | expo_nps_cat2 | 0.819 | 0.751, 0.893 | 0.0000 | 0.0000 |
This table presents the results of Cox proportional hazards models
evaluating the association between various baseline exposure categories
and the risk of developing non-alcoholic fatty liver disease. Each
exposure variable (expo_lmr
, expo_sii
,
expo_npar
, expo_nps
) is divided into three
categories, with Category 1 (not shown) serving as the reference group
(HR = 1.00). The hazard ratios (HRs), confidence intervals (CIs),
p-values, and p for trend values are provided for two models: Model 1,
adjusted for age and sex, and Model 2, adjusted for additional
covariates including socioeconomic and lifestyle factors.
For expo_lmr
, higher exposure categories (cat2 and cat3)
consistently show increased HRs across both models, with slightly higher
HRs in Model 2, indicating a stronger association with NAFLD risk after
full adjustment. In contrast, for expo_sii
,
expo_npar
, and expo_nps
, higher categories
exhibit HRs less than 1 in both models, reflecting their protective
effects against NAFLD. Notably, the protective effect strengthens
slightly in Model 2 for expo_sii
and
expo_npar
, suggesting that adjusting for additional factors
further clarifies their relationship with NAFLD risk. Significant p for
trend values across most exposures reinforce robust dose-response
relationships, highlighting the importance of these markers in
predicting NAFLD.
cirrhosis_results <- read_csv("./csv/Main_cox_cirrhosis.csv")
cirrhosis_results_cleaned <- cirrhosis_results %>%
select(-1, -lower_conf, -upper_conf) %>%
rename(
Model = Model,
`Exposure Category` = Category,
`Hazard Ratio (HR)` = HR,
`P-value` = P,
`P for Trend` = `P for trend`,
`Confidence Interval` = CI
) %>%
slice(-c(n(), n() - 2))
cirrhosis_results_cleaned %>%
knitr::kable(
digits = 4,
caption = "Cox Proportional Hazards Results for Cirrhosis Outcome",
align = "c"
) %>%
kable_styling(full_width = FALSE, position = "center")
Model | Exposure Category | Hazard Ratio (HR) | Confidence Interval | P-value | P for Trend |
---|---|---|---|---|---|
Model 1 | expo_lmr_cat2 | 0.884 | 0.796, 0.981 | 0.0209 | 0.0209 |
Model 1 | expo_lmr_cat3 | 0.859 | 0.751, 0.982 | 0.0261 | NA |
Model 2 | expo_lmr_cat2 | 0.964 | 0.815, 1.14 | 0.6682 | 0.6682 |
Model 2 | expo_lmr_cat3 | 0.829 | 0.665, 1.033 | 0.0955 | NA |
Model 1 | expo_sii_cat2 | 0.518 | 0.466, 0.574 | 0.0000 | 0.0000 |
Model 1 | expo_sii_cat3 | 0.451 | 0.399, 0.509 | 0.0000 | NA |
Model 2 | expo_sii_cat2 | 0.527 | 0.447, 0.622 | 0.0000 | 0.0000 |
Model 2 | expo_sii_cat3 | 0.387 | 0.316, 0.474 | 0.0000 | NA |
Model 1 | expo_npar_cat2 | 0.791 | 0.705, 0.887 | 0.0001 | 0.0001 |
Model 1 | expo_npar_cat3 | 0.795 | 0.702, 0.9 | 0.0003 | NA |
Model 2 | expo_npar_cat2 | 0.746 | 0.619, 0.899 | 0.0020 | 0.0020 |
Model 2 | expo_npar_cat3 | 0.747 | 0.611, 0.915 | 0.0048 | NA |
Model 1 | expo_nps_cat2 | 0.778 | 0.676, 0.895 | 0.0004 | 0.0004 |
Model 2 | expo_nps_cat2 | 0.722 | 0.575, 0.906 | 0.0049 | 0.0049 |
Similarly, this table presents the results of Cox proportional
hazards models evaluating the association between various baseline
exposure categories and the risk of developing cirrhosis. For
expo_lmr
, higher categories (cat2 and cat3) show HRs
slightly below 1 in Model 1, indicating a modest protective effect
against cirrhosis, but this effect weakens in Model 2, with HRs
approaching 1 and becoming non-significant in some comparisons. In
contrast, expo_sii
demonstrates a strong protective
association in both models, with substantially reduced HRs for cat2 and
cat3, and slightly stronger effects observed in Model 2 after full
adjustment. Similarly, expo_npar
and expo_nps
consistently show HRs below 1 in both models, with Model 2 indicating
slightly enhanced protective effects compared to Model 1. The
significant p for trend values, particularly for expo_sii
and expo_nps
, highlight robust dose-response relationships,
emphasizing the importance of these markers in mitigating cirrhosis
risk.