September 08, 22

Slide overview

IEA-WP & JEA Joint Seminar 2022 @online | March 26, 2022

Assoc prof at Tokyo University of Science. PhD in health sciences/MPH at the University of Tokyo. Causal inference in epidemiology/biostatistics.

1.

IEA-WP & JEA Joint Seminar 2022 (virtual), March 26, 2022
How to treat missing data: Missing data? What should we do?
Classifying structures of possible biases in naïve analyses with missing data
Tomohiro Shinozaki, PhD, MPH
Tokyo University of Science
shinozaki@rs.tus.ac.jp

2.

Missing data is ubiquitous
• And we all know that missing data can cause problems
• But what problems?
  • Bias
  • Inefficiency (= large standard errors)
• My talk focuses on bias
• Suppose a study comparing exposed vs. unexposed
  • We have exposure, outcome, and confounder variables
  • Data may be missing in any of the variables

3.

Example: Treatment for thrombosis
• Blood clots block an artery of patients
• Need for “thinning” or “dissolving” blood clots
• Antithrombotic therapy
  • = anticoagulation (e.g., warfarin) + antiplatelet (e.g., aspirin)
  • Reduces thrombotic events but may cause bleeding events
Sunaga et al., Circ Rep 2020

4.

Example: Treatment for thrombosis
• Registry-based retrospective analysis
• Exposure: aggressive vs. standard therapy
• Outcome: thrombotic & bleeding events
[Flow chart]
AF patients who underwent TEE during 2010–2012 (n = 3,139)
→ Start anticoagulation after LAT (n = 82)
  → Anticoagulation + antiplatelet (n = 31)
  → Anticoagulation only (n = 51)
→ Thrombotic/bleeding events (average follow-up: 878 days)
LAT: left atrial thrombi; AF: atrial fibrillation; TEE: transesophageal echocardiography
Sunaga et al., Circ Rep 2020

5.

Results
• Bleeding (left) and thrombotic (right) events
  [Figure: events by group, anticoagulation + antiplatelet vs. anticoagulation only]
  • Missing in ~20 patients
• We want to adjust for confounders (among 82 patients)
  • Height, weight, hypertension, diabetes, stroke risk score (CHA2DS2-VASc), prior time in therapeutic range, prior drug use, …
• What should we do?
Sunaga et al., Circ Rep 2020

6.

How to treat missing data?
• Missing in exposure X
  • Many studies exclude patients from analysis because of ineligibility
• Missing in confounders C
  • Exclude patients from analysis
  • Omit variables
  • Adjust for a “missing indicator” R
  • Impute missing values
• Missing in outcome Y
  • Exclude patients from analysis
  • Impute missing values

7.

Complete case/record analysis
• Exclude patients with missing data in any variable
  • Necessarily reduces the sample size
• Before stepping in…
  • Most textbooks teach the “missing data mechanism”
  • MCAR, MAR, or NMAR

8.

Missing data mechanism
• MCAR: missing completely at random
  • Missingness does not depend on anything
• MAR: missing at random
  • Missingness of a variable does not depend on its own value within strata of fully observed variables
• NMAR: not missing at random
  • Missingness still depends on the value itself even within strata of observed variables
Lee et al., JCE 2021
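The three mechanisms can be made concrete with a small exact calculation. The sketch below is illustrative only: the binary variables (a fully observed X and a partially observed C), every probability, and the function names are assumptions made for this example, not values from the talk. It compares the mean of C among observed records with the true mean within strata of X.

```python
# Hypothetical joint distribution of a fully observed binary X and a
# partially observed binary C (all numbers are illustrative assumptions).
P_X1 = 0.5
P_C1_GIVEN_X = {0: 0.2, 1: 0.6}


def joint(x, c):
    """P(X = x, C = c) in the full (unobservable) data."""
    px = P_X1 if x == 1 else 1 - P_X1
    pc = P_C1_GIVEN_X[x] if c == 1 else 1 - P_C1_GIVEN_X[x]
    return px * pc


# Assumed P(C is missing | X = x, C = c) under each mechanism.
MECHANISMS = {
    "MCAR": lambda x, c: 0.3,            # depends on nothing
    "MAR": lambda x, c: 0.1 + 0.4 * x,   # depends only on the observed X
    "NMAR": lambda x, c: 0.1 + 0.4 * c,  # depends on the value of C itself
}


def observed_mean_c(p_missing, x):
    """E[C | X = x, C observed], computed exactly from the joint distribution."""
    num = sum(joint(x, c) * (1 - p_missing(x, c)) * c for c in (0, 1))
    den = sum(joint(x, c) * (1 - p_missing(x, c)) for c in (0, 1))
    return num / den


for name, p_missing in MECHANISMS.items():
    for x in (0, 1):
        print(name, "X =", x,
              "observed mean:", round(observed_mean_c(p_missing, x), 3),
              "true mean:", P_C1_GIVEN_X[x])
```

In this toy setting, the observed and true stratum means agree under MCAR and MAR but differ under NMAR, which matches the definitions on the slide.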

9.

Missing data mechanism
• Not very helpful for understanding bias due to missing data
  • Instead, a causal diagram is useful for understanding when and how biases arise
• Complete case analysis
  • Valid under MCAR
  • But does not necessarily require MCAR
  • Don’t confuse sufficient and necessary conditions!
  • Sometimes valid even under MAR/NMAR

10.

Bias in complete case analysis (1)
• Bias in the exposure coefficient of regression models (unbiased under no exposure effect)
  Takai et al.; Hughes et al., IJE 2019
• Depends on whether missingness is related to the outcome variable

11.

Bias in complete case analysis (2)
• Bias in each coefficient in regression models
  [Table residue: columns labelled (outcome), (e.g., exposure), (e.g., confounder)]
  Carpenter & Smuk, Biometrical J 2021
• Again, biased when missingness is related to the outcome variable

12.

Causal diagram
• Missingness indicator R
  • takes 1 if any of X, C, or Y has missing values
  • takes 0 if all variables are observed
• Complete case analysis selects patients with R = 0
  • □ means “selection of data” instead of a whole dataset
[Causal diagram: C (confounder), X (exposure), Y (outcome), and R = 0 in a box]

13.

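“Rules” in causal diagrams
(A boxed variable, written here as [C], is one we condition on or select on.)
1. An arrow (→) means a causal association
   • We are interested in “X → Y”
2. A and B would be associated when
   a. A ← C → B (common cause C, not conditioned on)
   b. A → [C] ← B (common effect C, conditioned on)
3. A and B would not be associated through
   a. A ← [C] → B (common cause C, conditioned on)
   b. A → C ← B (common effect C, not conditioned on)
[Causal diagram: C, X, Y, and R = 0 in a box]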

14.

MCAR
• Missingness R does not depend on anything
• R = 0 does not induce any X-Y association other than “X → Y”
• Complete case analysis provides unbiased estimates
  • We already knew this result
[Causal diagram: C, X, Y, and R = 0 in a box]

15.

What about this?
• Missingness R is dependent on X and/or C
• R = 0 does not induce any X-Y association other than “X → Y”
• Complete case analysis provides an unbiased estimate
  • Even under MAR or NMAR, depending on which variable is missing
[Causal diagram: C, X, Y, and R = 0 in a box]

16.

Missing associated with Y
• Missingness R is dependent on Y
• Missingness R is dependent on Y, as well as X or C (or both)
  • Selecting on R = 0 opens “X → R = 0 ← Y”
  • “Rule” 2b
• Complete case analysis provides biased estimates
  • Regardless of MAR or NMAR
[Causal diagram: C, X, Y, and R = 0 in a box]
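The contrast between missingness that depends on X and missingness that depends on Y can be checked with an exact calculation rather than a simulation. The following is a minimal sketch under stated assumptions: a binary exposure and outcome with no confounding, hypothetical risks of 0.2 and 0.4, and made-up probabilities of being a complete case. None of these numbers come from the talk.

```python
# Hypothetical population without confounding (illustrative numbers only).
RISK = {0: 0.2, 1: 0.4}  # P(Y = 1 | X = x); true risk difference = 0.2


def complete_case_risk_difference(p_observed):
    """Risk difference among complete cases (R = 0), computed exactly.

    p_observed(x, y) is the assumed P(R = 0 | X = x, Y = y).
    The marginal distribution of X cancels out, so it is not needed.
    """
    risk_cc = {}
    for x in (0, 1):
        kept_y1 = RISK[x] * p_observed(x, 1)
        kept_y0 = (1 - RISK[x]) * p_observed(x, 0)
        risk_cc[x] = kept_y1 / (kept_y1 + kept_y0)
    return risk_cc[1] - risk_cc[0]


SCENARIOS = {
    "MCAR": lambda x, y: 0.7,
    "R depends on X": lambda x, y: 0.9 - 0.4 * x,
    "R depends on Y": lambda x, y: 0.9 - 0.4 * y,
}

for name, p_observed in SCENARIOS.items():
    rd = complete_case_risk_difference(p_observed)
    print(f"{name}: complete-case risk difference = {rd:.3f} (truth 0.200)")
```

With these assumed numbers, the complete-case risk difference stays at 0.2 when R depends on nothing or only on X, and moves away from 0.2 when R depends on Y.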

17.

Odds ratio of X and Y
• Symmetric about X and Y

$$
\frac{P(Y=1 \mid X=1, C=c)\,/\,P(Y=0 \mid X=1, C=c)}{P(Y=1 \mid X=0, C=c)\,/\,P(Y=0 \mid X=0, C=c)}
=
\frac{P(X=1 \mid Y=1, C=c)\,/\,P(X=0 \mid Y=1, C=c)}{P(X=1 \mid Y=0, C=c)\,/\,P(X=0 \mid Y=0, C=c)}
$$

  • Left-hand side: obtained by logistic regression for Y = 1
  • Right-hand side: obtained by logistic regression for X = 1
• Unbiased if …
  • R is independent of Y [Causal diagram: C, X, Y, and R = 0 in a box], or
  • R is independent of X [Causal diagram: C, X, Y, and R = 0 in a box]
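The symmetry of the odds ratio, and its robustness to selection on only one of X or Y, can be illustrated with the cross-product ratio of a 2 × 2 table. This is a sketch under assumptions: the cell probabilities (for a single stratum C = c) and the selection probabilities are hypothetical, and the helper names are introduced here for illustration.

```python
# Hypothetical cell probabilities P(X = x, Y = y) within one stratum C = c.
CELLS = {(1, 1): 0.20, (1, 0): 0.30, (0, 1): 0.10, (0, 0): 0.40}


def odds_ratio(cells):
    """Cross-product ratio; the same value whether Y is modelled given X or X given Y."""
    return (cells[1, 1] * cells[0, 0]) / (cells[1, 0] * cells[0, 1])


def complete_cases(cells, p_observed):
    """Cell weights after keeping each (x, y) with the assumed P(R = 0 | x, y)."""
    return {(x, y): p * p_observed(x, y) for (x, y), p in cells.items()}


SELECTION = {
    "R depends on Y only": lambda x, y: 0.9 - 0.4 * y,
    "R depends on X only": lambda x, y: 0.9 - 0.4 * x,
    "R depends on X and Y jointly": lambda x, y: 0.9 - 0.4 * x * y,
}

print("full data:", round(odds_ratio(CELLS), 3))
for name, p_observed in SELECTION.items():
    print(name, "->", round(odds_ratio(complete_cases(CELLS, p_observed)), 3))
```

In this example the complete-case odds ratio equals the full-data odds ratio when R depends on Y alone or on X alone, because the selection probabilities cancel in the cross-product. It changes under the assumed joint dependence on X and Y.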

18.

Complete case analysis of logistic regression
• Unbiased in a wide range of situations
White & Carlin, SIM 2010; Bartlett et al., AJE 2015; Hughes et al., IJE 2019; Carpenter & Smuk, Biometrical J 2021

19.

Bias in complete case analysis
• Understood as selection bias (Hernán et al., Epidemiology 2004)
• The odds ratio is even more robust to this bias
  • Remember we can directly estimate odds ratios in case-control studies
• Technical note
  • Formally, we cannot apply the “back-door criterion” to these causal diagrams
  • It is not applicable when stratifying on an effect of exposure X (i.e., missingness R)
  • The back-door criterion can be modified (Daniel et al., SMMR 2012)

20.

How to treat missing data?
• Missing in exposure X
  • Many studies exclude patients from analysis because of ineligibility
• Missing in confounders C
  • Exclude patients from analysis
  • Omit variables
  • Adjust for a “missing indicator” R
  • Impute missing values
• Missing in outcome Y
  • Exclude patients from analysis
  • Impute missing values

21.

Selection of adjusted variables
• If we omit the variables with missing values, the remaining variables are fully observed
  • Remember the thrombosis study: height, weight, hypertension, diabetes, stroke risk score (CHA2DS2-VASc), prior time in therapeutic range, prior drug use, …
• But, of course, confounding bias remains
[Causal diagram: C1 (with missing values), C2, X, Y, and RC1]

22.

How to treat missing data?
• Missing in exposure X
  • Many studies exclude patients from analysis because of ineligibility
• Missing in confounders C
  • Exclude patients from analysis
  • Omit variables
  • Adjust for a “missing indicator” R
  • Impute missing values
• Missing in outcome Y
  • Exclude patients from analysis
  • Impute missing values

23.

Missing indicators
• Missing indicators RC1, RC2, … for confounders C1, C2, …
  • RCk = 1 if Ck is missing
  • RCk = 0 if Ck is observed
• Instead of Ck, adjust for (1 – RCk)Ck and RCk
  • Can be defined for all patients irrespective of missingness in Ck
• Equivalent to setting a missing category for categorical Ck
  • E.g., Severe HT / Moderate HT / Normal BP / Optimal BP / Unknown
HT: hypertension; BP: blood pressure
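The two adjustment terms on this slide, RCk and (1 – RCk)Ck, can be constructed as follows. This is a minimal sketch: the function name, the use of `None` to mark a missing value, and the example values are assumptions made for illustration.

```python
def missing_indicator_terms(values):
    """Return (R_C, (1 - R_C) * C) for a confounder; None marks a missing value."""
    r_c = [1 if v is None else 0 for v in values]
    # (1 - R_C) * C equals C when observed and 0 when missing,
    # so it is defined for every patient.
    c_term = [0.0 if v is None else float(v) for v in values]
    return r_c, c_term


bmi = [22.5, None, 27.1, None]  # hypothetical confounder with missing values
r_c, c_observed = missing_indicator_terms(bmi)
print(r_c)         # [0, 1, 0, 1]
print(c_observed)  # [22.5, 0.0, 27.1, 0.0]
```

Both returned columns would then enter the outcome regression in place of the original confounder.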

24.

Missing category adjustment
• Not recommended in general
  • A missing category includes various confounder values
  Vach & Blettner, AJE 1991; Greenland & Finkle, AJE 1995
• Let’s depict it in a causal diagram
[Causal diagram: C, X, Y, (1 – RC)C, and RC]

25.

Misclassification bias (information bias)
• X ← C → Y cannot be blocked due to missing C
  • Partially blocked by (1 – RC)C as a surrogate for C
  • Residual confounding
[Causal diagram: C, X, Y, (1 – RC)C, and RC]

26.

Misclassification bias (information bias)
• Under MCAR (no dashed line)
  • We cannot predict the direction of bias due to confounder misclassification (Greenland & Robins, AJE 1985)
• Dashed line(s) create an X-Y association other than “X → Y”
  • Selection bias through “… → RC ← …”
[Causal diagram: C, X, Y, (1 – RC)C, and RC, with dashed arrows into RC]
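Residual confounding within the “unknown” category can be shown with an exact calculation. The sketch below assumes a binary confounder C, an exposure X with no effect on Y, and MCAR missingness in C, so that patients in the missing category are a random subset of all patients. All probabilities and names are hypothetical.

```python
# Hypothetical data-generating model (illustrative numbers only).
P_C1 = 0.5
P_X1_GIVEN_C = {0: 0.2, 1: 0.8}  # C affects exposure


def risk(x, c):
    """P(Y = 1 | X = x, C = c): depends on C only, so X has no effect."""
    return 0.1 + 0.4 * c


def risk_difference(c_values):
    """Exact X-Y risk difference among patients whose true C is in c_values."""
    risks = {}
    for x in (0, 1):
        num = den = 0.0
        for c in c_values:
            pc = P_C1 if c == 1 else 1 - P_C1
            px = P_X1_GIVEN_C[c] if x == 1 else 1 - P_X1_GIVEN_C[c]
            num += pc * px * risk(x, c)
            den += pc * px
        risks[x] = num / den
    return risks[1] - risks[0]


# Categories used by missing-category adjustment.
print("C = 0 observed:", round(risk_difference((0,)), 3))
print("C = 1 observed:", round(risk_difference((1,)), 3))
# Under MCAR the "unknown" category mixes both true values of C.
print("C unknown:", round(risk_difference((0, 1)), 3))
```

In the observed categories the risk difference is zero, as it should be. In the unknown category it is not, because C still confounds the X-Y association there. An analysis that pools over categories inherits part of that bias.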

27.

How to treat missing data?
• Missing in exposure X
  • Many studies exclude patients from analysis because of ineligibility
• Missing in confounders C
  • Exclude patients from analysis
  • Omit variables
  • Adjust for a “missing indicator” R
  • Impute missing values
• Missing in outcome Y
  • Exclude patients from analysis
  • Impute missing values

28.

Imputation methods
• Impute a value for missing data to create “pseudo full data”
  • After the imputation, we can fit a regression model as usual
• Single imputation
  • Typically (not always) underestimates the uncertainty due to missing data
• Multiple imputation (MI)
  • Substitutes simulated values for a missing variable again and again…
  • Drs. Sakamaki and Fukui will give a step-by-step illustration of MI
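One way to see why repeating the imputation matters is the standard pooling step known as Rubin's rules, which is not spelled out on this slide. The pooled variance adds the between-imputation variance to the average within-imputation variance. The sketch below assumes that each imputed dataset has already been analysed. The function name and the three estimates and variances are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)       # average of the m point estimates
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log odds ratios and their variances from m = 3 imputed datasets.
pooled_estimate, total_variance = pool_rubin([0.50, 0.60, 0.40], [0.04, 0.04, 0.04])
print(round(pooled_estimate, 3), round(total_variance, 4))
```

A single imputation would report only the within-imputation variance (0.04 in this example), which is smaller than the pooled variance whenever the estimates differ across imputations.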

29.

Bias in imputation
• Imputation methods rely on models for missing data
  • Under the assumption of missing at random (MAR)
  • Missingness depends only on observed data
• Risk of model misspecification bias
  • Wrong models will predict values far from the truth
  • If the MAR assumption fails, multiple imputation can cause larger bias than complete case analysis

30.

Summary
• Different structures of biases in different missing data analyses
  • Selection bias for complete case analysis
  • Confounding bias for omitting missing confounders
  • Misclassification bias for missing indicator adjustment
  • Model misspecification bias for imputation
• “Missing data analysis” refers to the choice of methods based on the trade-off between these biases
  • Needs data and assumptions

31.

Practical consideration
• Complete case analysis can be useful as a sensitivity analysis even when it is not the primary analysis
  • But it may cause bias if missingness depends on the outcome (Lee et al., JCE 2021)
• When multiple variables are missing
  • Complete case analysis may provide estimates that are too imprecise
  • Multiple imputation may be the only option owing to its flexibility
• Continues to the second talk…

32.

References
• Bartlett JW, Harel O, Carpenter JR. Asymptotically unbiased estimation of exposure odds ratios in complete records logistic regression. Am J Epidemiol. 2015;182:730-6.
• Carpenter JR, Smuk M. Missing data: a statistical framework for practice. Biom J. 2021;63:915-47.
• Daniel RM, Kenward MG, Cousens SN, De Stavola BL. Using causal diagrams to guide analysis in missing data problems. Stat Methods Med Res. 2012;21:243-56.
• Greenland S, Finkle WD. A critical look at methods for handling missing covariates in epidemiologic regression analyses. Am J Epidemiol. 1995;142:1255-64.
• Greenland S, Robins JM. Confounding and misclassification. Am J Epidemiol. 1985;122:495-506.
• Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15:615-25.
• Hughes RA, Heron J, Sterne JAC, Tilling K. Accounting for missing data in statistical analyses: multiple imputation is not always the answer. Int J Epidemiol. 2019;48:1294-1304.

33.

References
• Lee KJ, Tilling KM, Cornish RP, et al. Framework for the treatment and reporting of missing data in observational studies: The Treatment And Reporting of Missing data in Observational Studies framework. J Clin Epidemiol. 2021;134:79-88.
• Ross RK, Breskin A, Westreich D. When is a complete-case approach to missing data valid? The importance of effect-measure modification. Am J Epidemiol. 2020;189:1583-89.
• Sunaga A, Hikoso S, Nakatani D, et al. Comparison of long-term outcomes between combination antiplatelet and anticoagulant therapy and anticoagulant monotherapy in patients with atrial fibrillation and left atrial thrombi. Circ Rep. 2020;2:457-65.
• Vach W, Blettner M. Biased estimation of the odds ratio in case-control studies due to the use of ad hoc methods of correcting for missing values for confounding variables. Am J Epidemiol. 1991;134:895-907.
• White IR, Carlin JB. Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values. Stat Med. 2010;29:2920-31.