Multiple imputation of post-stratification variables (proc mi)

homipilot · Posted 05-01-2024 01:05 PM

Hello,

I have a dataset of individuals with a specific disease. The dataset comprises the variable "immigrant" which indicates the immigration status of the individual (immigrant = 0 and 1 for non-immigrant and immigrant status, respectively).

The dataset also includes immigrant-specific variables, which have values only for immigrants. For example, "immigration category" (imm_cat) indicates the immigration class of the individual. Non-immigrants, by definition, have no value for the immigrant-specific variables.

The dataset has missing data in all of the variables, including the "immigrant" variable, which I will later use as a stratification variable when reporting annual rates. The immigrant-specific variables also have missing data.

The missing data patterns are as follows:

Missing Data Patterns

rss_new  sex  age  immigrant  imm_cat  hiv_baseline  hiv  Freq  Percent  Mean age
X        X    X    X          X        X             X    3550  85.87    37.783380
X        X    X    X          .        X             X      61   1.48    34.393443
X        X    X    X          .        .             .       8   0.19    30.000000
X        X    X    .          .        .             .     503  12.17    34.242823
X        X    .    .          .        .             .      12   0.29    .

I now want to run proc mi and use MICE (fully conditional specification) to impute the variables above. I need to put a condition on the imputation so that SAS does not impute the immigrant-specific variables for non-immigrants. The complication is that the "immigrant" variable itself has missing data, so SAS should impute the immigrant-specific variables both for observed immigrants and for individuals who are imputed as immigrants. I could not find any option in proc mi that lets me put a condition on which cells are imputed. My question is: what is the best approach to address this?

Below is the basic proc mi call I initially intended to use:

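/* minimum= and maximum= take one value per variable in the var statement; only age (the 3rd) is bounded */
proc mi data=dt nimpute=25 out=test25 seed=54321 minimum=. . 0 . . . . maximum=. . 93 . . . .;

class rss_new sex immigrant imm_cat hiv_baseline hiv;

var rss_new sex age immigrant imm_cat hiv_baseline hiv;

/* one fcs statement: discriminant function for the class variables, predictive mean matching for age */
fcs nbiter=1000 plots=trace(mean std) discrim(immigrant imm_cat hiv_baseline hiv / classeffects=include details) regpmm(age / details);

run;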

The problem with the code above is that it imputes every variable in the fcs statement unconditionally; there is no way to say, for example, "impute the immigrant-specific variables only for immigrants". As a result, after the imputation I end up with imputed values of imm_cat for some non-immigrant individuals.

Can anyone help me address this need, please?

Thanks

SAS_Rob · Posted 05-02-2024 02:18 PM

Unfortunately, there is no option within Proc MI to subset the imputation models based on a group variable that also has missing values. Given that, there isn't necessarily a good way to handle your situation, and I do not know of any references that discuss how it should be approached.
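
Since Proc MI cannot apply the condition during imputation, one possible workaround is to impute everything and then blank out the immigrant-specific values afterwards. The following is only a minimal sketch of that idea, not a method confirmed in this thread. It assumes that immigrant is numeric with values 0/1, as described in the question, and that test25 is the out= dataset from the proc mi call above. The dataset name test25_clean is hypothetical. Note that this does not make the imputation model itself conditional on immigrant status; it only removes the structurally impossible values.

```sas
/* Sketch only: post-process the stacked imputations from proc mi. */
/* Assumes immigrant is numeric 0/1 and test25 is the out= dataset. */
data test25_clean;                              /* hypothetical output name */
  set test25;
  /* imm_cat is not applicable to non-immigrants, whether observed or imputed */
  if immigrant = 0 then call missing(imm_cat);
run;

/* Check: within each imputation, imm_cat should be missing whenever immigrant = 0 */
proc freq data=test25_clean;
  tables _Imputation_ * immigrant * imm_cat / missing list;
run;
```

Any other immigrant-specific variables would need the same treatment, and whether this kind of "impute, then delete" handling is acceptable for the planned analysis is a judgment call that should be checked before relying on it.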
