Hello,
I have a dataset of individuals with a specific disease. The dataset includes the variable "immigrant", which indicates the immigration status of each individual (immigrant = 0 for non-immigrants and 1 for immigrants).
The dataset also includes immigrant-specific variables, that is, variables that have values only for immigrants. For example, "immigration category" (imm_cat) indicates the immigration class of the individual. Obviously, non-immigrants will not have any value for the immigrant-specific variables.
The dataset has missing data in all of the variables, including the "immigrant" variable, which I will later use as a stratification variable when reporting the annual rates. The immigrant-specific variables also have missing data.
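The distinction between imm_cat being undefined for non-immigrants and imm_cat being unknown for some immigrants can be made explicit in the data before running PROC MI. A minimal sketch, assuming imm_cat is numeric and that the code 0 is not already in use: observed non-immigrants get imm_cat = 0 ("not applicable"), while rows with immigrant = 1 or immigrant = . keep their missing values, so only genuinely unknown values are left to impute. The toy dataset dt_toy, the output dt_recoded and the 0 code are hypothetical. This recoding alone does not stop PROC MI from drawing a real category for someone whose immigrant value is imputed as 0.

```sas
/* Hypothetical toy data: immigrant = 1 immigrant, 0 non-immigrant, . unknown */
data dt_toy;
  input id immigrant imm_cat;
  datalines;
1 1 2
2 1 .
3 0 .
4 . .
5 1 3
;
run;

/* Assumption: 0 is not an existing imm_cat code; here it means
   "not applicable (non-immigrant)". */
data dt_recoded;
  set dt_toy;
  /* Structural missing: imm_cat does not apply to observed non-immigrants
     (this also overwrites any stray imm_cat value on a non-immigrant) */
  if immigrant = 0 then imm_cat = 0;
  /* Rows with immigrant = 1 or immigrant = . keep imm_cat missing,
     so only genuinely unknown values remain for PROC MI to impute */
run;

proc print data=dt_recoded noobs;
run;
```

Under this coding, imm_cat = 0 carries no information beyond immigrant = 0, so any completed record where the two disagree would still need a consistency check after imputation.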
The missing data patterns are as follows (X = observed, . = missing; the last column is the mean of age within each pattern):
rss_new | sex | age | immigrant | imm_cat | hiv_baseline | hiv | Freq | Percent | Mean age |
X | X | X | X | X | X | X | 3550 | 85.87 | 37.783380 |
X | X | X | X | . | X | X | 61 | 1.48 | 34.393443 |
X | X | X | X | . | . | . | 8 | 0.19 | 30.000000 |
X | X | X | . | . | . | . | 503 | 12.17 | 34.242823 |
X | X | . | . | . | . | . | 12 | 0.29 | . |
Now, I am aiming to run PROC MI and use FCS (MICE) to impute the above variables. I need to add a condition to the imputation so that SAS does not impute the immigrant-specific variables for non-immigrants. The issue is that the "immigrant" variable itself has missing data, so SAS should impute the immigrant-specific variables for both observed and imputed immigrants. I could not find any option in PROC MI that lets me put a condition on the imputed cells. My question is: what is the best approach to address this?
Below is the basic PROC MI code I initially planned to use:
proc mi data=dt nimpute=25 out=test25 seed=54321
        minimum=. . 0 . . . . maximum=. . 93 . . . .;
   class rss_new sex immigrant imm_cat hiv_baseline hiv;
   var rss_new sex age immigrant imm_cat hiv_baseline hiv;
   /* one FCS statement holding the iteration count, trace plots and methods */
   fcs nbiter=1000 plots=trace(mean std)
       discrim(immigrant imm_cat hiv_baseline hiv / classeffects=include details)
       regpmm(age / details);
run;
The problem with the above code is that it imputes all the variables in the FCS statement without any condition (e.g., imputing the immigrant-specific variables only for immigrants). So, at the end of the imputation I have, for example, imputed values of imm_cat for some non-immigrant individuals.
Can anyone help me address this need, please?
Thanks
Unfortunately, there is no option within PROC MI to subset the imputation models by a grouping variable that itself has missing values. Given this, there isn't necessarily a good way to handle your situation, and I do not know of any references that discuss how it should be approached.
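Since PROC MI cannot condition on an imputed grouping variable, one common post-hoc workaround (a swapped-in technique, not something PROC MI does itself) is to impute everything and then, in each completed dataset, blank out the immigrant-specific variables wherever the final value of immigrant, observed or imputed, is 0. The sketch below uses a hand-made stand-in for the OUT= dataset (test25_toy, including the _Imputation_ variable that PROC MI adds) and assumes imm_cat is numeric; the dataset names are hypothetical. This only enforces consistency in the output: the model that drew imm_cat for imputed immigrants was still fitted without respecting the skip pattern.

```sas
/* Hypothetical stand-in for the PROC MI OUT= dataset: two imputations
   of the same three individuals after imputation. */
data test25_toy;
  input _Imputation_ id immigrant imm_cat;
  datalines;
1 1 1 2
1 2 0 3
1 3 1 1
2 1 1 2
2 2 1 3
2 3 0 1
;
run;

data test25_fixed;
  set test25_toy;
  /* Immigrant-specific variables (assumed numeric); list any others here */
  array immvars{*} imm_cat;
  /* If the observed or imputed status is non-immigrant, these variables
     do not apply, so discard whatever PROC MI drew for them */
  if immigrant = 0 then do;
    do i = 1 to dim(immvars);
      immvars{i} = .;
    end;
  end;
  drop i;
run;

/* Check: non-immigrants should have no imm_cat in any imputation */
proc freq data=test25_fixed;
  tables _Imputation_*immigrant*imm_cat / list missing;
run;
```

With real data, test25_toy would be replaced by the OUT= dataset (test25), and each completed dataset analyzed BY _Imputation_ before combining the results with PROC MIANALYZE.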