Leanne
Calcite | Level 5

Hi,

I am working with a data set in which I want to use many variables to impute missing values (arbitrary missing pattern), and most of those variables have missing values themselves. The majority of my variables are discrete (some nominal and some ordinal). I know I need to use the FCS discrim and logistic methods within proc mi, but I am unsure of the syntax when imputing multiple variables, some of which are themselves being imputed. Since the FCS statement is so new, I am having trouble finding examples of code online. I was wondering if anyone has experience using it and could give me some advice.

Here are three simplified sample codes, one of which may be close to correct:

Note on variables: bmi is continuous, mamgp is ordinal, agegp is ordinal (no missing values), treat is nominal, and rhistogp is nominal. The variables are listed in the VAR statement in order from fewest to most missing values.

Format 1:

PROC MI DATA=complete OUT=complete_mi NIMPUTE=10 SEED=9455;

CLASS mamgp agegp treat rhistogp;

FCS logistic(agegp) reg(bmi) logistic(mamgp) discrim(treat) discrim(rhistogp)  ;

VAR agegp bmi mamgp treat rhistogp;

RUN;

Format 2

PROC MI DATA=complete OUT=complete_mi NIMPUTE=10 SEED=9455;

CLASS mamgp agegp treat rhistogp;

FCS reg(bmi = agegp mamgp treat rhistogp) ;

FCS logistic(agegp = mamgp bmi treat rhistogp) ;

FCS logistic(mamgp = agegp bmi treat rhistogp) ;

FCS discrim(treat = agegp bmi mamgp rhistogp) ;

FCS discrim(rhistogp = agegp bmi mamgp treat)  ;

VAR agegp bmi mamgp treat rhistogp;

RUN;

Format 3: For this one I turned the nominal variables into binary dummy variables so that I could include them in the fcs logistic (does that make sense?)

  PROC MI DATA=complete OUT=complete_mi NIMPUTE=10 SEED=9455;

CLASS mamgp agegp treat1 treat2 treat3 rhistogp1 rhistogp2;

FCS reg(bmi = agegp mamgp treat1 treat2 treat3 rhistogp1 rhistogp2) ;

FCS logistic(mamgp = agegp bmi treat1 treat2 treat3 rhistogp1 rhistogp2) ;

FCS logistic(agegp = mamgp bmi treat1 treat2 treat3 rhistogp1 rhistogp2) ;

FCS logistic(treat1 = agegp bmi mamgp treat2 treat3 rhistogp1 rhistogp2) ;

FCS logistic(treat2 = agegp bmi mamgp treat1 treat3 rhistogp1 rhistogp2) ;

FCS logistic(treat3 = agegp bmi mamgp treat1 treat2 rhistogp1 rhistogp2) ;

FCS logistic(rhistogp1 = agegp bmi mamgp treat1 treat2 treat3 rhistogp2)  ;

FCS logistic(rhistogp2 = agegp bmi mamgp treat1 treat2 treat3 rhistogp1)  ;

VAR agegp bmi mamgp treat1 treat2 treat3 rhistogp1 rhistogp2;

RUN;

When I run "Format 1" I get these warnings in the log:

WARNING: The covariates are not specified in an FCS discriminant method for variable treat, only

         remaining continuous variables will be used as covariates.

WARNING: The covariates are not specified in an FCS discriminant method for variable rhistogp,

         only remaining continuous variables will be used as covariates.

When I *try* to run "Format 2" I get this error in the log:

ERROR: The class variables cannot be used as covariates in an FCS discriminant method.

When I ran "Format 3" I got approximately 800 warnings for each of the dummy variables, example:

WARNING: The maximum likelihood estimates for the logistic regression with observed observations

         may not exist for variable rhistogp2. The posterior predictive distribution of the

         parameters used in the imputation process is based on the maximum likelihood estimates

         in the last maximum likelihood iteration.

Does Format 1 take the other class variables into account when it imputes a class variable, and does it take class variables into account when imputing the continuous variable? If not, is there a way to do this?

Thank you! Let me know if you require additional information.

Message was edited by: Leanne Shulman

1 ACCEPTED SOLUTION

Accepted Solutions
Leanne
Calcite | Level 5

Just in case others have the same question.

I contacted SAS tech support and this was the reply:

In SAS 9.3TS1M0 the discriminant method will only use continuous variables as predictors (that is why Format 2 generates an ERROR).  In SAS/STAT 12.1, the latest release, which is part of SAS 9.3TS1M2, you can specify that it use CLASS variables as well with the CLASSEFFECTS=INCLUDE option.

That being said, you would not want to take the approach in method 3 of creating dummy variables.  You will likely encounter convergence issues in the logistic models (as you are seeing) and even if they did run there is no way to guarantee that an observation won't be predicted as a value of 1 for more than one of the dummy variables.

Method 1 is really the same thing as:

PROC MI DATA=complete OUT=complete_mi NIMPUTE=10 SEED=9455;

CLASS mamgp agegp treat rhistogp;

FCS logistic(agegp=bmi mamgp treat rhistogp);

FCS reg(bmi=agegp mamgp treat rhistogp);

FCS logistic(mamgp=bmi agegp treat rhistogp);

FCS discrim(treat=bmi);

FCS discrim(rhistogp=bmi)  ;

VAR agegp bmi mamgp treat rhistogp;

RUN;

It uses all the variables for REG and LOGISTIC as predictors and the continuous ones for DISCRIM.


9 REPLIES 9
SAShh
Calcite | Level 5

thanks, Leanne!

Deep81
Calcite | Level 5

Hello,

I have a question related to the topic of this post. I am hoping that you can help me out.

I am trying to impute missing values in a dataset that has mostly categorical variables (binary, ordinal or nominal). I am using the discriminant FCS method for each of the variables.

Now, I am trying to impute my socio-demographic variables with missing values (age, education, income, marital status, race, ethnicity) using the other socio-demographic variables only. There are also two psychosocial variables (strs_fm3 and strs_dv3); to impute those, I am trying to use both the socio-demographic variables and the other psychosocial variable. I am doing this in SAS 9.4 and here is an example of the code I am using (after much trial and error):

proc mi data= prams_diss seed=1305417 out=prams_diss_imp_may28 nimpute=10;

class mat_age_naphsis mat_race hispanic mat_ed married income5 strs_fm3 strs_dv3;

FCS discrim(mat_age_naphsis=mat_race hispanic mat_ed married income5 /classeffects=include);

FCS discrim(mat_race=mat_age_naphsis hispanic mat_ed married income5 / classeffects=include);

FCS discrim(hispanic= mat_race mat_age_naphsis mat_ed married income5 /classeffects=include);

FCS discrim(mat_ed=mat_age_naphsis mat_race hispanic married income5 / classeffects=include);

FCS discrim(married=mat_age_naphsis mat_race hispanic mat_ed income5 / classeffects=include);

FCS discrim(income5=mat_age_naphsis mat_race hispanic married mat_ed / classeffects=include);

FCS discrim (strs_fm3= mat_age_naphsis mat_race hispanic mat_ed married income5 strs_dv3/ classeffects=include);

FCS discrim (strs_dv3= mat_age_naphsis mat_race hispanic mat_ed married income5 strs_fm3/ classeffects=include);

var mat_age_naphsis married mat_ed mat_race hispanic income5 strs_dv3 strs_fm3;

run;


The procedure ran successfully without any errors. At the end, a dataset with 10 times the number of observations in the original file has been created. In the output, I have only the information about the number of variables and the table (the big one with crosses) of missing value patterns.
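
A quick way to inspect that stacked output, as a minimal sketch: PROC MI adds an index variable named _Imputation_ to the OUT= data set, so a cross-tabulation by that variable shows whether any missing values remain in each completed copy. The three variables tabulated here are just a sample taken from the code above; any of the imputed variables could be listed.

```sas
/* Sketch: check each of the 10 completed data sets for leftover missing values. */
/* _Imputation_ is created by PROC MI; the MISSING option makes missing values   */
/* appear as their own level in the tables.                                      */
proc freq data=prams_diss_imp_may28;
   tables _Imputation_ * (mat_ed income5 strs_fm3)
          / missing norow nocol nopercent;
run;
```

If the imputation covered these variables, no missing level should appear in any of the ten rows.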


I just wanted to make sure that the imputation has been done exactly how I wanted. That is, only the other socio-demographic variables are used to impute each of the socio-demographics, and for the stress variables, information from both the socio-demographics and the other stress variable is used.


Based on my code, do you think I have managed to use only the specified variables to impute each variable?
Or has SAS used all the variables specified in the var and class statements to impute each variable? Is there any way to know what is going on?


Thanks a lot.


Deep

Rick_SAS
SAS Super FREQ

I'm not an expert on MI, but I think there is an easy way for you to answer your question empirically.  Run a test in which you use one of the FCS statements AND ALSO use a KEEP= data set option on the DATA= data set so that the only variables in the input data are the ones that you are trying to use for the imputation. Use NIMPUTE=1.  Does the output of the test match the original output?  If so, repeat for the other FCS statements.

For example, for the first FCS statement, you could run

proc mi data=prams_diss(keep=mat_age_naphsis mat_race hispanic mat_ed married income5)

        seed=1305417 out=prams_diss_imp_test nimpute=1;

class mat_age_naphsis mat_race hispanic mat_ed married income5;

FCS discrim(mat_age_naphsis=mat_race hispanic mat_ed married income5 /classeffects=include);

var mat_age_naphsis mat_race hispanic mat_ed  married income5;

run;

The results of this proc call can only depend on the variables in the KEEP= list (the output goes to a separate data set, prams_diss_imp_test, so the original output is not overwritten).  Now rerun the analysis, but this time omit the KEEP= option and put ALL variables in the CLASS and VAR statements. Do you get the same answers?

Don't know if this will work or not, but it might clarify what is happening.
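
Another possibility, offered as a sketch rather than a verified answer: the FCS methods accept a DETAILS option (it is used with DISCRIM later in this thread), which requests additional per-imputation output for each model. What it prints depends on the method and the SAS/STAT release, so treat the listing as a hint about what each model used, not as proof. The output data set name check_details is made up for this example.

```sas
/* Sketch: Deep's model with DETAILS added to every FCS statement.        */
/* NIMPUTE=1 keeps the listing short. work.check_details is a             */
/* hypothetical name, chosen so the original output is not overwritten.   */
proc mi data=prams_diss seed=1305417 out=work.check_details nimpute=1;
   class mat_age_naphsis mat_race hispanic mat_ed married income5 strs_fm3 strs_dv3;
   fcs discrim(mat_age_naphsis = mat_race hispanic mat_ed married income5 / classeffects=include details);
   fcs discrim(mat_race = mat_age_naphsis hispanic mat_ed married income5 / classeffects=include details);
   fcs discrim(hispanic = mat_race mat_age_naphsis mat_ed married income5 / classeffects=include details);
   fcs discrim(mat_ed = mat_age_naphsis mat_race hispanic married income5 / classeffects=include details);
   fcs discrim(married = mat_age_naphsis mat_race hispanic mat_ed income5 / classeffects=include details);
   fcs discrim(income5 = mat_age_naphsis mat_race hispanic married mat_ed / classeffects=include details);
   fcs discrim(strs_fm3 = mat_age_naphsis mat_race hispanic mat_ed married income5 strs_dv3 / classeffects=include details);
   fcs discrim(strs_dv3 = mat_age_naphsis mat_race hispanic mat_ed married income5 strs_fm3 / classeffects=include details);
   var mat_age_naphsis married mat_ed mat_race hispanic income5 strs_dv3 strs_fm3;
run;
```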

Deep81
Calcite | Level 5

Thanks, Rick. I think your suggestion makes sense and I am going to try that.

Fabie10
Calcite | Level 5

Hello,

I work on SAS 9.4.

I have to impute categorical variables ( Miss_Smoking Miss_Alcohol Miss_WorkStatus) with arbitrary missing value patterns.
I only have one continuous (complete) variable in my dataset: 'age'.

I tried this code:

PROC MI DATA = missing  out=FCS nimpute=5 seed=22222;

Class nationality relationStatus Education Region ChronicDisease Miss_Smoking Miss_Alcohol Miss_WorkStatus;

Fcs nbiter=40  logistic (Miss_Smoking Miss_Alcohol Miss_WorkStatus/details);

Var age nationality relationStatus Education Region ChronicDisease Miss_Smoking Miss_Alcohol Miss_WorkStatus; run;

The imputation works, generating a table with imputed data, but I get the following warning:

WARNING: The covariates are not specified in an FCS discriminant method for variable nationality, only remaining

         continuous variables will be used as covariates with the default CLASSEFFECTS=EXCLUDE option.

...

So I used the 'FCS discrim' function:

PROC MI DATA = missing  out=FCS2 nimpute=5 seed=22222;

Class nationality relationStatus Education Region ChronicDisease Miss_Smoking Miss_Alcohol Miss_WorkStatus;

Fcs discrim (Miss_Smoking Miss_Alcohol Miss_WorkStatus/details classeffects=include);

Var Age nationality relationStatus Education Region ChronicDisease Miss_Smoking Miss_Alcohol Miss_WorkStatus; run;


I also get an imputed table but with almost the same distributions for imputed variables.

However there are still warning messages:

WARNING: The covariates are not specified in an FCS discriminant method for variable nationality, only remaining

         continuous variables will be used as covariates with the default CLASSEFFECTS=EXCLUDE option

...

Does this mean that the 'proc mi' is using only the continuous variable to impute missing values?

I tried removing all the categorical variables:

PROC MI DATA = missing  out=FCS3 nimpute=5 seed=22222;

class  Miss_Smoking Miss_Alcohol Miss_WorkStatus;

fcs discrim (Miss_Smoking Miss_Alcohol Miss_WorkStatus/details classeffects=include);

var Age Miss_Smoking Miss_Alcohol Miss_WorkStatus; run;


No warning message; also, I get comparable distributions to the imputed variables...


Is there any other way/ option I could use that would use all my covariates and won't generate the warning messages?


Thank you


Fabienne

Deep81
Calcite | Level 5

Hello Fabienne,


I really do not understand a lot about the MI procedure. Often, even when something works or does not work, I am not sure why that is. But I have experienced situations similar to the one you described. Although some online resources say that once we specify classeffects=include, all variables (continuous and categorical) should be included, it does not work for me. The only way I have seen it work is to specify an FCS statement with explicit covariates for each variable (see my post dated May 28 above). For example, in your situation, you might want to try something like:


FCS discrim(Miss_Smoking= Age nationality relationStatus Education Region ChronicDisease/classeffects=include);

FCS discrim(Miss_Alcohol= Age nationality relationStatus Education Region ChronicDisease/classeffects=include);

..... and so on for each variable that you want to impute


Best,


Deep

Fabie10
Calcite | Level 5

Thank you for your response Deep
I contacted SAS. It seems I had an older version of SAS/STAT (before 13.1) that has bugs concerning PROC MI. Now that I have SAS/STAT 14.1, there are no more warnings and the 'classeffects=include' option works fine.
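
For anyone who wants to check their own release before chasing these warnings, a minimal sketch follows. Both the SYSVLONG automatic macro variable and PROC PRODUCT_STATUS are standard SAS facilities; the exact text written to the log varies by installation.

```sas
/* Write the Base SAS release and maintenance level to the log. */
%put NOTE: Base SAS release is &sysvlong;

/* List the installed products and their versions in the log;   */
/* look for the SAS/STAT entry.                                 */
proc product_status;
run;
```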

I encountered a few problems with PROC MIANALYZE, especially because I'm fitting a multinomial logit model, and I solved them by pooling the results for one variable at a time within each response category.
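
A rough sketch of that kind of pooling, under stated assumptions rather than the exact code used: outcome is a hypothetical nominal response variable, Age is the only predictor to keep the example short, and FCS is the imputed data set from the earlier post. It also assumes the ODS ParameterEstimates table from a generalized logit model carries a column named Response that identifies the response category; it is worth confirming that with PROC CONTENTS before relying on it.

```sas
/* Fit the generalized logit model separately within each imputation. */
/* 'outcome' is a hypothetical variable name used for illustration.   */
proc logistic data=FCS;
   by _Imputation_;
   model outcome = Age / link=glogit;
   ods output ParameterEstimates=lgparms;
run;

/* Group the estimates by response category, then by imputation.     */
/* 'Response' is assumed to be the ODS column name for the category. */
proc sort data=lgparms;
   by Response _Imputation_;
run;

/* Pool the estimates across imputations, one response category at a time. */
proc mianalyze parms=lgparms;
   by Response;
   modeleffects Intercept Age;
run;
```

With CLASS predictors in the model, the PARMS= data set needs extra handling of the class-level columns, so this sketch deliberately leaves them out.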

Kind regards

Fabienne



dearfisher
SAS Employee

Hi Leanne,

 

Thank you very much for your posts. I would just like to add a little from my own experience.

 

Format 2

PROC MI DATA=complete OUT=complete_mi NIMPUTE=10 SEED=9455;

CLASS mamgp agegp treat rhistogp;

FCS reg(bmi = agegp mamgp treat rhistogp) ;

FCS logistic(agegp = mamgp bmi treat rhistogp) ;

FCS logistic(mamgp = agegp bmi treat rhistogp) ;

FCS discrim(treat = agegp bmi mamgp rhistogp/classeffects=include) ;

FCS discrim(rhistogp = agegp bmi mamgp treat/classeffects=include)  ;

VAR agegp bmi mamgp treat rhistogp;

RUN;

 

But we should be very careful about the SAS version. On my desktop it is SAS 9.4 TS Level 1M1, and there you do not need to include "FCS logistic(agegp = mamgp bmi treat rhistogp) ;" because agegp has no missing values. When I copied the code into our data safe haven, which runs SAS 9.4 TS Level 1M0, the warnings appeared.

 

Y.
