Multiple imputation, FCS discrim and logistic; in SAS 9.3

Posted 06-06-2013 09:36 AM

Hi,

I am working with a data set that contains many variables that I want to use to impute missing values (arbitrary missing pattern), most of which have missing values themselves. The majority of my variables are discrete (some nominal and some ordinal). I know I need to use the FCS discrim and logistic methods within PROC MI, but I am unsure of the syntax when imputing multiple variables, some of which are themselves being imputed. Since the FCS statement is so new, I am having trouble finding examples of code online. I was wondering if anyone has experience using this statement and could give me some advice.

Here are three simplified sample codes, one of which may be close to correct:

Note on variables: bmi is continuous, mamgp is ordinal, agegp is ordinal (no missing values), treat is nominal, and rhistogp is nominal. The variables are listed in the VAR statement in order from fewest to most missing values.

Format 1:

PROC MI DATA=complete OUT=complete_mi NIMPUTE=10 SEED=9455;

CLASS mamgp agegp treat rhistogp;

FCS logistic(agegp) reg(bmi) logistic(mamgp) discrim(treat) discrim(rhistogp) ;

VAR agegp bmi mamgp treat rhistogp;

RUN;

Format 2:

PROC MI DATA=complete OUT=complete_mi NIMPUTE=10 SEED=9455;

CLASS mamgp agegp treat rhistogp;

FCS reg(bmi = agegp mamgp treat rhistogp) ;

FCS logistic(agegp = mamgp bmi treat rhistogp) ;

FCS logistic(mamgp = agegp bmi treat rhistogp) ;

FCS discrim(treat = agegp bmi mamgp rhistogp) ;

FCS discrim(rhistogp = agegp bmi mamgp treat) ;

VAR agegp bmi mamgp treat rhistogp;

RUN;

Format 3: For this one I turned the nominal variables into binary dummy variables so that I could include them in the fcs logistic (does that make sense?)

PROC MI DATA=complete OUT=complete_mi NIMPUTE=10 SEED=9455;

CLASS mamgp agegp treat1 treat2 treat3 rhistogp1 rhistogp2;

FCS reg(bmi = agegp mamgp treat1 treat2 treat3 rhistogp1 rhistogp2) ;

FCS logistic(mamgp = agegp bmi treat1 treat2 treat3 rhistogp1 rhistogp2) ;

FCS logistic(agegp = mamgp bmi treat1 treat2 treat3 rhistogp1 rhistogp2) ;

FCS logistic(treat1 = agegp bmi mamgp treat2 treat3 rhistogp1 rhistogp2) ;

FCS logistic(treat2 = agegp bmi mamgp treat1 treat3 rhistogp1 rhistogp2) ;

FCS logistic(treat3 = agegp bmi mamgp treat1 treat2 rhistogp1 rhistogp2) ;

FCS logistic(rhistogp1 = agegp bmi mamgp treat1 treat2 treat3 rhistogp2) ;

FCS logistic(rhistogp2 = agegp bmi mamgp treat1 treat2 treat3 rhistogp1) ;

VAR agegp bmi mamgp treat1 treat2 treat3 rhistogp1 rhistogp2;

RUN;

When I run "Format 1" I get these warnings in the log:

WARNING: The covariates are not specified in an FCS discriminant method for variable treat, only remaining continuous variables will be used as covariates.

WARNING: The covariates are not specified in an FCS discriminant method for variable rhistogp, only remaining continuous variables will be used as covariates.

When I *try* to run "Format 2" I get this error in the log:

ERROR: The class variables cannot be used as covariates in an FCS discriminant method.

When I ran "Format 3" I got approximately 800 warnings for each of the dummy variables, example:

WARNING: The maximum likelihood estimates for the logistic regression with observed observations may not exist for variable rhistogp2. The posterior predictive distribution of the parameters used in the imputation process is based on the maximum likelihood estimates in the last maximum likelihood iteration.

Does Format 1 take the other class variables into account when I impute a class variable? Does it take class variables into account when imputing the continuous variable? If not, is there a way to do this?

Thank you! Let me know if you require additional information.

Message was edited by: Leanne Shulman

Accepted Solutions

Just in case others have the same question.

I contacted SAS tech support and this was the reply:

In SAS 9.3 TS1M0 the discriminant method will only use continuous variables as predictors (that is why Format 2 generates an ERROR). In SAS/STAT 12.1, the latest release, which is part of SAS 9.3 TS1M2, you can specify that it also use CLASS variables with the CLASSEFFECTS=INCLUDE option.

That being said, you would not want to take the approach in Format 3 of creating dummy variables. You will likely encounter convergence issues in the logistic models (as you are seeing), and even if they did run, there is no way to guarantee that an observation won't be imputed a value of 1 for more than one of the dummy variables that belong to the same original variable.
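To make the second point concrete, one could count, after a Format 3 run, how many imputed rows end up with more than one dummy set to 1 for the same original variable. This is only a minimal sketch: it assumes the dummies are numeric 0/1 variables and reuses the data set and variable names from the question, while `dummy_conflicts`, `n_treat`, and `n_rhisto` are hypothetical names introduced here.

```sas
/* Sketch only: assumes treat1-treat3 and rhistogp1-rhistogp2
   are numeric 0/1 dummies in the PROC MI output data set. */
data dummy_conflicts;
  set complete_mi;
  n_treat  = sum(treat1, treat2, treat3);  /* number of treat dummies equal to 1 */
  n_rhisto = sum(rhistogp1, rhistogp2);    /* number of rhistogp dummies equal to 1 */
  if n_treat > 1 or n_rhisto > 1;          /* keep only the inconsistent rows */
run;

/* Count the inconsistent rows within each imputation */
proc freq data=dummy_conflicts;
  tables _Imputation_;
run;
```

Any rows that appear in `dummy_conflicts` would belong to two categories at once, which is the inconsistency described above.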

Format 1 is really the same thing as:

PROC MI DATA=complete OUT=complete_mi NIMPUTE=10 SEED=9455;

CLASS mamgp agegp treat rhistogp;

FCS logistic(agegp=bmi mamgp treat rhistogp);

FCS reg(bmi=agegp mamgp treat rhistogp);

FCS logistic(mamgp=bmi agegp treat rhistogp);

FCS discrim(treat=bmi);

FCS discrim(rhistogp=bmi) ;

VAR agegp bmi mamgp treat rhistogp;

RUN;

It uses all the other variables as predictors for REG and LOGISTIC, and only the continuous ones for DISCRIM.
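Because the behavior described in this reply depends on the maintenance level and the SAS/STAT release, it may help to confirm what is installed before choosing a syntax. A minimal sketch, using only the automatic macro variable `SYSVLONG` and `PROC PRODUCT_STATUS`; the exact text written to the log will vary by installation:

```sas
/* Write the SAS release and maintenance level to the log */
%put NOTE: SAS release is &sysvlong;

/* List the installed products, including SAS/STAT, with their versions */
proc product_status;
run;
```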

9 REPLIES 9

Thanks, Leanne!

Hello,

I have a question related to the topic of this post. I am hoping that you can help me out.

I am trying to impute missing values in a dataset that has mostly categorical variables (binary, ordinal or nominal). I am using the discriminant FCS method for each of the variables.

Now, I am trying to impute my socio-demographic variables with missing values (age, education, income, marital status, race, ethnicity) using the other socio-demographic variables only. There are also some psychosocial variables (strs_fm3 and strs_dv3); to impute those, I am trying to use both the socio-demographic variables and the other psychosocial variable. I am doing this in SAS 9.4, and here is an example of the code I am using (after much trial and error):

proc mi data=prams_diss seed=1305417 out=prams_diss_imp_may28 nimpute=10;

class mat_age_naphsis mat_race hispanic mat_ed married income5 strs_fm3 strs_dv3;

FCS discrim(mat_age_naphsis=mat_race hispanic mat_ed married income5 /classeffects=include);

FCS discrim(mat_race=mat_age_naphsis hispanic mat_ed married income5 / classeffects=include);

FCS discrim(hispanic= mat_race mat_age_naphsis mat_ed married income5 /classeffects=include);

FCS discrim(mat_ed=mat_age_naphsis mat_race hispanic married income5 / classeffects=include);

FCS discrim(married=mat_age_naphsis mat_race hispanic mat_ed income5 / classeffects=include);

FCS discrim(income5=mat_age_naphsis mat_race hispanic married mat_ed / classeffects=include);

FCS discrim (strs_fm3= mat_age_naphsis mat_race hispanic mat_ed married income5 strs_dv3/ classeffects=include);

FCS discrim (strs_dv3= mat_age_naphsis mat_race hispanic mat_ed married income5 strs_fm3/ classeffects=include);

var mat_age_naphsis married mat_ed mat_race hispanic income5 strs_dv3 strs_fm3;

run;

The model ran successfully without any errors. At the end, a dataset with 10 times the number of observations in the original file has been created. In the output, I have only the information about the number of variables and the table (the big one with crosses) showing the missing value patterns.

I just wanted to make sure that the imputation has been done exactly how I wanted. That is, only the other socio-demographic variables are used to impute each socio-demographic variable, and the stress variables are imputed using both the socio-demographic variables and the other stress variable.

Based on my code, do you think that is what I have been able to do (using specific variables to impute specific variables)?

Or has SAS used all the variables specified in the VAR and CLASS statements to impute each variable? Is there any way to know what is going on?

Thanks a lot.

Deep

I'm not an expert on MI, but I think there is an easy way for you to answer your question empirically. Run a test in which you use one of the FCS statements AND ALSO use a KEEP= data set option on the DATA= data set, so that the only variables in the input data are the ones that you are trying to use for the imputation. Use NIMPUTE=1, and write the result to a separate OUT= data set so that the original output is not overwritten. Does the output of the test match the original output? If so, repeat for the other FCS statements.

For example, for the first FCS statement, you could run

proc mi data=prams_diss(keep=mat_age_naphsis mat_race hispanic mat_ed married income5) seed=1305417 out=prams_diss_imp_test nimpute=1;

class mat_age_naphsis mat_race hispanic mat_ed married income5;

FCS discrim(mat_age_naphsis=mat_race hispanic mat_ed married income5 /classeffects=include);

var mat_age_naphsis mat_race hispanic mat_ed married income5;

run;

The results of this proc call can only depend on the variables in the KEEP= option. Now rerun the analysis, but this time omit the KEEP= option and put ALL variables in the CLASS and VAR statements. Do you get the same answers?

Don't know if this will work or not, but it might clarify what is happening.
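Another thing that might help is the DETAILS option, which can be added to an FCS method to print the quantities used by that model in each imputation; the variables that show up there should be the ones that were actually used. A minimal sketch, assuming the same data set and variables as in the question, with `prams_diss_imp_details` as a hypothetical output name; how much is printed for CLASS covariates may depend on the SAS/STAT release:

```sas
/* Sketch only: data set and variable names are taken from the post above;
   prams_diss_imp_details is a hypothetical output data set name. */
proc mi data=prams_diss seed=1305417 out=prams_diss_imp_details nimpute=1;
  class mat_age_naphsis mat_race hispanic mat_ed married income5;
  /* DETAILS requests the per-imputation model quantities for this method */
  fcs discrim(mat_age_naphsis = mat_race hispanic mat_ed married income5
              / classeffects=include details);
  var mat_age_naphsis mat_race hispanic mat_ed married income5;
run;
```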

Thanks, Rick. I think your suggestion makes sense and I am going to try that.

Hello,

I work on SAS 9.4.

I have to impute categorical variables (Miss_Smoking, Miss_Alcohol, Miss_WorkStatus) with arbitrary missing value patterns.

I only have one continuous (complete) variable in my dataset: 'age'.

I tried this code:

PROC MI DATA = missing out=FCS nimpute=5 seed=22222;

Class nationality relationStatus Education Region ChronicDisease Miss_Smoking Miss_Alcohol Miss_WorkStatus;

Fcs nbiter=40 logistic (Miss_Smoking Miss_Alcohol Miss_WorkStatus/details);

Var age nationality relationStatus Education Region ChronicDisease Miss_Smoking Miss_Alcohol Miss_WorkStatus; run;

The imputation works, generating a table with imputed data, but I get the following warning:

**WARNING: The covariates are not specified in an FCS discriminant method for variable nationality, only remaining continuous variables will be used as covariates with the default CLASSEFFECTS=EXCLUDE option.**

...

So I used the 'FCS discrim' function:

PROC MI DATA = missing out=FCS2 nimpute=5 seed=22222;

Class nationality relationStatus Education Region ChronicDisease Miss_Smoking Miss_Alcohol Miss_WorkStatus;

Fcs discrim (Miss_Smoking Miss_Alcohol Miss_WorkStatus/details classeffects=include);

Var Age nationality relationStatus Education Region ChronicDisease Miss_Smoking Miss_Alcohol Miss_WorkStatus; run;

I also get an imputed table but with almost the same distributions for imputed variables.

However there are still warning messages:

**WARNING: The covariates are not specified in an FCS discriminant method for variable nationality, only remaining continuous variables will be used as covariates with the default CLASSEFFECTS=EXCLUDE option**

...

Does this mean that the 'proc mi' is using only the continuous variable to impute missing values?

I tried removing all the categorical variables:

PROC MI DATA = missing out=FCS3 nimpute=5 seed=22222;

class Miss_Smoking Miss_Alcohol Miss_WorkStatus;

fcs discrim (Miss_Smoking Miss_Alcohol Miss_WorkStatus/details classeffects=include);

var Age Miss_Smoking Miss_Alcohol Miss_WorkStatus; run;

No warning message; also, I get comparable distributions for the imputed variables...

Is there any other approach or option that would use all my covariates without generating the warning messages?

Thank you

Fabienne

Hello Fabienne,

There is a lot about the MI procedure that I really do not understand. Often, even when something works or does not work, I am not sure why that is. But I have experienced situations similar to the one you described. Although some of the online resources mention that once we specify classeffects=include, all variables should be included (continuous and categorical), it does not work for me. The only way I have seen it work is if I specify a method for each variable (see my post dated May 28 above). For example, in your situation, you might want to try something like:

FCS discrim(Miss_Smoking= Age nationality relationStatus Education Region ChronicDisease/classeffects=include);

FCS discrim(Miss_Alcohol= Age nationality relationStatus Education Region ChronicDisease/classeffects=include);

..... and so on for each variable that you want to impute

Best,

Deep

Thank you for your response, Deep.

I contacted SAS. It seems I had an older version of SAS/STAT (before 13.1) that has bugs concerning PROC MI. Now that I have SAS/STAT 14.1, there are no more warnings and the 'classeffects=include' option works fine.

I encountered a few problems with PROC MIANALYZE, especially because I'm fitting a multinomial logit model, and I solved them by pooling the results one variable at a time for each response category.
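For anyone who wants to try the same kind of pooling, here is a rough sketch of one way it could be set up. It is not the exact code I used: `outcome` is a hypothetical nominal response, `glogit_parms` is a hypothetical data set name, and only the continuous covariate `age` is included to keep it short. With CLASS predictors, the parameter table also carries a column for the class level, which I believe would have to be added to both BY lists.

```sas
/* Sketch only: 'outcome' is a hypothetical nominal response variable;
   FCS is the imputed data set created by PROC MI earlier in this thread. */
proc logistic data=FCS;
  by _Imputation_;
  model outcome = age / link=glogit;
  ods output ParameterEstimates=glogit_parms;
run;

/* One BY group per response category and model term */
proc sort data=glogit_parms;
  by Response Variable _Imputation_;
run;

/* Pool each estimate across the imputations */
proc mianalyze data=glogit_parms;
  by Response Variable;
  modeleffects Estimate;
  stderr StdErr;
run;
```

Each BY group then holds one estimate and one standard error per imputation, so PROC MIANALYZE combines them for that response category and term on its own.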

Kind regards

Fabienne

Hi Leanne,

Thank you very much for your posts. I would just like to add a little from my own experience.

Format 2:

PROC MI DATA=complete OUT=complete_mi NIMPUTE=10 SEED=9455;

CLASS mamgp agegp treat rhistogp;

FCS reg(bmi = agegp mamgp treat rhistogp) ;

FCS logistic(agegp = mamgp bmi treat rhistogp) ;

FCS logistic(mamgp = agegp bmi treat rhistogp) ;

FCS discrim(treat = agegp bmi mamgp rhistogp/classeffects=include) ;

FCS discrim(rhistogp = agegp bmi mamgp treat/classeffects=include) ;

VAR agegp bmi mamgp treat rhistogp;

RUN;

But we should be very careful about the SAS version. On my desktop, which has SAS 9.4 TS Level 1M1, you do not need to include "FCS logistic(agegp = mamgp bmi treat rhistogp) ;" because agegp has no missing values. When I copied the code into the data safe haven, where the version is SAS 9.4 TS Level 1M0, the warning appeared.

Y.
