hng2r
Calcite | Level 5

Hello, thank you for taking the time to read this.

 

I am trying to calculate a fully adjusted odds ratio (OR) with a multivariable logistic regression model (PROC LOGISTIC, syntax below). For some reason, the variable SMKSTAT2CAT shows only three responses in the output when the question has four. I understand that a response may not be shown if the difference is extremely small. However, the p-value for SMKSTAT2CAT was significant in a chi-square test, the confidence interval and p-value for response #4 of SMKSTAT2CAT were significant in an age-adjusted model, and the two group frequencies for that response differ visibly more than they do for the other responses. In the Analysis of Maximum Likelihood Estimates table, the response is listed, but as follows: DF=0   Estimate=0   SE=.   Wald Chi-sq=.   Pr>Chisq=.

 

Could anyone possibly elaborate on why the response #4 for SMKSTAT2CAT is not showing?

 

SMKSTAT2CAT frequencies:

Response   Group One   Group Two
#1               828         612
#2               401         361
#3             1,412       1,427
#4             5,295       6,173

 

Here is my syntax:

 

proc logistic data= less;
class smkevcat (param=ref ref='1') R_MARITLCAT (param=ref ref='1') BMIcat (param=ref ref='4')
AHICOMPCAT (param=ref ref='2') SMKSTAT2CAT (param=ref ref='1') vignocat (param=ref ref='2')
modnocat (param=ref ref='2') strngnocat (param=ref ref='2') SHTFLU2CAT (param=ref ref='2')
SHTPNUYRCAT (param=ref ref='2') SHTHEPBCAT (param=ref ref='2') SHTHEPACAT (param=ref ref='2')
SHTTDCAT (param=ref ref='2') SHTHPV2CAT (param=ref ref='2') AHCSYR8CAT (param=ref ref='2')
AHCSYR9CAT (param=ref ref='2') AMDLONGRCAT (param=ref ref='0') APSPAPCAT (param=ref ref='2')
APSMAMCAT (param=ref ref='2') APSCOLCAT (param=ref ref='2') ASIHIVTCAT (param=ref ref='2');
model plborn = smkevcat r_maritlcat bmicat ahicompcat smkstat2cat vignocat modnocat strngnocat
shtflu2cat shtpnuyrcat shthepbcat shthepacat shttdcat shthpv2cat ahcsyr8cat ahcsyr9cat
amdlongrcat apspapcat apsmamcat apscolcat asihivtcat age_p;
run;

 

Thank you for your help!

Reeza
Super User
With reference coding, a categorical variable has no estimate for its reference level, so you should see no estimate for reference level 1. I assume that every categorical variable has one level without an estimate, not just that one. Or is this in addition to the missing reference level? If so, that level is probably a one-to-one match with a level of another categorical variable, which will be a little hard to find with that many variables.
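
One way to look for such a one-to-one match is to cross-tabulate SMKSTAT2CAT against the other CLASS variables and check whether level 4 falls entirely inside a single level of another variable. The following is only a minimal sketch: it assumes the data set and variable names from the posted code, and the short variable list is an illustration to be extended as needed. Going by the names alone, SMKEVCAT looks like a natural first candidate, but that is a guess.

```sas
/* Cross-tabulate SMKSTAT2CAT against a few of the other CLASS variables. */
/* A column where SMKSTAT2CAT=4 is the only nonzero row (or a row with a  */
/* single nonzero column) suggests that two levels identify the same      */
/* observations.                                                          */
proc freq data=less;
  tables smkstat2cat*(smkevcat r_maritlcat bmicat ahicompcat)
         / norow nocol nopercent missing;
run;
```

The MISSING option keeps missing values as their own row or column, which helps because PROC LOGISTIC drops any observation with a missing value in a model variable.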

Note that since you're using PARAM=REF you can include that after, rather than have it on each categorical variable:
class ....variable list ..... / param=ref;

This will apply to all variables in the list.
hng2r
Calcite | Level 5

From my understanding, many modeling procedures include options in their CLASS statements (or in other statements) which allow you to specify reference levels for categorical predictor variables. In addition, PARAM=REF has to be included for every variable because some variables have different referent groups. 

 

Every single variable & its responses are shown except for SMKSTAT2CAT's response 4. Response 2 and 3 are shown with the referent group as 1. 

 

If I adjust only for age, all of the responses are shown for SMKSTAT2CAT. The problem appears only when I adjust for all of the variables. If I change the referent group for SMKSTAT2CAT to 2 in the fully adjusted model, response 4 is shown. However, I need the referent group to be response 1.

 

Thank you for your help.

Reeza
Super User


You can specify them separately, like this

class   smkevcat ( ref='1') 
        R_MARITLCAT ( ref='1') 
        BMIcat ( ref='4')
        AHICOMPCAT ( ref='2') 
        SMKSTAT2CAT ( ref='1') 
        vignocat ( ref='2')
        modnocat ( ref='2') 
        strngnocat ( ref='2') 
        SHTFLU2CAT ( ref='2')
        SHTPNUYRCAT ( ref='2') 
        SHTHEPBCAT ( ref='2') 
        SHTHEPACAT ( ref='2')
        SHTTDCAT ( ref='2')     
        SHTHPV2CAT ( ref='2') 
        AHCSYR8CAT ( ref='2')
        AHCSYR9CAT ( ref='2') 
        AMDLONGRCAT ( ref='0') 
        APSPAPCAT ( ref='2')
        APSMAMCAT ( ref='2') 
        APSCOLCAT ( ref='2') 
        ASIHIVTCAT ( ref='2') 
                            / param=ref;

I think you likely have the other issue I mentioned, then: that level is too highly correlated with (or an exact match for) a level of another variable. You can test this by creating the dummy variables semi-manually and then running a correlation check on them to see which ones are the same. If you know the context of the data, review the categories and see whether any are likely to be highly correlated. Usually there's a note in the log when this happens as well, so check your log.
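
Creating the dummy variables semi-manually could look roughly like the sketch below. The output data set name and the indicator names (less_dummies, smkstat_2, smkev_2, and so on) are made up for illustration, the code assumes the variables are numeric, and it assumes SMKEVCAT has a level 2; none of that is confirmed by the post. It may also be worth scanning the PROC LOGISTIC results, not only the log, for a note about parameters being set to 0 because they are a linear combination of other variables.

```sas
/* Assumption: SMKSTAT2CAT and SMKEVCAT are numeric. For character  */
/* variables, compare with '2', '3', ... instead.                   */
data less_dummies;                 /* hypothetical output data set  */
  set less;
  /* Reference level 1 gets no indicator. Indicators stay missing   */
  /* when the source value is missing.                              */
  if not missing(smkstat2cat) then do;
    smkstat_2 = (smkstat2cat = 2);
    smkstat_3 = (smkstat2cat = 3);
    smkstat_4 = (smkstat2cat = 4);
  end;
  if not missing(smkevcat) then
    smkev_2 = (smkevcat = 2);      /* assumed level of SMKEVCAT     */
run;
```

The same pattern would be repeated for each remaining CLASS variable, always skipping its reference level.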

 

 

hng2r
Calcite | Level 5

That makes sense, thank you!! There was nothing in the log. I have a limited knowledge of SAS, so I was wondering what test/statement I could use to find the correlation of the dummy variables? I just created them.

 

Thank you

Reeza
Super User
Honestly, I'm not sure, besides an exact check, which would be tedious. I would probably start by running PROC CORR (a correlation matrix) and looking at the variables that are very highly correlated. It's not really meant for 0/1 variables, but it will likely point you in the direction you need.
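
A rough sketch of that PROC CORR check, assuming 0/1 indicators already exist in a data set. The names less_dummies, smkstat_2 through smkstat_4, and smkev_2 are placeholders, not names from the original post.

```sas
/* Correlate the SMKSTAT2CAT indicators with an indicator from another */
/* variable. A coefficient at or very near 1 or -1 flags a pair that   */
/* carries the same information.                                       */
proc corr data=less_dummies nosimple;
  var  smkstat_2 smkstat_3 smkstat_4;
  with smkev_2;
run;
```

Pairwise correlations only catch redundancy between two indicators. A level that is a linear combination of several other indicators can still have modest pairwise correlations, so a clean matrix does not rule the problem out.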
