Help with this proc surveylogistic code

Michelle_AD · Posted 12-08-2024 12:39 AM

Hi everyone, I am new here and I need some help. I keep encountering this error:

 proc surveylogistic data=nhis29;
 cluster ppsu;
 strata pstrat;
 weight wtfa;
 class SRVY_YR(ref='2021') edu(ref='1') pov(ref='1') sex(ref='1')/param=ref;
 model ft=srvy_yr edu pov sex/expb;
 run;

ERROR: Invalid reference value for SRVY_YR.

Yet for the same variable and reference everything seems fine as shown below:

 proc surveylogistic data=nhis29;
 cluster ppsu;
 strata pstrat;
 weight wtfa;
 class SRVY_YR(ref='2021')/param=ref;
 model ft(event='1')=srvy_yr/expb;
 run;

NOTE: PROC SURVEYLOGISTIC is modeling the probability that ft=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: PROCEDURE SURVEYLOGISTIC used (Total process time):
      real time           0.20 seconds
      cpu time            0.11 seconds

I can't think of what I missed, and any assistance would be appreciated.

I also ran proc surveyfreq, and 2021 shows up as a category there and works fine.

sbxkoenk · Posted 12-08-2024 05:37 AM

Your 2nd model (where the reference level '2021' is accepted) is a lot more parsimonious / succinct than the 1st model. The 1st one has many more independent variables (IVs).

By including extra IVs you risk that more observations are excluded from the analysis because of missing values. Compare the number of observations used in the 1st surveylogistic with the number used in the 2nd surveylogistic; I bet the 2nd number is much higher.

Among the complete-case observations remaining in the 1st surveylogistic, I suspect there are none left that still have '2021' for the SRVY_YR class variable. Please check!
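One way to run that check is sketched below. It flags the records that have no missing value in any variable of the 1st model and then counts them by survey year. The data set name work.cc_check and the variable name complete are made up for illustration, and the list of variables passed to CMISS is an assumption about what the procedure needs to be non-missing; adjust it if your design variables (ppsu, pstrat) can also be missing.

```sas
/* Sketch only: work.cc_check and complete are hypothetical names. */
data work.cc_check;
   set nhis29;
   /* CMISS counts missing arguments, numeric or character.       */
   /* complete = 1 when none of the listed variables is missing.  */
   complete = (cmiss(ft, srvy_yr, edu, pov, sex, wtfa) = 0);
run;

/* Complete and incomplete record counts for each survey year. */
proc freq data=work.cc_check;
   tables srvy_yr * complete / missing norow nocol nopercent;
run;
```

If the row for 2021 shows a count of zero under complete=1, that would be consistent with the reference level no longer being present in the data the 1st model actually uses.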

Usage Note 37108: Setting reference levels for CLASS predictor variables
https://support.sas.com/kb/37/108.html

Good luck with your analysis.

Koen

Michelle_AD · Posted 12-08-2024 12:21 PM

Thanks a lot for this.

ballardw · Posted 12-08-2024 10:25 PM

You can examine @sbxkoenk's suggestion of possible problems with the multiple independent variables using code like this in Proc Freq:

Proc freq data=nhis29;
   tables srvy_yr * edu * pov * sex / list missing;
run;

If ALL of the Srvy_yr=2021 records have missing values for one or more of the other variables, it will show up pretty easily.

What the LIST option does is place all the values on one line, so the output is relatively easy to read. The MISSING option means missing values appear in the body of the table, so you can find how many there are and which variables they appear with. Probably not as useful with multiple continuous variables, but with your variables the result shouldn't be too long.
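If the full table is still long, a narrower variant is to look only at the 2021 records and add the response variable, since a missing ft would also drop a record from the model. This is just a sketch: the WHERE clause assumes SRVY_YR is stored as a numeric variable, which the thread does not confirm.

```sas
/* Sketch only: assumes srvy_yr is numeric.                     */
/* If it is character, use  where srvy_yr = '2021';  instead.   */
proc freq data=nhis29;
   where srvy_yr = 2021;
   tables ft * edu * pov * sex / list missing;
run;
```

Every line of that output where at least one variable shows as missing is a group of 2021 records that the full model cannot use.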