Hi everyone, I am new here and need some help. I keep getting an error with this code:
proc surveylogistic data=nhis29;
cluster ppsu;
strata pstrat;
weight wtfa;
class SRVY_YR(ref='2021') edu(ref='1') pov(ref='1') sex(ref='1')/param=ref;
model ft=srvy_yr edu pov sex/expb;
run;
ERROR: Invalid reference value for SRVY_YR.
Yet with the same variable and the same reference level, a simpler model runs fine:
proc surveylogistic data=nhis29;
cluster ppsu;
strata pstrat;
weight wtfa;
class SRVY_YR(ref='2021')/param=ref;
model ft(event='1')=srvy_yr/expb;
run;
NOTE: PROC SURVEYLOGISTIC is modeling the probability that ft=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: PROCEDURE SURVEYLOGISTIC used (Total process time):
      real time           0.20 seconds
      cpu time            0.11 seconds
I can't think of what I missed, and any assistance would be appreciated.
I also tried PROC SURVEYFREQ, and 2021 shows up there as a category without any problem.
Your second model (where the reference level '2021' is accepted) is much more parsimonious than the first one, which has several more independent variables (IVs).
By including extra IVs, you risk having more observations excluded from the analysis because of missing values: by default, an observation with a missing value on any model variable is dropped. Compare the number of observations used in the first PROC SURVEYLOGISTIC with the number used in the second; I bet the second number is much higher.
My guess is that among the complete-case observations remaining in the first model, none still have the value '2021' for the CLASS variable SRVY_YR, so '2021' cannot be used as the reference level. Please check!
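One way to run that check is to rebuild the complete-case subset yourself and tabulate SRVY_YR in it. This is a sketch that assumes the variable names from the model above; the data set name cc is just a hypothetical work data set. CMISS counts missing values for both numeric and character variables, so it works whatever the storage type of SRVY_YR. The design variables are included on the assumption that observations missing them are also dropped; this only approximates the procedure's own exclusions (for example, it does not check for nonpositive weights).

```sas
/* Sketch: keep only observations with no missing values on the model   */
/* and design variables (approximating the default complete-case rule), */
/* then see which SRVY_YR values survive.                               */
data cc;
   set nhis29;
   /* CMISS counts missing values for numeric and character variables */
   if cmiss(ft, srvy_yr, edu, pov, sex, ppsu, pstrat, wtfa) = 0;
run;

/* If 2021 does not appear here, ref='2021' fails in the full model */
proc freq data=cc;
   tables srvy_yr;
run;
```

If 2021 is missing from this table but present in the full data, the reference-level error comes from the missing covariate values rather than from the CLASS statement itself.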
Good luck with your analysis.
Koen
Thanks a lot for this!
You can examine @sbxkoenk's suggestion about possible problems with the multiple independent variables using PROC FREQ code like this:
Proc freq data=nhis29;
tables srvy_yr * edu * pov * sex / list missing;
run;
If ALL of the SRVY_YR=2021 records have missing values for one or more of the other variables, it will be easy to spot.
The LIST option places all the values for each combination on one line, which makes the output relatively easy to read. The MISSING option includes missing values in the body of the table, so you can see how many there are and in combination with which variables they occur. This is probably less useful with several continuous variables, but your variables look categorical, so the result shouldn't be too long.
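To narrow down which variable is responsible, you could also look only at the 2021 records and tabulate each model variable separately. This sketch assumes SRVY_YR is stored as a numeric variable; if it is character, the WHERE clause would need srvy_yr = '2021' instead.

```sas
/* Sketch: restrict to the 2021 records and tabulate each model      */
/* variable on its own; MISSING gives missing values their own row,   */
/* so a variable that is entirely missing in 2021 stands out.          */
/* Assumes SRVY_YR is numeric (use srvy_yr = '2021' if character).     */
proc freq data=nhis29;
   where srvy_yr = 2021;
   tables ft edu pov sex / missing;
run;
```

Any variable whose 2021 table shows only a missing row (or nearly so) is the one removing those records from the full model.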