AlexPaezSilva
Fluorite | Level 6

Hello Everyone, 

 

I am new to modelling and have the following bit of code. 

Predictor: V_ASIAN (Race)

Outcome: SC_ETHRACE_AM (discrimination)

Controls: Age5 (age groups), Sex, Edu_Level (5 groups), Region (5 groups), LFS (labour force status, 3 groups)

Parameterization: I'm using dummy (reference) coding, with the reference group for each variable specified in the CLASS statement.

Purpose: To estimate the effect of reporting membership in a given racialized group (3 groups) on reported perceived discrimination (YES/NO, dichotomous), while controlling for age, sex, education level, region, and labour force status (see above).

 

Output: At the moment, the basic output from PROC LOGISTIC is the odds ratio for each pairwise comparison. In Stata there is a command (margins) that gives estimated proportions from the model. So instead of saying that a given racialized group has 10x the odds of being discriminated against compared to the reference category, I could say what the adjusted proportion reporting discrimination would be for that racialized group after controlling for several characteristics.

 

Prep: I've removed nonresponse on the predictor, outcome, and control variables, and I need to use normalized survey weights.

 

Question: Is there a transformation technique or statement that would give me the estimated proportions (see above) along with their standard errors, confidence intervals, and p-values? In Stata this is easily done with the margins command.

 

Thank you all so much!

 

 

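proc logistic data=work.V2;
   /* Reference-cell (dummy) coding with an explicit reference level per variable */
   class V_ASIAN(ref='3') AGE5(ref='3') SEX(ref='1')
         EDU_LEVEL(ref='4') REGION(ref='3') LFS(ref='1') / param=ref;
   model SC_ETHRACE_AM = V_ASIAN age5 sex edu_level region lfs;
   /* NORM rescales the weights so they sum to the actual sample size */
   weight wght_per / norm;
run;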

 

 

Accepted Solutions
StatDave
SAS Super FREQ

First, if your weights are survey weights, then you should NOT be using PROC LOGISTIC. It does not use the proper variance estimator for survey data; use PROC SURVEYLOGISTIC instead. For either procedure, I strongly advise you to always use the EVENT= response variable option to specify the level of your binary response variable whose probability you want to model (for example: model sc_ethrace_am(event="Yes")= ... ). Now, if your goal is to estimate the predicted event probability for each level of a predictor, like V_ASIAN, controlling for the other predictors, then you can do that with the LSMEANS statement. The ILINK and CL options in this statement give the predicted probability estimate (in the Mean column) and a confidence interval.

lsmeans v_asian / ilink cl;
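A fuller sketch of that advice, using the variable names from the original post. Assumptions to flag: the event level of SC_ETHRACE_AM is taken to be coded '1' (as in the later SURVEYLOGISTIC attempt in this thread, rather than "Yes"), and no STRATA or CLUSTER variables are assumed to be available; if the survey supplies design variables, they should be added. As far as I know, LSMEANS in these procedures requires the GLM parameterization, so PARAM=GLM stands in for the original PARAM=REF coding.

```sas
/* Sketch: survey logistic model with LS-means on the probability scale. */
/* Assumes the event level of SC_ETHRACE_AM is coded '1'.                */
proc surveylogistic data=work.V2;
   /* LSMEANS needs the GLM (less-than-full-rank) parameterization */
   class V_ASIAN AGE5 SEX EDU_LEVEL REGION LFS / param=glm;
   /* EVENT= names the response level whose probability is modeled */
   model SC_ETHRACE_AM(event='1') = V_ASIAN AGE5 SEX EDU_LEVEL REGION LFS;
   /* Survey weight from the original post (no NORM option here) */
   weight wght_per;
   /* Add STRATA / CLUSTER statements here if design variables exist */
   /* ILINK: Mean column on the probability scale; CL: confidence limits */
   /* DIFF: pairwise differences between levels (on the logit scale)     */
   lsmeans V_ASIAN / ilink cl diff;
run;
```

With PARAM=GLM the last level of each CLASS variable serves as the reference for the displayed coefficients; that changes how the parameter estimates look, not the LS-means themselves.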

Note that LS-means are not the same as margins. LS-means fix ALL of the other predictors at set values (continuous predictors at their means; by default, CLASS predictors are averaged over their levels with equal weights) and compute a single predicted value, while margins use each observation's actual values and then average over the resulting predicted values. Margins can be estimated using the MARGINS macro, but that macro is not available for survey models.
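To make the distinction concrete, here is one common way to write the two quantities for level k of V_ASIAN in a logistic model. The notation is mine, not taken from the SAS or Stata documentation: w_i are the analysis weights (all 1 for an unweighted fit), l_k has the intercept and the level-k column of V_ASIAN set to 1 and each level column of every other CLASS effect j set to 1/m_j (m_j = number of levels), and x_i(k) is observation i's design row with V_ASIAN set to level k.

```latex
g^{-1}(\eta) = \frac{1}{1 + e^{-\eta}}

\hat{p}^{\,\mathrm{LS}}_k = g^{-1}\!\left(\mathbf{l}_k^{\top}\hat{\boldsymbol{\beta}}\right)
\qquad
\hat{p}^{\,\mathrm{M}}_k = \frac{\sum_{i=1}^{n} w_i \, g^{-1}\!\left(\mathbf{x}_{i(k)}^{\top}\hat{\boldsymbol{\beta}}\right)}{\sum_{i=1}^{n} w_i}
```

Because g^{-1} is nonlinear, applying it to an averaged linear predictor (LS-mean) generally gives a different number than averaging g^{-1} over observations (margin), even with the same estimated coefficients.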


Reeza
Super User

I think you want the ODDSRATIO and/or ESTIMATE statements.

PaigeMiller
Diamond | Level 26

The OUTPUT statement in PROC LOGISTIC lets you obtain predicted values, confidence limits for the predicted values, and the standard error of the linear predictor (the predicted logit). https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_logistic_syntax27.htm
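A minimal sketch of that OUTPUT statement added to the original PROC LOGISTIC step. The keywords are standard OUTPUT-statement keywords; the data set name work.pred and the variables phat, phat_lcl, phat_ucl, logit_hat, and se_logit are hypothetical names, and event='1' is an assumption about how the outcome is coded. Note that this gives one predicted probability per observation (at that observation's own covariate values), not an adjusted proportion per V_ASIAN group, and the limits rest on the non-survey variance.

```sas
proc logistic data=work.V2;
   class V_ASIAN(ref='3') AGE5(ref='3') SEX(ref='1')
         EDU_LEVEL(ref='4') REGION(ref='3') LFS(ref='1') / param=ref;
   /* event='1' is an assumption about how the outcome is coded */
   model SC_ETHRACE_AM(event='1') = V_ASIAN AGE5 SEX EDU_LEVEL REGION LFS;
   weight wght_per / norm;
   /* One output row per input observation (95% limits by default) */
   output out=work.pred
          predicted=phat      /* predicted event probability            */
          lower=phat_lcl      /* lower confidence limit for phat        */
          upper=phat_ucl      /* upper confidence limit for phat        */
          xbeta=logit_hat     /* linear predictor (log odds)            */
          stdxbeta=se_logit;  /* standard error of the linear predictor */
run;
```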

--
Paige Miller

AlexPaezSilva
Fluorite | Level 6

Hi @StatDave!

Thanks for that, I really appreciate it.
I was wondering, however, why weights would not be properly accounted for in PROC LOGISTIC, given that the procedure has a WEIGHT statement. I take your point, but I'm just curious why weighted models would be allowed in a procedure that isn't set up to handle them. Is it any type of weight, or only specific types of weights, that are problematic?

Thanks a bunch! 

StatDave
SAS Super FREQ

The issue is in the computation of the variances of the model parameter estimates. As I mentioned, there are special variance estimators that are used for the analysis of survey data. In the non-survey procedures, the values of the WEIGHT variable simply multiply the observations' contributions to the log likelihood, and then the usual maximum likelihood estimation is done. The variance estimators needed for survey data are not used.
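A sketch of what that means, in generic notation (mine, not SAS's): w_i are the WEIGHT values, y_i is 1 for the event and 0 otherwise, and x_i is observation i's design-matrix row. Both kinds of procedure maximize the same weighted log likelihood, so the coefficient estimates agree; what differs is the covariance matrix.

```latex
\ell_w(\boldsymbol{\beta}) = \sum_{i=1}^{n} w_i \left[ y_i \log p_i + (1 - y_i)\log(1 - p_i) \right],
\qquad p_i = \frac{1}{1 + \exp(-\mathbf{x}_i^{\top}\boldsymbol{\beta})}

\widehat{\mathrm{Var}}_{\mathrm{model}}(\hat{\boldsymbol{\beta}}) = \hat{\mathbf{I}}^{-1},
\qquad \hat{\mathbf{I}} = \sum_{i=1}^{n} w_i \, \hat{p}_i (1 - \hat{p}_i) \, \mathbf{x}_i \mathbf{x}_i^{\top}

\widehat{\mathrm{Var}}_{\mathrm{design}}(\hat{\boldsymbol{\beta}}) = \hat{\mathbf{I}}^{-1} \, \hat{\mathbf{G}} \, \hat{\mathbf{I}}^{-1}
```

Here G-hat is a design-based estimate (using strata and clusters) of the covariance of the total of the weighted score contributions w_i (y_i - p_i) x_i. Without normalization, weights that sum to the population size inflate I-hat and make the model-based standard errors far too small; NORM removes that scaling problem but still does not produce G-hat, which is what PROC SURVEYLOGISTIC adds (by default through Taylor series linearization).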

AlexPaezSilva
Fluorite | Level 6

So I tried the LSMEANS route, but it keeps failing. I'm not sure why, given that all the variables listed are categorical (e.g., Male=2, Female=1) and there are no continuous variables.

I get the following error:
"ERROR: Only CLASS variables allowed in this effect."

 

 

/*LSMEANS TEST WITH GLM PARAMETRIZATION*/
proc surveylogistic data=work.SI_V2;
class V_ASIAN VISMIN AGE5_GLM SEX_GLM EDU_GLM REGION_GLM LFS_GLM / param=GLM;
model SC_ETHRACE_AM (event='1')= VISMIN AGE5_GLM SEX_GLM EDU_GLM REGION_GLM LFS_GLM;
weight NORM_WT;
LSMEANS SC_ETHRACE_AM / ilink cl ;
run;


Reeza
Super User
The dependent variable isn't a CLASS (categorical) effect in the model, so it can't be used in the LSMEANS statement.
AlexPaezSilva
Fluorite | Level 6

Oops! It was just a typo; the line was meant to read:

 

LSMEANS V_ASIAN / ilink cl diff;
run;

That solves this part of it then. 

It runs now, though the output is full of "Non-est" and blanks, so something else is happening now (see below). *sigh*

Must I absolutely specify the AT option to get any probabilities? I assumed the above code would give me the predicted means, standard errors, and lower/upper bounds, but it doesn't seem to (see screenshot).

 

[Screenshot: LSMEANS output with "Non-est" entries and blank cells]

 

StatDave
SAS Super FREQ
LS-means are linear combinations of model parameters and they can be nonestimable depending on your model specification and the arrangement of your data. Changes to either might allow the estimates to be provided. You could try simplifying your model. Or it might help to change the linear combinations that are used such as by specifying the OM option and/or the BYLEVEL option in the LSMEANS statement. You will want to read about what these options do in the documentation of the LSMEANS statement.
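A sketch of the options mentioned above, applied to the SURVEYLOGISTIC step posted earlier in the thread. One assumption to flag: LS-means can only be requested for an effect that appears in the MODEL statement, so here V_ASIAN (the predictor named in the original question) takes the place of VISMIN in the model; whether that matches the intended analysis is for the original poster to decide. The outcome's event level is again assumed to be '1'. Which LS-means definition is appropriate (default equal weights, OM, or OM with BYLEVEL) depends on the question being asked, so the LSMEANS documentation is worth reading before relying on the numbers.

```sas
proc surveylogistic data=work.SI_V2;
   class V_ASIAN AGE5_GLM SEX_GLM EDU_GLM REGION_GLM LFS_GLM / param=GLM;
   /* V_ASIAN must be a model effect for LSMEANS V_ASIAN to work */
   model SC_ETHRACE_AM(event='1') = V_ASIAN AGE5_GLM SEX_GLM EDU_GLM
                                    REGION_GLM LFS_GLM;
   weight NORM_WT;
   /* OM: weight the levels of the other CLASS effects by their      */
   /*     observed proportions in the data instead of equally        */
   /* BYLEVEL: compute those observed proportions separately within  */
   /*          each level of V_ASIAN                                 */
   lsmeans V_ASIAN / om bylevel ilink cl diff;
run;
```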
