Solved: Re: Output Estimated Proportions based on Proc Logistic Outputs (e.g. 'Margin' statement in Stata)?

AlexPaezSilva · Posted 10-20-2021 01:41 PM

Hello Everyone,

I am new to modelling and have the following bit of code.

Predictor: V_ASIAN (Race)

Outcome: SC_ETHRACE_AM (discrimination)

Controls: Age5 (Age groups), Sex, Edu_Level (5 groups), Region (5 groups), LFS (labour force status 3 groups)

Parametrization: I'm using dummy coding with specific reference groups listed

Purpose: To see what effect reporting a given racialized group (3 groups) has on reported perceived discrimination (YES/NO, dichotomous) while controlling for age, sex, education level, region and labour force status (see above).

Output: At the moment, the basic output that PROC LOGISTIC gives me is the odds ratio for each pairwise comparison. In Stata there is a command ('margins') that gives estimated proportions from the model. So instead of saying a given racialized group has 10x the odds of being discriminated against compared to the reference category, I could say what the adjusted proportion of discrimination for that group would be, having controlled for several characteristics.

Prep: I've removed non-response for predictors, outcome and control variables and I need to use normalized survey weights.

Question: Is there a transformation technique or statement that would give me the estimated proportions (see above), plus the standard errors, confidence intervals and p-values? In Stata this is easily done with the "margins" command.

Thank you all so much!

proc logistic data=work.V2;
  class V_ASIAN(ref='3') AGE5(ref='3') SEX(ref='1')
        EDU_LEVEL(ref='4') REGION(ref='3') LFS(ref='1') / param=ref;
  model SC_ETHRACE_AM = V_ASIAN AGE5 SEX EDU_LEVEL REGION LFS;
  weight wght_per / norm;
run;

StatDave · Posted 10-20-2021 03:39 PM

First, if your weights are survey weights then you should NOT be using PROC LOGISTIC. It does not use the proper variance estimator for survey data. Use PROC SURVEYLOGISTIC instead.

For either of these procedures, I strongly advise you to always use the EVENT= response variable option to specify the level of your binary response variable whose probability you want to model (for example: model sc_ethrace_am(event="Yes")= ... ).

Now, if your goal is to estimate the predicted event probability for each level of a predictor, like V_ASIAN, controlling for the other predictors, then you can do that with the LSMEANS statement. The ILINK and CL options in this statement give the predicted probability estimate (in the Mean column) and a confidence interval.

lsmeans v_asian / ilink cl;
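
Putting the three pieces of advice above together, a minimal sketch might look like the following. Several things here are assumptions rather than anything confirmed by the poster's data: the event level is taken to be '1', no STRATA or CLUSTER statements are shown because the survey design is not described, and PARAM=GLM replaces PARAM=REF because, as far as I know, LSMEANS needs the GLM parameterization of CLASS variables.

```sas
/* Sketch only: EVENT='1' and the absence of STRATA/CLUSTER are assumptions */
proc surveylogistic data=work.V2;
  /* GLM parameterization, which LSMEANS expects for CLASS variables */
  class V_ASIAN AGE5 SEX EDU_LEVEL REGION LFS / param=glm;
  /* EVENT= names the response level whose probability is modeled */
  model SC_ETHRACE_AM(event='1') = V_ASIAN AGE5 SEX EDU_LEVEL REGION LFS;
  /* Survey weight variable from the original post */
  weight wght_per;
  /* ILINK back-transforms each LS-mean to a probability; CL adds confidence limits */
  lsmeans V_ASIAN / ilink cl;
run;
```

If the survey has strata or clusters, adding the corresponding STRATA and CLUSTER statements would let the procedure account for them in the variance estimates.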

Note that LS-means are not the same as margins. In part, LS-means fix ALL of the other predictors at their means or reference levels, while margins use the actual values and then average over the predicted values. Margins can be estimated using the MARGINS macro, but that macro is not available for survey models.
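
The distinction can be written out as a rough sketch. The notation is mine, not from the thread: g^{-1} is the inverse logit that ILINK applies, \hat{\beta} is the vector of fitted coefficients, \ell_j is the LS-means coefficient vector for level j of V_ASIAN with the other predictors held at fixed values, and x_i^{(j)} is the covariate row of observation i with V_ASIAN set to j and everything else left as observed.

```latex
g^{-1}(\eta) = \frac{1}{1 + e^{-\eta}}, \qquad
\hat{p}_j^{\mathrm{LS}} = g^{-1}\!\left(\ell_j^{\top}\hat{\beta}\right), \qquad
\hat{p}_j^{\mathrm{margin}} = \frac{1}{n}\sum_{i=1}^{n} g^{-1}\!\left(x_i^{(j)\top}\hat{\beta}\right)
```

Because g^{-1} is nonlinear, averaging the predictions is generally not the same as predicting at fixed covariate values, so the two estimates usually differ. With survey weights, a weighted average would presumably replace the simple 1/n average in the margin.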

Reeza · Posted 10-20-2021 01:44 PM

I think you want the ODDSRATIO and/or ESTIMATE statements.

PaigeMiller · Posted 10-20-2021 01:53 PM

The OUTPUT statement in PROC LOGISTIC allows you to obtain predicted values, confidence intervals for the predicted values, and the standard error of the linear predictor. https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_logistic_syntax27.htm
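
As a hedged illustration of the OUTPUT statement, the sketch below writes per-observation predictions to a data set. The output data set name (work.pred), the new variable names (p_hat, p_lcl, p_ucl, xb, se_xb) and the event level '1' are assumptions chosen for the example. The WEIGHT statement is left out on purpose, since the variance question for survey weights is discussed elsewhere in the thread.

```sas
/* Sketch only: work.pred, the new variable names and EVENT='1' are hypothetical */
proc logistic data=work.V2;
  class V_ASIAN(ref='3') AGE5(ref='3') SEX(ref='1')
        EDU_LEVEL(ref='4') REGION(ref='3') LFS(ref='1') / param=ref;
  model SC_ETHRACE_AM(event='1') = V_ASIAN AGE5 SEX EDU_LEVEL REGION LFS;
  /* One row per input observation: predicted probability, its confidence */
  /* limits, the linear predictor and the linear predictor's standard error */
  output out=work.pred
         predicted=p_hat lower=p_lcl upper=p_ucl
         xbeta=xb stdxbeta=se_xb;
run;

/* Inspect the first few predictions */
proc print data=work.pred(obs=10);
  var V_ASIAN p_hat p_lcl p_ucl xb se_xb;
run;
```

Note that this gives a prediction for each observation's own covariate pattern, not a single adjusted proportion per V_ASIAN group.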

--
Paige Miller

AlexPaezSilva · Posted 10-21-2021 02:36 PM

Hi @StatDave!

Thanks for that, I really appreciate it. I was wondering, however, why weights would not be properly accounted for in PROC LOGISTIC given that the proc has a WEIGHT statement. I take your point, but I'm just curious why they would allow weighted models in a proc that isn't set up to handle that. Is it any type of weight, or only specific types of weights, that are problematic?

Thanks a bunch!

StatDave · Posted 10-21-2021 02:50 PM

The issue is in the computation of the variances of the model parameter estimates. As I mentioned, there are special variance estimators that are used for analysis of survey data. In the non-survey procedures, the values of the WEIGHT variable just multiply the observations' contributions to the log likelihood and then the usual maximum likelihood estimation is done. The estimators needed for survey data are not employed.
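
To make that concrete, here is a sketch of what "multiplying the contributions to the log likelihood" means for a binary response. The symbols are my own notation: w_i is the WEIGHT value, y_i the 0/1 response, x_i the covariate row, and p_i the modeled event probability for observation i.

```latex
\ell_w(\beta) = \sum_{i=1}^{n} w_i \left[ y_i \log p_i(\beta) + (1 - y_i)\log\bigl(1 - p_i(\beta)\bigr) \right],
\qquad p_i(\beta) = \frac{1}{1 + e^{-x_i^{\top}\beta}}

\widehat{V}_{\mathrm{model}}(\hat{\beta}) = \left( \sum_{i=1}^{n} w_i \, \hat{p}_i (1 - \hat{p}_i) \, x_i x_i^{\top} \right)^{-1}
```

The second line is the usual maximum likelihood variance, the inverse of the weighted information matrix. It treats the weights as if they were known multiplicities or precision weights, which is why it is not appropriate for sampling weights. PROC SURVEYLOGISTIC instead uses a design-based estimator (Taylor series linearization by default).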

AlexPaezSilva · Posted 10-27-2021 03:32 PM

So I tried the LSMEANS route but it keeps crashing. I'm unsure as to why given that all the variables listed are categorical (e.g. Male=2, Female=1, etc.) and there are no continuous variables.

I get the following error:
"ERROR: Only CLASS variables allowed in this effect."

/*LSMEANS TEST WITH GLM PARAMETRIZATION*/
proc surveylogistic data=work.SI_V2;
class V_ASIAN VISMIN AGE5_GLM SEX_GLM EDU_GLM REGION_GLM LFS_GLM / param=GLM;
model SC_ETHRACE_AM (event='1')= VISMIN AGE5_GLM SEX_GLM EDU_GLM REGION_GLM LFS_GLM;
weight NORM_WT;
LSMEANS SC_ETHRACE_AM / ilink cl ;
run;

Reeza · Posted 10-27-2021 03:45 PM

The dependent variable isn't a CLASS variable. LSMEANS takes CLASS effects from the MODEL statement, not the response.

AlexPaezSilva · Posted 10-27-2021 03:53 PM

Oops! It was just a typo. The line was meant to look like this:

LSMEANS V_ASIAN / ilink cl diff;
run;

That solves this part of it then.

It runs now, though the output is full of "Non-est" and blanks, so there's something else happening now (see below). *sigh*

Must I absolutely specify the AT option to get any probabilities? I assumed the above code would give me the predicted means, standard error and lower/upper bounds but it doesn't seem like it (see screenshot).

StatDave · Posted 10-27-2021 04:18 PM

LS-means are linear combinations of model parameters and they can be nonestimable depending on your model specification and the arrangement of your data. Changes to either might allow the estimates to be provided. You could try simplifying your model. Or it might help to change the linear combinations that are used such as by specifying the OM option and/or the BYLEVEL option in the LSMEANS statement. You will want to read about what these options do in the documentation of the LSMEANS statement.
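
As a hedged sketch of those suggestions, the step below requests LS-means three ways. It assumes V_ASIAN is included in the MODEL statement (an LSMEANS effect has to be a model effect), which differs from the code posted earlier, where VISMIN is in the model instead. The E option is added only to print the coefficients of each linear combination, which can help show why an estimate is nonestimable.

```sas
/* Sketch only: assumes V_ASIAN is a MODEL effect, unlike the earlier posted code */
proc surveylogistic data=work.SI_V2;
  class V_ASIAN AGE5_GLM SEX_GLM EDU_GLM REGION_GLM LFS_GLM / param=glm;
  model SC_ETHRACE_AM(event='1') = V_ASIAN AGE5_GLM SEX_GLM EDU_GLM REGION_GLM LFS_GLM;
  weight NORM_WT;
  /* Default coefficients; E displays the linear combination being estimated */
  lsmeans V_ASIAN / ilink cl e;
  /* OM weights the other CLASS effects by their observed margins */
  lsmeans V_ASIAN / ilink cl om;
  /* BYLEVEL computes those margins separately within each V_ASIAN level */
  lsmeans V_ASIAN / ilink cl om bylevel;
run;
```

Comparing the three tables shows whether a different choice of coefficients makes the estimates available. If they are still nonestimable, simplifying the model or collapsing sparse categories may be the next thing to try.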
