Hi,
I have a dataset with information on buildings and structural failure risk. I am a bit new to SAS and statistical modelling, so apologies if this is a poor question.
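data have;
/* Explicit lengths: the default $ length of 8 would truncate Semidetached and Apartment */
input Buildtype :$12. RoofType :$8. Failure :$3.;
datalines;
Detached Mansard No
Detached Flat Yes
Semidetached Pitched No
Apartment Flat No
;
run;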
I am interested in the probability of failure of different building types and roofs, and the uncertainty estimates of this probability. So far, I have been using proc logistic:
proc logistic data=have;
class Buildtype RoofType / param=ref ;
model Failure (event='Yes') = Buildtype RoofType;
Output out=want lower=lower upper=upper predicted=predicted;
run;
This outputs:
Printed odds ratios and maximum likelihood estimates relative to a reference value. This is useful, but I would like to show a risk of failure relative to the average for all dwellings, if that makes sense.
An average for all dwellings generally would not make sense in this case unless the data/experiment were completely balanced, so that each Buildtype*RoofType combination occurs an equal number of times. Usually, the comparison is between the levels of Buildtype and between the levels of RoofType (and, if desired, between the interaction levels as well).
A dataset want that is the same as the have dataset, with predicted failures and confidence intervals added as columns at the end. This is what I want, but is it possible just to get the list of independent variables and their predictions, instead of the entire dataset? The original dataset is quite large...
I think what you want is the output from the LSMEANS statement with the ILINK option.
Thanks to everyone for their really quick replies - they are incredibly helpful.
I think odds ratios are probably a good option here, so let's go for that.
I was hoping to output a table with just the independent variables and their odds ratios, since I find it a bit easier to customise the charts using sgplot, for example. I can output the odds ratios by using this statement:
ods output OddsRatiosWald= ORPlot;
But I am not sure how to do the same if I want to use PL instead of Wald?
Thanks,
Jon
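A possible way to get the profile-likelihood version, sketched under assumptions: the OddsRatiosWald table comes from an ODDSRATIO statement with CL=WALD, and the CL=PL counterpart should be written to a table that I believe is named OddsRatiosPL. The data set name ORPlotPL is made up for illustration, and ODS TRACE ON is included so the actual table names can be confirmed in the log.

```sas
/* Sketch: request profile-likelihood limits and capture them as a data set. */
/* OddsRatiosPL is the ODS table name as I recall it; check the log output   */
/* from ODS TRACE ON to confirm. ORPlotPL is a hypothetical data set name.   */
ods trace on;
ods output OddsRatiosPL=ORPlotPL;
proc logistic data=have;
class Buildtype RoofType / param=ref;
model Failure (event='Yes') = Buildtype RoofType;
oddsratio Buildtype / cl=pl;
oddsratio RoofType / cl=pl;
run;
ods trace off;
```

The resulting data set can then be passed to sgplot in the same way as the Wald one.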
is it possible just to get the list of independent variables and their predictions
Take a look at the EFFECTPLOT or SLICE statements, but I think the odds ratio for each variable is essentially telling you what you want to know.
Have you walked through this example in the tutorials?
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/statug/statug_logistic_examples02.htm
Hello,
You can use a KEEP= data set option on the WANT dataset to specify the columns you want to keep:
Like:
output out=want(keep=Failure Buildtype RoofType lower upper predicted)
lower=lower upper=upper predicted=predicted;
Koen
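If the goal is one row per combination of predictors rather than one row per building, a follow-up step can collapse the output. This is only a sketch: it relies on the model containing just Buildtype and RoofType, so every row with the same combination has the same predicted value and limits. The name want_unique is hypothetical.

```sas
/* Sketch: keep one row per Buildtype/RoofType combination.                */
/* Valid only because the model has no other predictors, so predictions    */
/* are identical within a combination. want_unique is a hypothetical name. */
proc sort data=want(keep=Buildtype RoofType lower upper predicted)
          out=want_unique nodupkey;
by Buildtype RoofType;
run;
```

Combinations that never occur in the original data will not appear in want_unique.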
There are two basic statistics for this: LS-means (as mentioned earlier) and predictive margins. LS-means can be obtained for any categorical predictor that is specified in the CLASS statement. Note that you need to specify the PARAM=GLM option in the CLASS statement in order to use the LSMEANS statement. The following provides estimates of the event probabilities and confidence intervals for each level of each predictor while holding the other predictor constant. The ILINK option gives the estimates on the mean (probability) scale, the CL option gives the confidence limits, and the E option shows the coefficients on the parameters that define each LS-mean, which lets you see how the other predictor(s) are fixed.
proc logistic data=have;
class Buildtype RoofType / param=glm;
model Failure (event='Yes') = Buildtype RoofType;
lsmeans Buildtype RoofType / ilink cl e;
run;
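To get these LS-means as a small table rather than printed output, the ODS OUTPUT statement can be used, much like the OddsRatiosWald example earlier in the thread. This is a sketch that assumes the ODS table is named LSMeans. The data set name lsm_out is hypothetical, and the exact names of the probability-scale columns should be checked with PROC CONTENTS rather than taken from memory.

```sas
/* Sketch: write the LS-means table to a data set (lsm_out is hypothetical). */
/* Assumes the ODS table name is LSMeans; ODS TRACE ON can confirm it.       */
ods output LSMeans=lsm_out;
proc logistic data=have;
class Buildtype RoofType / param=glm;
model Failure (event='Yes') = Buildtype RoofType;
lsmeans Buildtype RoofType / ilink cl;
run;

/* List the columns to find the probability-scale estimate and its limits */
proc contents data=lsm_out;
run;
```

This gives one row per level of each predictor, which is much smaller than the scored copy of the original data.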
Predictive margins for a predictor do not hold the other predictor(s) constant but rather average the predicted values. Predictive margins can be obtained for categorical predictors, and marginal effects for continuous predictors. These are provided by the Margins macro. See the discussion and examples in its documentation.