michellel
Calcite | Level 5

Hi,

I am looking at cost (the dependent variable) for three (3) procedure groups (a categorical independent variable) and plan to present the mean predicted cost for each of the three groups. The independent variables will be procedure group, comorbidity, and some sociodemographic variables. I used a generalized linear model (proc genmod) with a gamma distribution and log link. I learned the SAS code for GLMs and wrote my code as shown below. I am not sure whether I wrote it correctly to get the results I need. Please correct me if anything is wrong. Thanks so much!

(1) I put all categorical variables in the CLASS statement;

(2) I put type3 in the MODEL statement to test whether cost differs among the 3 procedure groups;

     The results include a table of LR Statistics For Type 3 Analysis. I will check the p-value for procedure_cat to see whether cost differs significantly among the 3 procedure groups. Is that correct?

(3) I used the LSMEANS statement to get the mean predicted payment for each procedure group, which is why I put procedure_cat after LSMEANS;

     At the bottom of the results there is a table of procedure_cat Least Squares Means. I will report the mean for each procedure group in my table as the mean predicted cost for that group. Is that correct?

(4) I added / ilink to the LSMEANS statement because I saw someone suggest it when link=log is specified, but I am still not clear on why / ilink is needed;

proc genmod data=abc.mydata;
    class procedure_cat female race_cat income_cat comorbidity_cat;
    model payment = procedure_cat age female race_cat college income_cat comorbidity_cat / dist=gamma link=log type3;
    lsmeans procedure_cat / ilink;
run;

SteveDenham
Jade | Level 19

This should all work.  Note that the estimates from your LSMEANS statement will be marginal means, averaged over all of the other categories.  You may have interactions to think about--for instance if the procedure_cat coded for mammogram, there is a high likelihood of interaction with the categorical variable female (assuming it is a Y/N variable).  Looking at marginal means over all categories gives equal weight to each level of each of the categorical variables.
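If you want to check for that kind of interaction, a minimal sketch might look like the following. It reuses the data set and variable names from the question; the procedure_cat*female term and the om option are illustrative additions, not something the original model contained, and whether the interaction belongs in the model is a subject-matter call.

```sas
proc genmod data=abc.mydata;
    class procedure_cat female race_cat income_cat comorbidity_cat;
    /* Same model as in the question, plus a procedure-by-sex interaction (illustrative) */
    model payment = procedure_cat female procedure_cat*female age race_cat college
                    income_cat comorbidity_cat / dist=gamma link=log type3;
    /* Mean payment for each procedure group within each sex, back-transformed by ilink */
    lsmeans procedure_cat*female / ilink;
    /* om weights the other class variables by their observed frequencies
       instead of giving every level equal weight */
    lsmeans procedure_cat / ilink om;
run;
```

Comparing the two LSMEANS tables for procedure_cat (with and without om) shows how much the equal-weight assumption matters for your data.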

Steve Denham

michellel
Calcite | Level 5

Thanks Steve! Your comments are very useful!

Are marginal means the usual way to present the mean of the predicted values (y hat), or would you suggest another way to present the results? Can I use the OUTPUT statement with prob= to get y hat, and then calculate the mean of y hat for each procedure group, as in the following code?

proc genmod data=abc.mydata;
    class procedure_cat female race_cat income_cat comorbidity_cat;
    model payment = procedure_cat age female race_cat college income_cat comorbidity_cat / dist=gamma link=log type3;
    lsmeans procedure_cat / ilink;
    output out=result prob=p;
run;

proc means n mean stddev clm data=result;
    class procedure_cat;
    var p;
run;

I also have another question about the results. In the "Analysis of Maximum Likelihood Parameter Estimates" table, the Wald Chi-Square and Pr > ChiSq columns are missing for the last level of each categorical variable (the DF for each of those last levels is 0), as in the table below. Why is that?

Parameter         DF   Estimate   Std Error   Wald 95% Conf Limits   Wald Chi-Square   Pr > ChiSq
Intercept          1    -1.3168      0.0903    -1.4937    -1.1398            212.73       <.0001
car     large      1    -1.7643      0.2724    -2.2981    -1.2304             41.96       <.0001
car     medium     1    -0.6928      0.1282    -0.9441    -0.4414             29.18       <.0001
car     small      0     0.0000      0.0000     0.0000     0.0000               .            .
age     1          1    -1.3199      0.1359    -1.5863    -1.0536             94.34       <.0001
age     2          0     0.0000      0.0000     0.0000     0.0000               .            .
Scale              0     1.0000      0.0000     1.0000     1.0000

Thanks so much for your answer!

SteveDenham
Jade | Level 19

Last question first.  The estimates are set to zero as these are the reference categories (default is last).  An overparameterized model is fit.  Solutions involve putting together the estimates into a linear form.  For instance, the estimate for a large car for age group=1 would be intercept + estimate (car large) + estimate (age 1) = -1.3168 - 1.7643 - 1.3199 = -4.401 on the linearized scale (for your major project, that would be the log scale).
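The arithmetic can be checked in a short DATA step. This is only a sketch: the three estimates are copied from the table in the question, and the variable names (eta, mu) are just labels chosen here. The reference levels (car small, age 2) contribute 0, so they drop out of the sum.

```sas
data _null_;
    /* Estimates copied from the parameter estimates table in the question */
    intercept = -1.3168;
    car_large = -1.7643;
    age_1     = -1.3199;

    /* Linear predictor on the link (log) scale */
    eta = intercept + car_large + age_1;

    /* Inverse of the log link returns the estimate to the original scale */
    mu = exp(eta);

    put eta= 8.4 mu= 8.5;
run;
```

The log should show eta=-4.4010, with mu coming out at roughly 0.012 on the original scale.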

Now as far as the first question, there really shouldn't be a need to recalculate the mean probability.  By using the ilink option in the lsmeans statement, you will get the probability (actually a risk, I think) for each level of procedure_cat.  And this may well be different from the value you obtain from the output dataset, followed by proc means, because the assumption here is that the probabilities are additive on the log scale, hence the mean is calculated and then put back onto the original scale with the ilink function.  To get something close using the output dataset, I think you will have to apply a log transformation to p, get the mean, and then backtransform to the original scale.
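A rough sketch of that last suggestion, assuming the OUTPUT data set is called result and holds the predicted mean in p (as in your PROC MEANS step); the names result_log, log_means, geo_means, log_p, mean_log_p and geo_mean_p are made up for illustration:

```sas
/* Put the predicted means on the log (link) scale */
data result_log;
    set result;
    log_p = log(p);
run;

/* Average the log predictions within each procedure group */
proc means data=result_log noprint nway;
    class procedure_cat;
    var log_p;
    output out=log_means mean=mean_log_p;
run;

/* Back-transform with the inverse link (exp, because link=log) */
data geo_means;
    set log_means;
    geo_mean_p = exp(mean_log_p);
run;

proc print data=geo_means noobs;
    var procedure_cat _freq_ geo_mean_p;
run;
```

This averages over the covariate mix actually observed in each group, whereas LSMEANS by default uses equal weights across the levels of the other class variables, so expect the two sets of numbers to be close rather than identical.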

Steve Denham

michellel
Calcite | Level 5

Thanks Steve! I have one more question, about the scale shown at the bottom of the model results. The output says the scale parameter was estimated by maximum likelihood, but I am not sure how to interpret it. Could you give me some hints? Thanks so much!

lvm
Rhodochrosite | Level 12

The gamma distribution has two parameters: mu (which may be a function of covariates, treatments, and their associated parameters) and a scale parameter. The scale parameter plays a role similar to a standard deviation or variance (but not quite): it measures variability, much as sigma does with normal data. With the gamma distribution, however, the variance is a function of the mean: var(Y) = phi*(mu^2), where phi is the dispersion. As I read the GENMOD documentation, the Scale value printed for dist=gamma is the reciprocal, 1/phi, so var(Y) = (mu^2)/Scale and a larger Scale means less variability relative to the mean.
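To put numbers on the mean-variance relationship, here is a small sketch with made-up values: mu and scale below are hypothetical, not taken from any fitted model. It assumes the printed Scale is the reciprocal of the dispersion, which is my reading of the GENMOD documentation for dist=gamma; if your output reports the dispersion itself, skip the 1/scale step.

```sas
data _null_;
    mu    = 5000;   /* hypothetical predicted mean payment */
    scale = 2;      /* hypothetical Scale estimate, read as 1/dispersion */

    phi   = 1 / scale;      /* dispersion */
    var_y = phi * mu**2;    /* variance grows with the square of the mean */
    sd_y  = sqrt(var_y);
    cv    = sd_y / mu;      /* coefficient of variation = 1/sqrt(scale) */

    put phi= var_y= sd_y= cv=;
run;
```

The useful takeaway is that the coefficient of variation does not depend on mu: under this model every procedure group has the same relative spread, even though the absolute spread differs with the mean.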
