nvcarroll54
Calcite | Level 5

Can anyone provide or direct me to SAS code for two-part models in a GLM framework? I am analyzing cost differences between patients who have experienced a poisoning and those who have not. The cost data are skewed and contain many zeroes. From what I've read, the first part of the model would be a logistic regression and the second would be a regression with a gamma distribution and log link. I am especially interested in how to combine the results of the two regressions to develop estimates of cost differences between the two groups.
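For reference, the usual way to combine the two parts can be written out as below. This is only a sketch under the setup described in the question (a logistic model for having any cost, and a gamma model with log link for the positive costs). The symbols x, g, alpha, and beta are notation introduced here for illustration and do not come from the post.

```latex
\begin{align*}
\Pr(Y > 0 \mid x) &= \frac{1}{1 + \exp(-x^\top \alpha)}
  && \text{(part 1: logistic)} \\
E[Y \mid Y > 0,\, x] &= \exp(x^\top \beta)
  && \text{(part 2: gamma, log link)} \\
E[Y \mid x] &= \Pr(Y > 0 \mid x)\; E[Y \mid Y > 0,\, x]
  && \text{(combined expected cost)} \\
\hat{\Delta} &= \frac{1}{n} \sum_{i=1}^{n}
  \Bigl( \hat{E}[Y \mid x_i,\, g = 1] - \hat{E}[Y \mid x_i,\, g = 0] \Bigr)
  && \text{(average group difference)}
\end{align*}
```

Here g stands for the poisoning indicator, and the last line averages the predicted difference over the sample with g set to 1 and then to 0 for every patient. Because the estimate mixes two fitted models, its standard error is commonly obtained by bootstrapping the whole two-step procedure rather than read off either model's output.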

1 ACCEPTED SOLUTION
Funda_SAS
SAS Employee

Try using PROC FMM:

 

Your SAS code should look something like this:

 

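proc fmm data=rowdata;
   /* Positive responses: gamma component with a log link */
   model response = age income avgexp / dist=gamma link=log;
   /* Zero responses: PROC FMM adds further components with MODEL + */
   model + / dist=constant;
   /* Covariates for the mixing probability (the zero vs. positive part) */
   probmodel age income avgexp;
run;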

 

The last MODEL statement specifies a constant distribution with all mass at zero for the zero target group.
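To see why a single response variable can serve both parts, it may help to write out the likelihood of a mixture with a point mass at zero and a gamma component. This is a sketch in notation introduced here (pi for the probability of the zero component, g for the gamma density, mu and phi for its mean and dispersion). It assumes the gamma component contributes nothing at y = 0 and does not depend on which component PROC FMM treats as the reference.

```latex
\begin{align*}
L_i &=
\begin{cases}
  \pi(x_i), & y_i = 0 \\[4pt]
  \bigl(1 - \pi(x_i)\bigr)\, g\bigl(y_i;\, \mu(x_i), \phi\bigr), & y_i > 0
\end{cases} \\[6pt]
\log L &= \underbrace{\sum_{i:\, y_i = 0} \log \pi(x_i)
        + \sum_{i:\, y_i > 0} \log\bigl(1 - \pi(x_i)\bigr)}_{\text{binary part}}
        \; + \; \underbrace{\sum_{i:\, y_i > 0} \log g\bigl(y_i;\, \mu(x_i), \phi\bigr)}_{\text{gamma part, positives only}}
\end{align*}
```

Under these assumptions the log-likelihood splits into a binary piece and a gamma piece fit to the positive responses, which is the same structure as a two-part model estimated with two separate regressions.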

 

To understand PROC FMM and finite mixture models take a look at:

Funda

 

 

 


6 REPLIES 6
Reeza
Super User

To get you started see the papers at Lexjansen.com

 

http://lexjansen.com/search/searchresults.php?q=two%20stage%20model

 

FYI - This is a good place to start research on any SAS topic 🙂

SlutskyFan
Obsidian | Level 7

So I have a flag for claims > 0 (CLAIM_PRE) and a continuous variable for claims (MEDRX_PRE). It would seem that in a two-part model the first model would predict CLAIM_PRE (the separate process for 1's and 0's) and the next MODEL statement would handle the positive claims. I am not sure how to set this up in the FMM framework. When I enter the following, I get lots of errors about conflicting outcome/model statements.

 

PROC FMM DATA=WORK.HC_MBR_ANLY_23OCT17;
MODEL MEDRX_PRE = ACTIV_PARTICIP / DIST=BINARY;
MODEL MEDRX_PRE = ACTIV_PARTICIP / DIST=GAMMA;
MODEL CLAIM_PRE = / DIST=CONSTANT;
RUN;

 

Should the data be structured in some way, or the PROC specified so that FMM can handle both the binary model to predict occurrence of claims and the conditional-on-positive gamma model? I don't see how the same response variable can be used in each MODEL statement if the entire purpose of two-part models is to model two different responses.
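One way to set this up outside of PROC FMM is to fit the two parts as separate regressions and multiply the predictions. The following is a minimal sketch, not a tested solution for this dataset. It assumes CLAIM_PRE is coded 1 when MEDRX_PRE > 0 and 0 otherwise, and that ACTIV_PARTICIP is a numeric 0/1 indicator (add a CLASS statement if it is not). The names part1, gamma_fit, scored, two_part, p_any, mu_pos, and exp_cost are hypothetical and chosen only for illustration.

```sas
/* Part 1: logistic regression for the probability of any claims */
proc logistic data=work.hc_mbr_anly_23oct17;
   model claim_pre(event='1') = activ_particip;
   output out=work.part1 predicted=p_any;      /* p_any = P(claims > 0) */
run;

/* Part 2: gamma regression with log link, fit to positive claims only */
proc genmod data=work.hc_mbr_anly_23oct17;
   where medrx_pre > 0;
   model medrx_pre = activ_particip / dist=gamma link=log;
   store work.gamma_fit;                       /* save the fit for scoring */
run;

/* Score every member, including those with zero claims, with part 2 */
proc plm restore=work.gamma_fit;
   score data=work.part1 out=work.scored predicted=mu_pos / ilink;
run;

/* Combine: expected cost = P(claims > 0) * E[cost | cost > 0] */
data work.two_part;
   set work.scored;
   exp_cost = p_any * mu_pos;
run;

/* Mean predicted cost by group */
proc means data=work.two_part mean;
   class activ_particip;
   var exp_cost;
run;
```

The ILINK option asks PROC PLM for predictions on the cost scale rather than the log scale. The difference between the two group means from PROC MEANS is a point estimate only; a confidence interval would need something like a bootstrap over both steps, which is not shown here.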

 

Thanks.

sasuser2017
Calcite | Level 5

nvcarroll54, were you able to find the code for this?

 

Thanks

nvcarroll54
Calcite | Level 5
Nothing simple enough for me to understand and use

sasuser2017
Calcite | Level 5

I know what you mean... I've been trying to figure this out for over a month now.

 

Turns out, it's way easier to do it in Stata, but my dataset is way too large for Stata to handle, so I've had to go back to SAS.

 

If your dataset is not very large, try running this in Stata. You have to install the twopm command, and it's literally just a few lines of code.

 

Good luck,

 
