Saba1
Quartz | Level 8

Hi,

I have a dataset with observations of several variables on different dates for 4000 unique hospitals (please see the sample dataset in the code box). I want to run a linear regression model with year and hospital fixed effects. The model looks as follows:

model disease = gender age weight distance temperature gender_job

where "disease", "gender", and "gender_job" are dummy variables: disease equals 1 when the disease exists, gender equals 1 when the gender is female, and gender_job equals 1 when that gender has a job; each is 0 otherwise. Also, age, weight, distance, temperature, and gender_job are control variables.

Considering that Y and some X variables are binary, PROC REG may not give valid results. As mentioned, I need to apply two-way fixed effects.

Kindly suggest which SAS procedure I should use to run the regression for this dataset.

/* Sample data: comma-delimited, missing values shown as "." */
data have;
  infile datalines dlm="," missover dsd;
  input hospital_ID : $5. date : mmddyy10. disease gender age weight distance temperature gender_job;
  format date mmddyy10.;
  datalines;
aa000,11/03/2005,0,0,25,70,1,27,.
aa000,01/25/2007,1,0,65,95,2,20,1
aa000,06/15/2007,1,0,48,100,.,40,0
aa000,09/11/2008,0,1,30,65,2.5,30,1
ab000,03/10/2010,1,1,40,75,1,15,1
ab000,12/30/2010,0,1,19,55,0.5,5,0
ac000,09/09/2004,0,0,17,60,1.5,.,0
ac000,09/09/2004,1,0,40,70,3,30,0
ac000,09/09/2004,1,1,29,69,2.2,30,1
ac000,05/03/2006,0,0,31,90,1,25,1
;
run;

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

A logistic model would likely be more appropriate for your problem, and PROC LOGISTIC would be the tool of choice. It supports continuous and nominal effects without the need to create dummy variables (it creates them for you). A logistic model estimates the probability that disease=1 as a function of the values of the independent variables.
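
As a rough starting point, the suggestion above might be sketched as follows. This is only an illustration under assumptions that are not in the original post: the dataset name have2 and the derived variable year are hypothetical, the temperature variable is assumed to be spelled as shown, and hospital effects are handled through the STRATA statement (conditional logistic regression) rather than through 4000 explicit hospital dummies.

```sas
/* Hypothetical helper step: derive the calendar year from the date */
data have2;
  set have;
  year = year(date);
run;

proc logistic data=have2;
  class year / param=ref;      /* PROC LOGISTIC builds the year dummies itself */
  strata hospital_ID;          /* conditions out a separate intercept per hospital */
  model disease(event='1') = gender age weight distance
                             temperature gender_job year;
run;
```

Because gender and gender_job are already coded 0/1, they can enter the MODEL statement directly. Hospitals whose observations all share the same disease value contribute nothing to a conditional fit, so the ten-row sample is too small to give meaningful estimates.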

PG

9 REPLIES 9
Saba1
Quartz | Level 8
@PGStats: Thanks for your suggestion. But in my case an independent variable (i.e., gender) is a dummy too. Also, how does PROC LOGISTIC deal with two-way fixed effects? I would be thankful if you could kindly share appropriate code.
PGStats
Opal | Level 21

You need to know about logistic models before trying to fit them. Logistic models are usually covered in intermediate courses about statistical analysis.

When you hear that a certain factor increases the risk of developing a disease by so many percent, that statement generally comes from a logistic model analysis.

PG
Saba1
Quartz | Level 8
@PGStats: Thanks. I would be grateful if you could kindly refer me to a relevant reading. I am also dealing with a financial dataset where the dependent variable is a dummy: the value is 1 if a trade takes place, and 0 otherwise. Therefore, I need to know the basic code as a starter. Thanks
Ksharp
Super User

Calling @Rick_SAS @StatDave

Honestly, I don't understand your question very well.

Maybe you need to try PROC GEE or PROC GENMOD to fit a GEE model.

Rick_SAS
SAS Super FREQ

A very similar question was posted by this user at

https://communities.sas.com/t5/Statistical-Procedures/Wald-test-for-proc-glm/m-p/541224#M27136

It seems that the OP is confused about the relationship between the CLASS statement and dummy variables. There is a SAS Note about CLASS variables.
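
To illustrate the point about CLASS variables, here is a minimal sketch of the linear two-way fixed-effects model from the original question (a linear probability model, since disease is binary). It is an assumption-laden illustration, not a recommendation: have2 and year are hypothetical names, and PROC GLM with ABSORB is just one possible way to avoid building hospital dummies by hand.

```sas
/* Hypothetical helper step: derive the calendar year from the date */
data have2;
  set have;
  year = year(date);
run;

/* ABSORB requires the data to be sorted by the absorbed variable */
proc sort data=have2;
  by hospital_ID;
run;

proc glm data=have2;
  absorb hospital_ID;   /* hospital fixed effects, absorbed rather than estimated */
  class year;           /* CLASS generates the year dummy variables internally */
  model disease = gender age weight distance
                  temperature gender_job year / solution;
run;
quit;
```

Variables that are already coded 0/1, such as gender, do not need to appear in the CLASS statement; listing a variable in CLASS only tells the procedure to create the dummy columns for you.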

StatDave
SAS Super FREQ

As mentioned, you probably need to fit a logistic GEE model. You can do this in PROC GEE with a REPEATED statement and the DIST=BINOMIAL option in the MODEL statement. In the SUBJECT= option of the REPEATED statement, you should specify a variable that has a distinct value for each set of correlated observations (possibly hospitals in your case). See this note for details on the SUBJECT= effect.
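
A minimal sketch of that setup might look like the following. The dataset name have2, the derived variable year, and the exchangeable working correlation (TYPE=EXCH) are assumptions made for illustration; hospital_ID is used as the SUBJECT= effect only because hospitals are the likely clusters here.

```sas
/* Hypothetical helper step: derive the calendar year from the date */
data have2;
  set have;
  year = year(date);
run;

/* DESCENDING makes the procedure model the probability that disease=1 */
proc gee data=have2 descending;
  class hospital_ID year;
  model disease = gender age weight distance
                  temperature gender_job year
                  / dist=binomial link=logit;
  repeated subject=hospital_ID / type=exch;   /* observations correlated within hospital */
run;
```

Note that a GEE model accounts for within-hospital correlation through the working correlation structure; it does not estimate a separate fixed effect for each hospital.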
