Hi
I have a dataset with observations of several variables on different dates for 4000 unique hospitals (please see the sample dataset in the code box). I want to run a linear regression model with year and hospital fixed effects. The model looks as follows:
model disease = gender age weight distance temperature gender_job
where "disease", "gender", and "gender_job" are dummy variables: they equal 1 when the disease exists, the gender is female, and that gender has a job, respectively, and 0 otherwise. Also, age, weight, distance, temperature, and gender_job are control variables.
Because Y and some of the X variables are binary, "proc reg" may not give valid results. As mentioned, I need to apply two-way fixed effects.
Kindly suggest which SAS proc I should use to run a regression for this dataset.
data have;
    infile datalines dlm="," missover dsd;
    input hospital_ID : $5. date : mmddyy10. disease gender age weight distance temperature gender_job;
    format date mmddyy10.;
    datalines;
aa000,11/03/2005,0,0,25,70,1,27,.
aa000,01/25/2007,1,0,65,95,2,20,1
aa000,06/15/2007,1,0,48,100,.,40,0
aa000,09/11/2008,0,1,30,65,2.5,30,1
ab000,03/10/2010,1,1,40,75,1,15,1
ab000,12/30/2010,0,1,19,55,0.5,5,0
ac000,09/09/2004,0,0,17,60,1.5,.,0
ac000,09/09/2004,1,0,40,70,3,30,0
ac000,09/09/2004,1,1,29,69,2.2,30,1
ac000,05/03/2006,0,0,31,90,1,25,1
;
run;
A logistic model would likely be more appropriate for your problem. PROC LOGISTIC would be the tool of choice. It supports continuous and nominal effects, without the need to create dummy variables (it creates them for you). A logistic model estimates the probability that disease=1 as a function of the values of the independent variables.
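As a rough sketch of what this could look like for the posted data, the step below derives a year variable from the date and lists hospital and year in a CLASS statement so that PROC LOGISTIC builds the dummy variables itself. The dataset name have_year and the variable year are assumptions introduced here for illustration; the other variable names follow the posted dataset, so adjust the spelling of the temperature variable to match your own data.

```sas
/* Assumption: "have" is the dataset from the question.              */
/* have_year and year are hypothetical names used only in this sketch. */
data have_year;
    set have;
    year = year(date);   /* calendar year used for the year effect */
run;

proc logistic data=have_year;
    /* CLASS tells the procedure to create the dummy variables itself */
    class hospital_ID year / param=ref;
    /* event='1' models the probability that disease = 1 */
    model disease(event='1') = gender age weight distance temperature
                               gender_job hospital_ID year;
run;
```

With about 4000 hospitals, estimating one parameter per hospital can be slow and unstable. One possible alternative is to drop hospital_ID from the CLASS and MODEL statements and add a STRATA hospital_ID; statement instead, which fits a conditional logistic model that conditions the hospital effects out. The 10-row sample is too small to expect a meaningful fit, so this is meant for the full dataset.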
You need to know about logistic models before trying to fit them. Logistic models are usually covered in intermediate courses about statistical analysis.
When you hear that a certain factor increases the risk of developing a disease by so many percent, the claim is generally based on the result of a logistic model analysis.
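To make the link between a logistic coefficient and such a percentage concrete: exponentiating a coefficient gives the odds ratio for a one-unit increase in that variable, and 100 * (odds ratio - 1) is the percent change in the odds. The short step below works through that arithmetic for a made-up coefficient value (beta = 0.405 is purely illustrative, not an estimate from the posted data).

```sas
/* Illustration only: beta is a hypothetical coefficient, not a fitted value */
data _null_;
    beta       = 0.405;                    /* assumed log-odds coefficient */
    odds_ratio = exp(beta);                /* odds ratio per one-unit increase */
    pct_change = 100 * (odds_ratio - 1);   /* percent change in the odds */
    put odds_ratio= pct_change=;           /* writes both values to the log */
run;
```

Strictly speaking, the percentage describes the odds rather than the risk itself; the two are close only when the disease is rare.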
I had to merge your questions.
PLEASE DO NOT DOUBLE-POST!
A very similar question was posted by this user at
https://communities.sas.com/t5/Statistical-Procedures/Wald-test-for-proc-glm/m-p/541224#M27136
It seems that the OP is confused about the relationship between the CLASS statement and dummy variables. There is a SAS Note about CLASS variables.
As mentioned, you probably need to fit a logistic GEE model. You can do this in PROC GEE with a REPEATED statement and the DIST=BINOMIAL option in the MODEL statement. In the SUBJECT= option of the REPEATED statement, you should specify a variable that has a distinct value for each set of correlated observations (possibly hospitals in your case). See this note for details on the SUBJECT= effect.
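A minimal sketch of that suggestion, assuming hospitals define the clusters of correlated observations and that year enters as a CLASS effect. The dataset name have_year, the variable year, and the exchangeable working correlation (TYPE=EXCH) are choices made here for illustration, not something stated in the thread; variable names follow the posted dataset, so match the spelling of the temperature variable to your data.

```sas
/* Assumption: "have" is the dataset from the question.                */
/* have_year and year are hypothetical names used only in this sketch.   */
data have_year;
    set have;
    year = year(date);   /* calendar year used for the year effect */
run;

proc gee data=have_year;
    /* The SUBJECT= variable must also appear in the CLASS statement */
    class hospital_ID year;
    /* DIST=BINOMIAL with a logit link gives a logistic GEE model */
    model disease(event='1') = gender age weight distance temperature
                               gender_job year / dist=binomial link=logit;
    /* Observations from the same hospital are treated as correlated */
    repeated subject=hospital_ID / type=exch;
run;
```

Here the hospital enters through the REPEATED statement rather than as a set of dummy variables, so the coefficients describe population-averaged effects rather than within-hospital fixed effects.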