Solved: SAS "proc" for linear regression

Saba1 · Posted 03-07-2019 08:53 PM

Hi

I have a dataset of observations recorded on different dates for 4000 unique hospitals (please see the sample dataset in the code box). I want to run a linear regression model with year and hospital fixed effects. The model looks as follows:

model disease = gender age weight distance temperature gender_job

where "disease", "gender" and "gender_job" are dummy variables: disease is 1 when the disease exists, gender is 1 when the gender is female, and gender_job is 1 when that gender has a job. Each is 0 otherwise. Also, age, weight, distance, temperature and gender_job are control variables.

Considering the fact that Y and some X variables are binary, "proc reg" may not give valid results. As mentioned, I need to apply two-way fixed effects.

Kindly suggest which SAS proc I should use to run a regression for this dataset.

data have ;
infile datalines dlm="," missover dsd ;
input hospital_ID : $5. date : mmddyy10. disease gender age weight distance temperature gender_job ;
format date mmddyy10. ;
datalines ;
aa000,11/03/2005,0,0,25,70,1,27,.
aa000,01/25/2007,1,0,65,95,2,20,1
aa000,06/15/2007,1,0,48,100,.,40,0
aa000,09/11/2008,0,1,30,65,2.5,30,1
ab000,03/10/2010,1,1,40,75,1,15,1
ab000,12/30/2010,0,1,19,55,0.5,5,0
ac000,09/09/2004,0,0,17,60,1.5,.,0
ac000,09/09/2004,1,0,40,70,3,30,0
ac000,09/09/2004,1,1,29,69,2.2,30,1
ac000,05/03/2006,0,0,31,90,1,25,1
;
run;

PGStats · Posted 03-08-2019 12:17 AM

A logistic model would likely be more appropriate for your problem. PROC LOGISTIC would be the tool of choice. It supports continuous and nominal effects without the need to create dummy variables (it creates them for you). A logistic model estimates the probability that disease=1 as a function of the values of the independent variables.
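A minimal sketch of how this might look for the dataset in the question. It is an illustration under assumptions, not code from the thread: the names have_year and year are introduced here, PARAM=REF is one possible coding choice, and the covariate names should be spelled exactly as they are in your own DATA step.

```sas
/* Sketch only. HAVE and its variables come from the question;
   HAVE_YEAR and YEAR are hypothetical names introduced here. */
data have_year;
    set have;
    year = year(date);   /* calendar year extracted from the SAS date value */
run;

proc logistic data=have_year;
    /* CLASS builds the design (dummy) columns for the nominal effects;
       PARAM=REF requests reference-cell coding (an assumption) */
    class hospital_ID year / param=ref;
    /* EVENT='1' models the probability that disease = 1 */
    model disease(event='1') = gender age weight distance temperature gender_job
                               hospital_ID year;
run;
```

With about 4000 hospitals, listing hospital_ID as a CLASS effect means estimating thousands of parameters, so this form is probably only practical on a small subset.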

PG

Saba1 · Posted 03-08-2019 12:23 AM

@PGStats: Thanks for your suggestion. But in my case an independent variable (i.e., gender) is a dummy too. Also, how does PROC LOGISTIC deal with two-way fixed effects? I would be thankful if you could kindly share appropriate code.
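One possible way to approach the two-way fixed effects part, offered as a hedged sketch rather than as the answer given in this thread: the STRATA statement in PROC LOGISTIC fits a conditional logistic model, which conditions out one stratum-level effect (here, hospital), while year can enter as a CLASS effect. The names have_year and year are assumptions introduced for illustration.

```sas
/* Sketch only. HAVE_YEAR and YEAR are hypothetical names;
   covariate names must match your own DATA step. */
data have_year;
    set have;
    year = year(date);   /* year fixed effect is built from the date */
run;

proc logistic data=have_year;
    strata hospital_ID;            /* conditional logistic: hospital effects are conditioned out */
    class year / param=ref;        /* year dummies are created by the procedure */
    model disease(event='1') = gender age weight distance temperature gender_job year;
run;
```

A 0/1 regressor such as gender can be entered directly as a numeric variable. Hospitals whose disease values never vary contribute nothing to a conditional fit.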

PGStats · Posted 03-08-2019 12:44 AM

You need to know about logistic models before trying to fit them. Logistic models are usually covered in intermediate courses about statistical analysis.

When you hear that a certain factor increases the risk of developing a disease by some percentage, that figure generally comes from a logistic model analysis.

PG

Saba1 · Posted 03-08-2019 01:08 AM

@PGStats: Thanks. I would be grateful if you could refer me to some relevant reading. I am also working with a financial dataset in which the dependent variable is a dummy: the value is 1 if a trade takes place, and 0 otherwise. Therefore, I need to know the basic code as a starter. Thanks

Kurt_Bremser · Posted 03-08-2019 12:27 AM

I had to merge your questions.

PLEASE DO NOT DOUBLE-POST!

Ksharp · Posted 03-08-2019 05:59 AM

Calling @Rick_SAS @StatDave

Honestly, I don't understand your question very well.

Maybe you need to try PROC GEE or PROC GENMOD for a GEE model.

Rick_SAS · Posted 03-08-2019 06:19 AM

A very similar question was posted by this user at

https://communities.sas.com/t5/Statistical-Procedures/Wald-test-for-proc-glm/m-p/541224#M27136

It seems that the OP is confused about the relationship between the CLASS statement and dummy variables. There is a SAS Note about CLASS variables.
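To illustrate the CLASS point with the model from the question, here is a rough sketch using PROC GLM. It is an assumption-laden example, not code from the linked thread: have_year and year are names introduced here, and fitting a linear model to a 0/1 response is a linear probability model, which other replies in this thread advise against in favor of logistic approaches.

```sas
/* Sketch only. HAVE_YEAR and YEAR are hypothetical names;
   covariate names must match your own DATA step. */
data have_year;
    set have;
    year = year(date);
run;

proc sort data=have_year;
    by hospital_ID;      /* ABSORB requires data sorted by the absorbed variable */
run;

proc glm data=have_year;
    absorb hospital_ID;  /* hospital effects are absorbed instead of estimated one by one */
    class year;          /* CLASS expands YEAR into dummy columns internally */
    /* gender and gender_job are already 0/1, so they enter as plain numeric regressors */
    model disease = gender age weight distance temperature gender_job year / solution;
run;
quit;
```

The point of the sketch is that a variable listed in CLASS does not need hand-made dummies, while a variable that is already coded 0/1 can be used as is.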

StatDave · Posted 03-08-2019 10:41 AM

As mentioned, you probably need to fit a logistic GEE model. You can do this in PROC GEE with a REPEATED statement and the DIST=BINOMIAL option in the MODEL statement. In the SUBJECT= option of the REPEATED statement, you should specify a variable that has a distinct value for each set of correlated observations (possibly hospitals in your case). See this note for details on the SUBJECT= effect.
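A minimal sketch of the setup described above, applied to the dataset from the question. The exchangeable working correlation (TYPE=EXCH) and the use of hospital_ID as the subject are assumptions made for illustration. The covariate names should match your own DATA step.

```sas
/* Sketch only. Dataset HAVE and its variables come from the question. */
proc gee data=have;
    class hospital_ID;   /* the SUBJECT= variable must appear in CLASS */
    /* EVENT='1' models the probability that disease = 1 */
    model disease(event='1') = gender age weight distance temperature gender_job
          / dist=binomial link=logit;
    /* observations within a hospital are treated as correlated;
       TYPE=EXCH is an assumed working correlation structure */
    repeated subject=hospital_ID / type=exch;
run;
```

Year could be added as a CLASS effect in the MODEL statement if year-specific shifts are of interest.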
