SAS "proc" for linear regression

Posted 03-07-2019 08:53 PM
(1302 views)

Hi

I have a dataset with observations on several variables, recorded on different dates for 4000 unique hospitals (please see the sample dataset in the code box). I want to run a **linear regression model** with **year and hospital fixed effects**. The model looks as follows:

**model disease = gender age weight distance temprature gender_job**

where "disease", "gender", and "gender_job" are **dummy variables**: each equals 1 when the disease exists, the gender is female, and the person of that gender has a job, respectively, and 0 otherwise. Also, age, weight, distance, temprature, and gender_job are **control variables**.

Since Y and some of the X variables are binary, "proc reg" may not give valid results. As mentioned, I also need to apply **two-way fixed effects**.

Kindly suggest which SAS proc I should use to run the regression for this dataset.

    data have;
        infile datalines dlm="," missover DSD;
        input hospital_ID : $5. date : mmddyy10. disease gender age weight distance temprature gender_job;
        format date mmddyy10.;
        datalines;
    aa000,11/03/2005,0,0,25,70,1,27,.
    aa000,01/25/2007,1,0,65,95,2,20,1
    aa000,06/15/2007,1,0,48,100,.,40,0
    aa000,09/11/2008,0,1,30,65,2.5,30,1
    ab000,03/10/2010,1,1,40,75,1,15,1
    ab000,12/30/2010,0,1,19,55,0.5,5,0
    ac000,09/09/2004,0,0,17,60,1.5,.,0
    ac000,09/09/2004,1,0,40,70,3,30,0
    ac000,09/09/2004,1,1,29,69,2.2,30,1
    ac000,05/03/2006,0,0,31,90,1,25,1
    ;
    run;
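
For reference, the linear specification described in this question (a linear probability model with hospital and year fixed effects) could be sketched roughly as follows. This is only an illustration under assumptions: the dataset name `have_fe` and the variable `yr` are hypothetical, and the sketch uses the ABSORB statement of PROC GLM so that the 4000 hospital effects are swept out rather than estimated one by one. Whether a linear model is suitable for a binary response is a separate question.

```sas
/* Assumption: yr is a hypothetical variable holding the calendar year of each record */
data have_fe;
    set have;
    yr = year(date);
run;

/* ABSORB requires the data to be sorted by the absorbed variable */
proc sort data=have_fe;
    by hospital_ID;
run;

proc glm data=have_fe;
    absorb hospital_ID;      /* hospital fixed effects, absorbed instead of estimated */
    class yr;                /* year fixed effects via CLASS-generated design columns */
    model disease = gender age weight distance temprature gender_job yr / solution;
run;
quit;
```

The variable name `temprature` is spelled as in the posted data step. Records with missing values in any model variable are dropped from the fit.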

Accepted Solutions

A logistic model would likely be more appropriate for your problem, and PROC LOGISTIC would be the tool of choice. It supports continuous and nominal effects without the need to create dummy variables (it creates them for you). A logistic model estimates the probability that disease=1 as a function of the values of the independent variables.
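
A minimal sketch of such a call, assuming the `have` dataset posted in the question, might look like the following. The names `have_yr` and `yr` are hypothetical, introduced only for this illustration, and the choice of reference coding is an assumption rather than a requirement.

```sas
/* Assumption: yr is a hypothetical variable holding the calendar year of each record */
data have_yr;
    set have;
    yr = year(date);
run;

proc logistic data=have_yr;
    /* CLASS lets the procedure build the design columns for a nominal effect */
    class yr / param=ref;
    /* event='1' requests a model for the probability that disease = 1 */
    model disease(event='1') = gender age weight distance temprature gender_job yr;
run;
```

Variables that are already coded 0/1, such as gender and gender_job, can be listed in the MODEL statement as they are. With only the ten sample rows the fit is unlikely to be meaningful; the sketch is meant for the full dataset.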

PG

9 REPLIES

@PGStats: Thanks for your suggestion. But in my case an independent variable (i.e., gender) is a dummy too. Also, how does PROC LOGISTIC deal with two-way fixed effects? I would be thankful if you could kindly share appropriate code.
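
One possible way to handle the two-way fixed effects asked about here, sketched under assumptions, is conditional logistic regression: the STRATA statement of PROC LOGISTIC conditions out the hospital effects, and year enters as a CLASS effect. The names `have_cl` and `yr` are hypothetical, and this is an illustration of one option, not the only valid specification.

```sas
/* Assumption: yr is a hypothetical variable holding the calendar year of each record */
data have_cl;
    set have;
    yr = year(date);
run;

proc logistic data=have_cl;
    class yr / param=ref;    /* year fixed effects */
    model disease(event='1') = gender age weight distance temprature gender_job yr;
    strata hospital_ID;      /* hospital effects are conditioned out, not estimated */
run;
```

A binary regressor such as gender needs no special treatment and stays in the MODEL statement as a 0/1 variable. Hospitals whose records all share the same disease value contribute nothing to the conditional likelihood.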

You need to know about logistic models before trying to fit them. Logistic models are usually covered in intermediate courses about statistical analysis.

When you hear that a certain factor increases the risk of developing a disease by a given percentage, the figure generally comes from a logistic model analysis.
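
As a brief illustration of that point (the symbols $p$, $x_j$, and $\beta_j$ are introduced here only for the example), a logistic model with $k$ predictors relates the probability $p = \Pr(\text{disease} = 1 \mid x_1, \dots, x_k)$ to the predictors through the log-odds:

$$
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
$$

Holding the other predictors fixed, a one-unit increase in $x_j$ multiplies the odds $p/(1-p)$ by $e^{\beta_j}$:

$$
\frac{\text{odds}(x_j + 1)}{\text{odds}(x_j)} = e^{\beta_j}
$$

Strictly speaking this is a statement about odds rather than risk, although the two are close when the outcome is rare.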

PG

@PGStats: Thanks. I would be grateful if you could refer me to relevant reading. I am also dealing with a financial dataset in which the dependent variable is a dummy: its value is 1 if a trade takes place and 0 otherwise. Therefore, I need to know the basic code as a starting point. Thanks

I had to merge your questions.

PLEASE DO NOT DOUBLE-POST!

A very similar question was posted by this user at

https://communities.sas.com/t5/Statistical-Procedures/Wald-test-for-proc-glm/m-p/541224#M27136

It seems that the OP is confused about the relationship between the CLASS statement and dummy variables. There is a SAS Note about CLASS variables.
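
To illustrate the relationship in a hedged way, the two calls below, written against the `have` dataset from the question, should describe the same model: in (a) the 0/1 variable is entered directly, and in (b) it is declared in CLASS with reference coding so that the procedure generates the equivalent dummy column. The reduced predictor list is chosen only to keep the example short.

```sas
/* (a) gender entered directly as a 0/1 numeric variable */
proc logistic data=have;
    model disease(event='1') = gender age;
run;

/* (b) gender declared in CLASS; reference coding with level 0 as the reference */
proc logistic data=have;
    class gender (ref='0') / param=ref;
    model disease(event='1') = gender age;
run;
```

With PARAM=REF and level 0 as the reference, the gender estimate in (b) corresponds to the one in (a). Under the default effect coding of PROC LOGISTIC the estimate is parameterized differently, which is a common source of confusion.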
