Multiple linear regressions within a dataset

Posted 10-06-2018 08:33 AM
(2012 views)

Hi everyone,

I am using SAS University Edition and have a question about a regression analysis. It is probably easy to solve, but I am new to SAS and did not find a solution for this (probably because I did not really know what to search for).

I have a dataset which looks like this:

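| Year | Stock_Identifier | Y_Var | X_Var1 |
|------|------------------|-------|--------|
| 2005 | 1 | 0,3 | 0,1 |
| 2006 | 1 | 0,4 | 0,2 |
| 2007 | 1 | 0,5 | 0,15 |
| 2008 | 1 | 0,6 | 0,25 |
| 2005 | 2 | 0,3 | 0,3 |
| 2006 | 2 | 0,3 | 0,4 |
| 2007 | 2 | 0,5 | 0,4 |
| 2008 | 2 | 0,4 | 0,5 |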
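For anyone who wants to reproduce the sample, a DATA step along the following lines should work. This is only a sketch: the dataset name `have` is an assumption chosen to match the replies, and the `NUMX` informat is used on the assumption that the values are entered with decimal commas exactly as shown.

```sas
/* Sketch: read the sample rows into a dataset named HAVE (name assumed). */
data have;
    /* The colon modifier applies the informat in list input;       */
    /* NUMX reads numbers that use a comma as the decimal separator. */
    input Year Stock_Identifier Y_Var :numx8. X_Var1 :numx8.;
    datalines;
2005 1 0,3 0,1
2006 1 0,4 0,2
2007 1 0,5 0,15
2008 1 0,6 0,25
2005 2 0,3 0,3
2006 2 0,3 0,4
2007 2 0,5 0,4
2008 2 0,4 0,5
;
run;
```

If the real data already use a period as the decimal separator, the `:numx8.` informats can simply be dropped.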

What I need is a linear regression that tells me the correlation between Y and X1 (in the real dataset I have a few more X variables, but that should not be a problem). I think a normal linear regression would ignore the stock identifiers and just compare Y and X. That's where I need your help. The regression only makes sense at the level of the stock, so in this case there should be one regression for stock 1 with its data points from 2005 to 2008, and another one for the second stock. At the end, however, I need a "normal" regression output table aggregated at the level of the whole dataset.

I hope this is clear. Maybe it's a stupid question (sorry for that), but I would be really thankful for your input. Ideally, you could even explain the steps you take, because I am really new to SAS :-).

Kind regards

9 REPLIES


```
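/* BY-group processing requires the data to be sorted by the BY variable */
proc sort data=have;
by stock_identifier;
run;
/* fit a separate regression of y_var on x_var1 for each stock */
proc reg data=have;
by stock_identifier;
model y_var = x_var1;
run;
quit;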
```

I did not include a response to this part of your question:

At the end, however, I need a "normal" regression output table aggregated at the level of the whole dataset.

because I'm not sure what you mean.

--

Paige Miller


Follow PaigeMiller's advice, but also use the TABLEOUT and OUTEST= options on the PROC REG statement:

proc reg data=have TABLEOUT outest=RegOut;

...

quit;

The dataset RegOut contains the parameter estimates, standard errors, p-values, and 95% CIs. Here is an example:

```
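/* sort by the BY variable before BY-group processing */
proc sort data=sashelp.class out=Have;
by sex;
run;
/* OUTEST= saves the estimates; TABLEOUT adds rows for standard errors, */
/* t values, p-values, and 95% confidence limits                        */
proc reg data=have outest=RegOut TABLEOUT plots=none;
by sex;
model height = weight;
run;
quit;
/* print every statistic for each BY group */
proc print data=RegOut;
var Sex _TYPE_ Intercept Weight;
run;
/* to output only some statistics, use a WHERE clause */
proc print data=RegOut;
where _TYPE_="PARMS";
var Sex _TYPE_ Intercept Weight;
run;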
```
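Applied to the dataset from the question, the same pattern might look like the sketch below. It assumes the data are in a dataset named `have` with the variables `stock_identifier`, `y_var`, and `x_var1` from the original post; `RegOut` is just an illustrative output name.

```sas
/* Sketch: per-stock regressions with the estimates saved to a dataset. */
proc sort data=have;
    by stock_identifier;
run;

/* NOPRINT suppresses the displayed output; results go to RegOut only */
proc reg data=have outest=RegOut tableout noprint;
    by stock_identifier;
    model y_var = x_var1;
run;
quit;

/* Keep only the estimates and their p-values for each stock */
proc print data=RegOut noobs;
    where _TYPE_ in ("PARMS", "PVALUE");
    var stock_identifier _TYPE_ Intercept x_var1;
run;
```

Each stock contributes one row per statistic, so the printed table has one PARMS row and one PVALUE row per stock.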


Try using a Task and it will generate the code.

Place stock in the GROUP ANALYSIS BY section.

As to how to combine them into one output, what’s the math behind that?



What do you expect about the relation between *Y* and *X1* for different stocks? Do you expect them to have a common slope but different intercepts? As in *Y = B0s + B1 × X1*, where the *B0s* are stock-specific intercepts. This kind of relationship can (and should) be fitted with a single regression.
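Written out with explicit indices, the common-slope model would read roughly as follows. The subscripts s (stock) and t (year) and the error term are notation assumed here for illustration; they are not from the original post.

```latex
Y_{st} = \beta_{0s} + \beta_{1} X_{1,st} + \varepsilon_{st}
```

Here \beta_{0s} is the intercept for stock s, while the slope \beta_{1} is shared by all stocks.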

PG


Excellent point!

--

Paige Miller


OK, maybe I'm completely wrong in what I thought.

What I need is the correlation between y_var and all the x variables in general. In a normal regression, you would compare all the data from all columns and receive the correlation, intercept, parameters for the x variables, etc. In my case, however, a normal regression at the level of the overall dataset would not make sense because the y variable is clustered in subgroups (the stocks), right? That's why I thought I need multiple regressions, one for every stock and its time series.

Because of that, I thought the output would be a set of regression results, one for each subgroup. But that is not what I need in the end, since I need general information about the correlation between y and all the x's. Something like an aggregated correlation over all subgroups.

Maybe I am completely wrong on any of these points; in that case, thanks for the clarification.

Kind regards


Do you agree with @PGStats that you want different intercepts for each stock but a common slope? Or do you want different slopes for each stock?

You are also now speaking explicitly about multiple X variables, whereas your original example contained only a single X variable. That would translate to different intercepts for each stock but common slopes (plural), one for each X. Is that what you want?

--

Paige Miller


Sorry, it seems I needed a few moments to understand the point here. Yes, I only expect different intercepts, so a single regression will probably fit this case; you're right. Sorry for the confusion!


```
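proc glm data=have;
/* CLASS treats stock_identifier as a categorical effect */
class stock_identifier;
/* add as many X variables as you need here; */
/* y and x1-x3 stand in for your own variable names */
/* with NOINT, the stock_identifier estimates are the stock-specific intercepts */
model y = stock_identifier x1 x2 x3 /noint;
run;
quit;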
```
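With the variable names from the original post, the model above might be written as in the sketch below. The `SOLUTION` option is an addition not in the reply: PROC GLM does not print parameter estimates by default when the model contains CLASS variables, so it is requested explicitly. The second step is an alternative, assuming only the common slope is of interest; `ABSORB` adjusts for the stock effect without reporting the individual intercepts, and it requires the data to be sorted by the absorbed variable.

```sas
/* Sketch 1: stock-specific intercepts plus a common slope, estimates printed */
proc glm data=have;
    class stock_identifier;
    model y_var = stock_identifier x_var1 / noint solution;
run;
quit;

/* Sketch 2: same common slope, stock effect absorbed instead of estimated */
proc sort data=have;
    by stock_identifier;
run;

proc glm data=have;
    absorb stock_identifier;
    model y_var = x_var1 / solution;
run;
quit;
```

The absorbed version can be more practical when there are many stocks, since the output is not filled with one intercept per stock.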

--

Paige Miller
