svw1900
Obsidian | Level 7

Hi everyone,

I am using SAS University Edition and have a question about a regression analysis. It is probably easy to solve, but I am new to SAS and could not find a solution (probably because I did not really know what to search for).

I have a dataset which looks like this:

Year  Stock_Identifier  Y_Var  X_Var1
2005  1                 0,3    0,1
2006  1                 0,4    0,2
2007  1                 0,5    0,15
2008  1                 0,6    0,25
2005  2                 0,3    0,3
2006  2                 0,3    0,4
2007  2                 0,5    0,4
2008  2                 0,4    0,5

What I need is a linear regression that tells me the correlation between Y and X1 (in the real dataset I have some more X variables, but that should not be a problem). A normal linear regression would, I think, ignore the stock identifiers and just compare Y and X. That's where I need you. The regression only makes sense at the level of the stock, so in this case there should be one regression for stock 1 and its data points between 2005 and 2008, and another one for the second stock. At the end, however, I need a "normal" regression output table aggregated at the level of the whole dataset.

I hope this is clear. Maybe it's a stupid question (sorry for that), but I would be really thankful for your input (ideally you could even explain the steps you take, because I am really new to SAS :-)).

Kind regards

9 REPLIES
PaigeMiller
Diamond | Level 26
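/* BY-group processing requires the data to be sorted by the BY variable */
proc sort data=have;
    by stock_identifier;
run;

/* Fit a separate regression of y_var on x_var1 for each stock */
proc reg data=have;
    by stock_identifier;
    model y_var = x_var1;
run;
quit;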
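
The code above assumes a data set named have that holds the columns from the question. A minimal sketch of creating it from the sample rows, assuming the values are typed in with a comma as the decimal separator (the data set name have and the informat width of 8 are choices made here for illustration, not something the question specifies):

```sas
data have;
    /* The colon modifier applies the NUMX informat in list input;
       NUMX reads a comma as the decimal separator */
    input Year Stock_Identifier Y_Var :numx8. X_Var1 :numx8.;
    datalines;
2005 1 0,3 0,1
2006 1 0,4 0,2
2007 1 0,5 0,15
2008 1 0,6 0,25
2005 2 0,3 0,3
2006 2 0,3 0,4
2007 2 0,5 0,4
2008 2 0,4 0,5
;
run;
```

If the real data already uses a period as the decimal separator, plain list input without the informat should be enough.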

I did not include a response to this part of your question:

At the end, however, I need a "normal" regression output table aggregated at the level of the whole dataset.

because I'm not sure what you mean.

--
Paige Miller
Rick_SAS
SAS Super FREQ

Follow PaigeMiller's advice, but also use the TABLEOUT and OUTEST= options on the PROC REG statement:

proc reg data=have TABLEOUT outest=RegOut;

...

quit;

 

The dataset RegOut contains the parameter estimates, standard errors, p-values, and 95% CIs. Here is an example:

proc sort data=sashelp.class out=Have;
    by sex;
run;
proc reg data=have outest=RegOut TABLEOUT plots=none;
    by sex;
    model height = weight;
run;
quit;

proc print data=RegOut;
    var Sex _TYPE_ Intercept Weight;
run;

/* to output only some statistics, use a WHERE clause */
proc print data=RegOut;
    where _TYPE_="PARMS";
    var Sex _TYPE_ Intercept Weight;
run;
Reeza
Super User

Try using a Task and it will generate the code.

Place the stock identifier in the GROUP ANALYSIS BY section.

 

https://documentation.sas.com/?activeCdc=webeditorcdc&cdcId=sasstudiocdc&cdcVersion=3.7&docsetId=web...

 

 

As to how to combine them into one output, what’s the math behind that?

 




 

PGStats
Opal | Level 21

What do you expect about the relation between Y and X1 for different stocks? Do you expect them to have a common slope but different intercepts, as in Y = B0s + B1*X1, where the B0s are stock-specific intercepts? This kind of relationship can (and should) be fitted with a single regression.

PG
PaigeMiller
Diamond | Level 26



Excellent point!

--
Paige Miller
svw1900
Obsidian | Level 7
OK, maybe I'm completely wrong with what I thought.

So what I need is the correlation between y_var and all the x variables in general. In a normal regression, you would compare all the data from all columns and receive the correlation, intercepts, parameters for the x variables, etc. In my case, however, a normal regression at the level of the overall dataset would not make sense because the y variable is clustered in subgroups (the stocks), right? That's why I thought I needed multiple regressions, one for every stock and its time series.

Because of that, I thought the output would be a set of regression results, one for each subgroup. But that is not what I need in the end, since I need general information about the correlation between y and all the x's, something like an aggregated correlation over all subgroups.

Maybe I am completely wrong on any of these points; in that case, thanks for the clarification.

Kind regards
PaigeMiller
Diamond | Level 26


Do you agree with @PGStats that you want different intercepts for each stock but a common slope? Or do you want different slopes for each stock?

You are also now clearly speaking about multiple X variables, whereas your original example contained only a single X variable. That would translate to different intercepts for each stock but common slopes (plural), one for each X. Is that what you want?

--
Paige Miller
svw1900
Obsidian | Level 7
Sorry, it seems I needed a moment to understand the point here. Yes, I only expect different intercepts, so a single regression will probably fit this case; you're right. Sorry for the confusion!
PaigeMiller
Diamond | Level 26
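proc glm data=have;
    /* treat the stock identifier as a categorical variable */
    class stock_identifier;
    /* y and x1-x3 are placeholders; add as many X variables as you need here */
    /* NOINT drops the overall intercept, so each stock gets its own intercept */
    model y = stock_identifier x1 x2 x3 / noint;
run;
quit;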
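
For the sample data in the question, the same model can be sketched with the actual column names. This is only an illustration, assuming the data set is called have and x_var1 is the only predictor. The SOLUTION option is added here because PROC GLM does not print the parameter estimates by default when the model contains a CLASS variable:

```sas
proc glm data=have;
    /* stock_identifier is categorical, so each level gets its own parameter */
    class stock_identifier;
    /* NOINT: no overall intercept; SOLUTION: print the parameter estimates */
    model y_var = stock_identifier x_var1 / noint solution;
run;
quit;
```

In the resulting parameter table, the stock_identifier rows should be the stock-specific intercepts and the x_var1 row the common slope. Because of NOINT, the reported R-square is based on the uncorrected total sum of squares, so it is not directly comparable to the R-square of a model with an intercept.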


 

--
Paige Miller
