BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
yanshuai
Quartz | Level 8

Hello,

 

I have a dataset like this.

data temp.database ;
input
YEAR ID SALE;
datalines;
2000 001 23
2001 001 25
2002 001 28
2003 001 30
2004 001 32
2005 001 40
2000 002 23
2001 002 25
2002 002 28
2003 002 30
2004 002 32
2005 002 40
;
run;

What I want is like this

data temp.database ;
input
YEAR ID SALE BETA;
datalines;
2000 001 23     /* here it is empty as there are no value for previous year */
2001 001 25 
2002 001 28 
2003 001 30 0.3 /*the value I just make up, but it should be the coefficient from regressing SALE in 2000,2001,2002,2003 on the YEAR value (2000,2001,2002,2003)*/
2004 001 32 0.4 /*Again, this BETA should be the coefficient from regressing SALE in 2001,2002,2003,2004 on YEAR value 2001,2002,2003,2004 */
2005 001 40 0.5 /*Again, this BETA should be the coefficient from regressing SALE in 2002,2003,2004,2005 on YEAR value 2002,2003,2004,2005 */
2000 002 23 
2001 002 25 
2002 002 28 
2003 002 30 0.2
2004 002 32 0.3
2005 002 40 0.4
;
run;

If it is in a regression model, it will be like this (use obs in 2003 as example)

[SALE in 2000                    [2000

SALE in 2001   =    BETA*  2001     +    residual

SALE in 2002                     2002

SALE in 2003]                    2003]

Hope this makes sense to you.

How can I capture this BETA and residual respectively?

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

This is how to achieve what you want with proc reg

 

data have;
input YEAR ID SALE;
datalines;
2000 001 23
2001 001 25
2002 001 28
2003 001 30
2004 001 32
2005 001 40
2000 002 23
2001 002 25
2002 002 28
2003 002 30
2004 002 32
2005 002 40
;

%let window=4;

data temp;
set have;
do y = 1 to &window.;
    grp = year + &window. - y;
    output;
    end;
run;

proc sql;
create table reg as
select *
from temp
group by id, grp
having count(*) = &window.
order by id, grp, y;
quit;

proc reg data=reg noprint outest=regEst;
by id grp;
model sale=y;
output out=regOut(where=(y=&window)) r=residual;
run;

proc sql;
create table want as
select 
    a.*,
    c.y as beta,
    b.residual
from
    have as a left join 
    regOut as b on a.id=b.id and a.year=b.grp left join
    regEst as c on a.id=c.id and a.year=c.grp
order by id, year;
select * from want;
quit;
                      YEAR        ID      SALE      beta  Residual
                  ------------------------------------------------
                      2000         1        23         .         .
                      2001         1        25         .         .
                      2002         1        28         .         .
                      2003         1        30       2.4      -0.1
                      2004         1        32       2.3      -0.2
                      2005         1        40       3.8       1.8
                      2000         2        23         .         .
                      2001         2        25         .         .
                      2002         2        28         .         .
                      2003         2        30       2.4      -0.1
                      2004         2        32       2.3      -0.2
                      2005         2        40       3.8       1.8
PG

View solution in original post

13 REPLIES 13
PGStats
Opal | Level 21

Why don't you regress on all years for a given ID?

PG
yanshuai
Quartz | Level 8

This is how the previous researchers did. Only care about the previous certain years. Sometimes it is previous 4 years, sometimes 5 years.

And there are more than 900 ID, so I cannot do this manually.

PGStats
Opal | Level 21

I assume that BETA is the regression slope? Where is the INTERCEPT?

PG
yanshuai
Quartz | Level 8

Yea, coefficient is the slope.

Constant value is not needed in this research. The regression result will have it, but I will not use it.

Reeza
Super User

@yanshuai wrote:

This is how the previous researchers did. Only care about the previous certain years. Sometimes it is previous 4 years, sometimes 5 years.

And there are more than 900 ID, so I cannot do this manually.


How do you know which to do when?

When do you do a 4 year window or 5 year window. 

 

There’s a data step solution on here from mkeintz a few weeks ago about a moving slope calculation. 

 

Do you have SAS / ETS license, if so PROC EXPAND may be useful. 

yanshuai
Quartz | Level 8

OK.

In this case, I will do 4 year window.

4 year or 5 year are usually based on previous studies.

No matter 4 or 5, I am just curious about how to acheive this.

PGStats
Opal | Level 21

This is how to achieve what you want with proc reg

 

data have;
input YEAR ID SALE;
datalines;
2000 001 23
2001 001 25
2002 001 28
2003 001 30
2004 001 32
2005 001 40
2000 002 23
2001 002 25
2002 002 28
2003 002 30
2004 002 32
2005 002 40
;

%let window=4;

data temp;
set have;
do y = 1 to &window.;
    grp = year + &window. - y;
    output;
    end;
run;

proc sql;
create table reg as
select *
from temp
group by id, grp
having count(*) = &window.
order by id, grp, y;
quit;

proc reg data=reg noprint outest=regEst;
by id grp;
model sale=y;
output out=regOut(where=(y=&window)) r=residual;
run;

proc sql;
create table want as
select 
    a.*,
    c.y as beta,
    b.residual
from
    have as a left join 
    regOut as b on a.id=b.id and a.year=b.grp left join
    regEst as c on a.id=c.id and a.year=c.grp
order by id, year;
select * from want;
quit;
                      YEAR        ID      SALE      beta  Residual
                  ------------------------------------------------
                      2000         1        23         .         .
                      2001         1        25         .         .
                      2002         1        28         .         .
                      2003         1        30       2.4      -0.1
                      2004         1        32       2.3      -0.2
                      2005         1        40       3.8       1.8
                      2000         2        23         .         .
                      2001         2        25         .         .
                      2002         2        28         .         .
                      2003         2        30       2.4      -0.1
                      2004         2        32       2.3      -0.2
                      2005         2        40       3.8       1.8
PG
yanshuai
Quartz | Level 8

Thank you so much!

An interesting thing found: sale=y here y is only a series of number like 1 2 3 4. So I tried replace y with YEAR, but the result BETA is the same. So I guess the 2001 2002 2003 2004 is the same with 1 2 3 4 in regression model.

proc reg data=reg noprint outest=regEst;
by id grp;
model sale=YEAR;
output out=regOut(where=(y=&window)) r=residual;
run;

 By the way, if I would like to capture the BETA in previous 5 years rather than 4 years, can I simply switch to this?

%window=5

 

PGStats
Opal | Level 21

@yanshuai wrote:

 By the way, if I would like to capture the BETA in previous 5 years rather than 4 years, can I simply switch to this?

%window=5

 


That's the purpose of creating the macro variable window. Don't forget the semicolon at the end.

PG
Reeza
Super User

I didn't do the residual portion, not sure how you'll do that without the intercept for starters or how you'd do it with multiple models. If you have 4 models you then have at least 4 residuals for each observations which would get cumbersome quickly. 

 

Screen Shot 2018-03-15 at 8.49.16 PM.png

yanshuai
Quartz | Level 8

Thank you Reeza.

I just want to run a simple OLS regression. So there might be only one model used I guess.

Thank you so much anyway. I will figure out your code.

Ksharp
Super User

It is very easy for IML code.

 

data database ;
input
YEAR ID SALE;
datalines;
2000 001 23
2001 001 25
2002 001 28
2003 001 30
2004 001 32
2005 001 40
2000 002 23
2001 002 25
2002 002 28
2003 002 30
2004 002 32
2005 002 40
;
run;
proc iml;
use database;
read all var{id};
level=unique(id);

do i=1 to ncol(level);
 read all var{id year sale} where(id=(level[i]));
 beta=j(nrow(year),1,.);
 k=0;
 do j=4 to nrow(year);
  k=k+1;
  _year=year[k:j]; 
  _sale=sale[k:j];
  xx=j(4,1)||_year;
  beta[j]=solve(xx`*xx,xx`*_sale)[2];
 end;
 temp=id||year||sale||beta;
 want=want//temp;
end;
close;
create want from want[c={id year sale beta}];
append from want;
close;
quit;
yanshuai
Quartz | Level 8

Thank you. This is cool. First time heard about IML.

But is this code trying to figure out the BETA manually? Because this code looks really fimiliar with the slope formula in Econometrics.

beta[j]=solve(xx`*xx,xx`*_sale)[2]

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1869 views
  • 7 likes
  • 4 in conversation