Help using Base SAS procedures

How to run a rolling regression with two datasets?

Reply
Occasional Contributor
Posts: 12

How to run a rolling regression with two datasets?

Say with data set A.

 

DATE    ID  

10          1

10          2

10          3

20          1

20          2

20          3

30          1

30          2

30          3

 

 

And data set B,

 

DATE    ID  x   y

1 1 0.070933926 0.323055974
2 1 0.406570761 0.13760953
3 1 0.883535963 0.578733088
4 1 0.016231788 0.729134018
5 1 0.668424989 0.761197387
6 1 0.242259712 0.810040071
7 1 0.213543345 0.058647549
8 1 0.857330638 0.531532208
9 1 0.501064997 0.743060726
10 1 0.018354547 0.072328341
11 1 0.99627551 0.140234082
12 1 0.267813864 0.134323054
13 1 0.132534742 0.716864341
14 1 0.107630158 0.498794317
15 1 0.814832803 0.678847109
16 1 0.194916825 0.652946461
17 1 0.24655657 0.348632146
18 1 0.581696172 0.541978606
19 1 0.448507082 0.723949992
20 1 0.725078176 0.336086311
21 1 0.572441245 0.70316834
22 1 0.638282508 0.529384198
23 1 0.967807781 0.159573909
24 1 0.962864907 0.683835901
25 1 0.772681849 0.53964572
26 1 0.67532915 0.484330887
27 1 0.946038446 0.569370775
28 1 0.734410321 0.801076677
29 1 0.018660865 0.943685516
30 1 0.102126634 0.12693274

1 2 0.337826696 0.381279823
2 2 0.402967304 0.784483754
3 2 0.528027957 0.132032813
4 2 0.679449095 0.706947401
5 2 0.083379513 0.431491854
6 2 0.086233166 0.237205938
7 2 0.247418031 0.816204954
8 2 0.877926644 0.738031694
9 2 0.234003429 0.900287427
10 2 0.878105682 0.430906847
11 2 0.159710129 0.647456551
12 2 0.245602139 0.007026074
13 2 0.596617096 0.49183371
14 2 0.858081967 0.998086222
15 2 0.946591278 0.508973381
16 2 0.030718654 0.09659358
17 2 0.383475095 0.001080811
18 2 0.47175129 0.88797923
19 2 0.231755643 0.706321812
20 2 0.26818677 0.54529103
21 2 0.605350394 0.891914056
22 2 0.278178314 0.14754076
23 2 0.179958498 0.622006208
24 2 0.536177448 0.921842128
25 2 0.881522327 0.772927816
26 2 0.034479121 0.836198033
27 2 0.61827926 0.119682909
28 2 0.363028818 0.536296862
29 2 0.416057939 0.209719423
30 2 0.319097646 0.971274071

7 3 0.540217872 0.445921819
8 3 0.086111398 0.304460289
9 3 0.193343887 0.452840456
10 3 0.075337725 0.387945463

 

 

 

 

Baseically, I want to write a loop, what read one row of data A, and then find the 10 observations from data B with

A.date-10<B.date<=A.date   and   B.ID=A.ID,

and then get the regression coefficient with this regression and then put it back to data set A.

 

I have found some codes which can achieve this , but they are too costly, since they just collapse two data sets into one and then reg. However, my dataset is so big that it is impossible to do like that.

 

I am wondering if it will be possible to write a loop that go through all observations in data A, and for each observation of data A, find proper data from data B, and get regression results and put it back to data A, or just create a new data to store these results first.

 

 

 

Could someone help a rookie like me to write a program like this? Smiley Happy  many kisses...

Super User
Posts: 9,676

Re: How to run a rolling regression with two datasets?

You can use SQL to get those obs firstly then feed it into PROC REG.

proc sql;

create table want as

select A.ID as group1, A.DATE as group2, B.*

 from A,B

  where A.date-10<B.date<=A.date   and   B.ID=A.ID ;

quit;

 

proc reg data=want;

 by group1 group2;

 model y=x ;

quit;

Occasional Contributor
Posts: 12

Re: How to run a rolling regression with two datasets?

Thank you, Keshan. But the problem here is that the data is so big, so creating this 'want' data set is too costly.

 

What I get in mind is that, is it possible to everytime use one observation from A and then according to it, find proper data from B , and feed regression results back to A.

 

sth like this : https://communities.sas.com/t5/Base-SAS-Programming/Can-I-call-execute-macro-within-a-DO-loop/m-p/24...

Super User
Posts: 9,676

Re: How to run a rolling regression with two datasets?

I know what you are talking about. It is about CALL EXECUTE(). But I would doubt the efficiency if table is big.

 

 

data A;
input DATE    ID  ;
cards;
10          1
10          2
10          3
20          1
20          2
20          3
30          1
30          2
30          3
;
run;
 
 
data B;
input DATE    ID  x   y;
cards;
1 1 0.070933926 0.323055974
2 1 0.406570761 0.13760953
3 1 0.883535963 0.578733088
4 1 0.016231788 0.729134018
5 1 0.668424989 0.761197387
6 1 0.242259712 0.810040071
7 1 0.213543345 0.058647549
8 1 0.857330638 0.531532208
9 1 0.501064997 0.743060726
10 1 0.018354547 0.072328341
11 1 0.99627551 0.140234082
12 1 0.267813864 0.134323054
13 1 0.132534742 0.716864341
14 1 0.107630158 0.498794317
15 1 0.814832803 0.678847109
16 1 0.194916825 0.652946461
17 1 0.24655657 0.348632146
18 1 0.581696172 0.541978606
19 1 0.448507082 0.723949992
20 1 0.725078176 0.336086311
21 1 0.572441245 0.70316834
22 1 0.638282508 0.529384198
23 1 0.967807781 0.159573909
24 1 0.962864907 0.683835901
25 1 0.772681849 0.53964572
26 1 0.67532915 0.484330887
27 1 0.946038446 0.569370775
28 1 0.734410321 0.801076677
29 1 0.018660865 0.943685516
30 1 0.102126634 0.12693274
1 2 0.337826696 0.381279823
2 2 0.402967304 0.784483754
3 2 0.528027957 0.132032813
4 2 0.679449095 0.706947401
5 2 0.083379513 0.431491854
6 2 0.086233166 0.237205938
7 2 0.247418031 0.816204954
8 2 0.877926644 0.738031694
9 2 0.234003429 0.900287427
10 2 0.878105682 0.430906847
11 2 0.159710129 0.647456551
12 2 0.245602139 0.007026074
13 2 0.596617096 0.49183371
14 2 0.858081967 0.998086222
15 2 0.946591278 0.508973381
16 2 0.030718654 0.09659358
17 2 0.383475095 0.001080811
18 2 0.47175129 0.88797923
19 2 0.231755643 0.706321812
20 2 0.26818677 0.54529103
21 2 0.605350394 0.891914056
22 2 0.278178314 0.14754076
23 2 0.179958498 0.622006208
24 2 0.536177448 0.921842128
25 2 0.881522327 0.772927816
26 2 0.034479121 0.836198033
27 2 0.61827926 0.119682909
28 2 0.363028818 0.536296862
29 2 0.416057939 0.209719423
30 2 0.319097646 0.971274071
7 3 0.540217872 0.445921819
8 3 0.086111398 0.304460289
9 3 0.193343887 0.452840456
10 3 0.075337725 0.387945463
;
run;
data _null_;
 set A end=last;
call execute(catt('proc reg data=b(where=(',date-10,'<date<=',date,' and ID=',ID,'));model y=x;quit;'));
run;


 
Ask a Question
Discussion stats
  • 3 replies
  • 352 views
  • 1 like
  • 2 in conversation