Solved: Re: how to multiple the top 5 data by ID

Ariel · Posted 10-15-2015 12:38 PM

Hello,

I want to multiple the top five return(rtn) by id, but I don't know how to do. I need to calculate each id's BHR=(1+ro)(1+r1)(1+r2)(1+r3)(1+r4)(1+r5)-1. The result would be 1101 BHR, 1102 BHR, and so on. Please give me some suggestions. Thank you!

id	rtn	dif_day
1101	r0	0
1101	r1	1
…	…	…
1101	r251	251
1102	r0	0
1102	r1	1
…		…
1102	r251	251
1103	r0	0
1103	r1	1
…		…
1103	r251	251
…

KachiM · Posted 10-16-2015 10:04 AM

We can take care of DIF_DAY and revise my old program. We just use a WHERE= condition to filter observations for DIF_DAY in

0 to 4 and do the program with the data provided.

data have;
input ide date yymmdd10. dif_DAY  rtn;
datalines;
1101 2009/8/26   0  -0.01729
1101 2009/8/27   1   0.001466
1101 2009/8/28   2   0.021962
1101 2009/8/31   3   0.022923
1101 2009/9/01   4   0.001401
1101 2009/9/02   5  -0.00699
1101 2009/9/03   6   0
1101 2009/9/04   7  -0.00282
1101 2009/9/07   8   0.011299
1101 2012/11/09  0  -0.00921
1101 2012/11/12  1   0
1101 2012/11/13  2  -0.01859
1101 2012/11/14  3  -0.00406
1101 2012/11/15  4  -0.01223
1101 2012/11/16  5   0.001376
1101 2012/11/19  6   0.005495
1101 2012/11/20  7   0.005464
1101 2012/11/21  8   0
1103 2014/1/23   0  -0.00643
1103 2014/1/24   1   0.006472
1103 2014/1/27   2  -0.00322
1103 2014/2/05   3  -0.00323
1103 2014/2/06   4   0.003236
1103 2014/2/07   5  -0.00323
1103 2014/2/10   6   0.003236
1103 2014/2/11   7   0.003226
;
run;

proc sort data = have;
by ide ;
run;

data want;
   do until(last.ide);
      set have (where = (dif_DAY between 0 and 4));
      by ide;
      if dif_DAY = 0 then BHR = 1;
      BHR = BHR * (1 + rtn);
      if dif_DAY = 4 then do; BHR = BHR - 1; output; end;
   end;
drop dif_DAY rtn;
run;

proc print data = want ;
format date date10.;
run;

View solution in original post

michelle_delaurentis_sas_com · Posted 10-15-2015 03:16 PM

You will need to use by group processing to accomplish this. First sort the data by your id and dif_day variable then in a data step you can set the data set with a by id dif_day statement. This creates first.dif_day, first.id, last.dif_day and last.id whose values are 1 or 0 depending on the observation being processed. if the observation is the first of a particular value of the by variable the first. will be 1 otherwise it is 0. So you can use these variables to count which observation in the by group you are processing. Here is a simple example you can cut and paste:

data test;
do id=1101 to 1103;
do dif_day=0 to 20;
rtn=int(ranuni(0)*10);
output;
end;
end;
run;

proc sort data=test;
by id dif_day;
run;

data final;
set test;
by id;
array r(5) (5*0);
if first.id then do;
counter=0;
do i=1 to dim(r);
r(i)=0;
end;
end;
counter+1;
if counter le 5 then r(counter)=rtn;
if counter eq 5 then do;
total=1;
do i=1 to dim(r);
total=total * (1+r(i));
end;
total=total-1;
output;
end;
run;

KachiM · Posted 10-15-2015 03:43 PM

I do not see how dif_day is useful for your purpose. Here is an example to paly with.

First create a data set HAVE with r taking values in the range 0 to 1. Then sort it by ID. For every ID, use the first FIVE observations.

data have;
do id = 1101 to 1120;
   do i = 1 to 10;
      r = ranuni(123);
      output;
   end;
end;
drop i;
run;

proc sort data = have;
by id;
run;

data want;
   do i = 1 by 1 until(last.id);
      set have;
      by id;
      if first.id then BHR = 1;
      BHR = BHR * (1 + r);
      if i = 5 then do; BHR = BHR - 1; output; end;
      else continue;
   end;
drop i r;
run;

The OUTPUT:

  id       BHR

 1101     6.0477
 1102     9.8697
 1103     9.0726
 1104     5.2849
 1105     9.2560
 1106    11.5767
 1107     4.8835
 1108     5.1041
 1109     3.7164
 1110     9.5704
 1111    17.6294
 1112     4.7647
 1113    14.5187
 1114     9.6550
 1115     8.1160
 1116     6.6383
 1117     5.8073
 1118     8.0927
 1119     8.0730
 1120     4.0700

Ariel · Posted 10-16-2015 05:53 AM

Hi Datasp,

I explained why I set dif_Day first. Take 1101(id) in my attached file as an example. This firm has two deals on different date, 2009/8/26 and 2012/11/9. And the dif_Day means the different between 2009/8/26 and 2009/8/27. However, your solution may only show 1101 in one time....Besides, I have a question about r=ranuni(123). I already have the real return(rtn), I think I should import my real dataset first?

I'm sorry that I am poor at sas...Please give me more suggestion, thank you.

I want to mutiple the rtn of dif_Day'0' to dif_Day'4' by each ide. Besides, some ide may have more than one start date and they cannot be deleted.

1101 BHR=(1+-0.01729)*(1+0.001466)*(1+0.021962)*(1+0.022923)*(1+0.001401)-1

1101 BHR=(1+-0.00921)*(1+0)*(1+-0.01859)*(1+0.-0.00406)*(1+0.-0.01223)-1

1103 BHR=

ide	date	dif_DAY	rtn
1101	2009/8/26	0	-0.01729
1101	2009/8/27	1	0.001466
1101	2009/8/28	2	0.021962
1101	2009/8/31	3	0.022923
1101	2009/9/1	4	0.001401
1101	2009/9/2	5	-0.00699
1101	2009/9/3	6	0
1101	2009/9/4	7	-0.00282
1101	2009/9/7	8	0.011299
…
1101	2012/11/9	0	-0.00921
1101	2012/11/12	1	0
1101	2012/11/13	2	-0.01859
1101	2012/11/14	3	-0.00406
1101	2012/11/15	4	-0.01223
1101	2012/11/16	5	0.001376
1101	2012/11/19	6	0.005495
1101	2012/11/20	7	0.005464
1101	2012/11/21	8	0
1103	2014/1/23	0	-0.00643
1103	2014/1/24	1	0.006472
1103	2014/1/27	2	-0.00322
1103	2014/2/5	3	-0.00323
1103	2014/2/6	4	0.003236
1103	2014/2/7	5	-0.00323
1103	2014/2/10	6	0.003236
1103	2014/2/11	7	0.003226

KachiM · Posted 10-16-2015 10:04 AM

We can take care of DIF_DAY and revise my old program. We just use a WHERE= condition to filter observations for DIF_DAY in

0 to 4 and do the program with the data provided.

data have;
input ide date yymmdd10. dif_DAY  rtn;
datalines;
1101 2009/8/26   0  -0.01729
1101 2009/8/27   1   0.001466
1101 2009/8/28   2   0.021962
1101 2009/8/31   3   0.022923
1101 2009/9/01   4   0.001401
1101 2009/9/02   5  -0.00699
1101 2009/9/03   6   0
1101 2009/9/04   7  -0.00282
1101 2009/9/07   8   0.011299
1101 2012/11/09  0  -0.00921
1101 2012/11/12  1   0
1101 2012/11/13  2  -0.01859
1101 2012/11/14  3  -0.00406
1101 2012/11/15  4  -0.01223
1101 2012/11/16  5   0.001376
1101 2012/11/19  6   0.005495
1101 2012/11/20  7   0.005464
1101 2012/11/21  8   0
1103 2014/1/23   0  -0.00643
1103 2014/1/24   1   0.006472
1103 2014/1/27   2  -0.00322
1103 2014/2/05   3  -0.00323
1103 2014/2/06   4   0.003236
1103 2014/2/07   5  -0.00323
1103 2014/2/10   6   0.003236
1103 2014/2/11   7   0.003226
;
run;

proc sort data = have;
by ide ;
run;

data want;
   do until(last.ide);
      set have (where = (dif_DAY between 0 and 4));
      by ide;
      if dif_DAY = 0 then BHR = 1;
      BHR = BHR * (1 + rtn);
      if dif_DAY = 4 then do; BHR = BHR - 1; output; end;
   end;
drop dif_DAY rtn;
run;

proc print data = want ;
format date date10.;
run;

Ariel · Posted 10-26-2015 08:50 PM

Thanks for your help! It took me a long time to solve, and it worked!

MikeZdeb · Posted 10-16-2015 03:31 PM

Hi ... using your posted data ...

libname zz 'z:\rtn.xls';

proc sort data=zz.'sheet1$'n out=zztop;
by ide date;
run;

data want (keep=ide bhr date);
bhr = 1;
do i = 1 by 1 until (last.ide or i eq 5);
   set zztop;
   by ide;
   where dif_day le 4;
   bhr = bhr * (1+rtn);
end;
bhr +(- 1);
run;

libname zz clear;

first 3 observations data set WANT ...
Obs ide date bhr

1    1101    01SEP2009     0.030260
2    1101    15NOV2012    -0.043421
3    1103    06FEB2014    -0.003216

Ariel · Posted 10-26-2015 08:51 PM

Thank you :):):)

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away