BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kashlik123
Obsidian | Level 7
 

I am trying to calculate some statistics for a given variable based on the client's id and the time horizon. My current solution is show below, however, I would like to know if there is a way to reformat the code into a datastep instead of an sql join, because the join requires an excessive amount of time to execute on my real dataset.

data have1(drop=t);
id = 1;
dt = '31dec2020'd;
do t=1 to 10;
    dt = dt + 1;
    var = rand('uniform');
    output;
end;
format dt ddmmyyp10.;
run;

data have2(drop=t);
id = 2;
dt = '31dec2020'd;
do t=1 to 10;
    dt = dt + 1;
    var = rand('uniform');
    output;
end;
format dt ddmmyyp10.;
run;

data have_fin;
set have1 have2;
run;

Proc sql;
create table want1 as
select a.id, a.dt,a.var, mean(b.var) as mean_var_3d
from have_fin as a
left join have_fin as b
    on a.id = b.id and intnx('day',a.dt,-3,'S') < b.dt <= a.dt
group by 1,2,3;
Quit;

Proc sql;
create table want2 as
select a.id, a.dt,a.var, mean(b.var) as mean_var_3d
from have_fin as a
left join have_fin as b
    on a.id = b.id and intnx('day',a.dt,-6,'S') < b.dt <= a.dt
group by 1,2,3;
Quit;

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @kashlik123,

 

The date sequences in your sample data have no gaps (of >=2 days) and no duplicates within an ID. My suggested DATA step below only assumes the latter. If there are also no gaps in your real data, the code can be simplified. (Maybe you could handle duplicate dates, if any, in a preliminary PROC SUMMARY step aggregating var values by ID dt. But this depends on how you'd define the means in the presence of duplicate dates.)

data want(drop=_:);
array _3d[0:2] _temporary_;
array _6d[0:5] _temporary_;
set have_fin;
by id dt;
_d=dif(dt);
if first.id | _d>=6 then call missing(of _3d[*], of _6d[*], _i3, _i6);
else do;
  do _j=1 to _d-1;
    _i6+1;
    _6d[mod(_i6,6)]=.;
  end;
  if _d>=3 then call missing(of _3d[*], _i3);
  else do _j=1 to _d-1;
    _i3+1;
    _3d[mod(_i3,3)]=.;
  end;
end;
_i3+1;
_i6+1;
_3d[mod(_i3,3)]=var;
_6d[mod(_i6,6)]=var;
mean_var_3d=mean(of _3d[*]);
mean_var_6d=mean(of _6d[*]);
run;

View solution in original post

2 REPLIES 2
ballardw
Super User

Is this supposed to be some sort of "rolling" mean where the number of records in the denominator keeps changing?

If so then you might look into Proc Expand if you have SAS/ETS licensed.

If not describe what this is a supposed to be doing.

 

BTW, you can really simplify your "example" data by creating one data set with another do loop:

data have1(drop=t);
   do id = 1,2;/* list explicit values */
      /* or use Do id=1 to 2; or more if desired*/
      dt = '31dec2020'd;
      do t=1 to 10;
          dt = dt + 1;
          var = rand('uniform');
          output;
      end;
   end;
   format dt ddmmyyp10.;
run;
FreelanceReinh
Jade | Level 19

Hello @kashlik123,

 

The date sequences in your sample data have no gaps (of >=2 days) and no duplicates within an ID. My suggested DATA step below only assumes the latter. If there are also no gaps in your real data, the code can be simplified. (Maybe you could handle duplicate dates, if any, in a preliminary PROC SUMMARY step aggregating var values by ID dt. But this depends on how you'd define the means in the presence of duplicate dates.)

data want(drop=_:);
array _3d[0:2] _temporary_;
array _6d[0:5] _temporary_;
set have_fin;
by id dt;
_d=dif(dt);
if first.id | _d>=6 then call missing(of _3d[*], of _6d[*], _i3, _i6);
else do;
  do _j=1 to _d-1;
    _i6+1;
    _6d[mod(_i6,6)]=.;
  end;
  if _d>=3 then call missing(of _3d[*], _i3);
  else do _j=1 to _d-1;
    _i3+1;
    _3d[mod(_i3,3)]=.;
  end;
end;
_i3+1;
_i6+1;
_3d[mod(_i3,3)]=var;
_6d[mod(_i6,6)]=var;
mean_var_3d=mean(of _3d[*]);
mean_var_6d=mean(of _6d[*]);
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 534 views
  • 0 likes
  • 3 in conversation