BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kashlik123
Obsidian | Level 7
 

I am trying to calculate some statistics for a given variable based on the client's id and the time horizon. My current solution is show below, however, I would like to know if there is a way to reformat the code into a datastep instead of an sql join, because the join requires an excessive amount of time to execute on my real dataset.

data have1(drop=t);
id = 1;
dt = '31dec2020'd;
do t=1 to 10;
    dt = dt + 1;
    var = rand('uniform');
    output;
end;
format dt ddmmyyp10.;
run;

data have2(drop=t);
id = 2;
dt = '31dec2020'd;
do t=1 to 10;
    dt = dt + 1;
    var = rand('uniform');
    output;
end;
format dt ddmmyyp10.;
run;

data have_fin;
set have1 have2;
run;

Proc sql;
create table want1 as
select a.id, a.dt,a.var, mean(b.var) as mean_var_3d
from have_fin as a
left join have_fin as b
    on a.id = b.id and intnx('day',a.dt,-3,'S') < b.dt <= a.dt
group by 1,2,3;
Quit;

Proc sql;
create table want2 as
select a.id, a.dt,a.var, mean(b.var) as mean_var_3d
from have_fin as a
left join have_fin as b
    on a.id = b.id and intnx('day',a.dt,-6,'S') < b.dt <= a.dt
group by 1,2,3;
Quit;

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @kashlik123,

 

The date sequences in your sample data have no gaps (of >=2 days) and no duplicates within an ID. My suggested DATA step below only assumes the latter. If there are also no gaps in your real data, the code can be simplified. (Maybe you could handle duplicate dates, if any, in a preliminary PROC SUMMARY step aggregating var values by ID dt. But this depends on how you'd define the means in the presence of duplicate dates.)

data want(drop=_:);
array _3d[0:2] _temporary_;
array _6d[0:5] _temporary_;
set have_fin;
by id dt;
_d=dif(dt);
if first.id | _d>=6 then call missing(of _3d[*], of _6d[*], _i3, _i6);
else do;
  do _j=1 to _d-1;
    _i6+1;
    _6d[mod(_i6,6)]=.;
  end;
  if _d>=3 then call missing(of _3d[*], _i3);
  else do _j=1 to _d-1;
    _i3+1;
    _3d[mod(_i3,3)]=.;
  end;
end;
_i3+1;
_i6+1;
_3d[mod(_i3,3)]=var;
_6d[mod(_i6,6)]=var;
mean_var_3d=mean(of _3d[*]);
mean_var_6d=mean(of _6d[*]);
run;

View solution in original post

2 REPLIES 2
ballardw
Super User

Is this supposed to be some sort of "rolling" mean where the number of records in the denominator keeps changing?

If so then you might look into Proc Expand if you have SAS/ETS licensed.

If not describe what this is a supposed to be doing.

 

BTW, you can really simplify your "example" data by creating one data set with another do loop:

data have1(drop=t);
   do id = 1,2;/* list explicit values */
      /* or use Do id=1 to 2; or more if desired*/
      dt = '31dec2020'd;
      do t=1 to 10;
          dt = dt + 1;
          var = rand('uniform');
          output;
      end;
   end;
   format dt ddmmyyp10.;
run;
FreelanceReinh
Jade | Level 19

Hello @kashlik123,

 

The date sequences in your sample data have no gaps (of >=2 days) and no duplicates within an ID. My suggested DATA step below only assumes the latter. If there are also no gaps in your real data, the code can be simplified. (Maybe you could handle duplicate dates, if any, in a preliminary PROC SUMMARY step aggregating var values by ID dt. But this depends on how you'd define the means in the presence of duplicate dates.)

data want(drop=_:);
array _3d[0:2] _temporary_;
array _6d[0:5] _temporary_;
set have_fin;
by id dt;
_d=dif(dt);
if first.id | _d>=6 then call missing(of _3d[*], of _6d[*], _i3, _i6);
else do;
  do _j=1 to _d-1;
    _i6+1;
    _6d[mod(_i6,6)]=.;
  end;
  if _d>=3 then call missing(of _3d[*], _i3);
  else do _j=1 to _d-1;
    _i3+1;
    _3d[mod(_i3,3)]=.;
  end;
end;
_i3+1;
_i6+1;
_3d[mod(_i3,3)]=var;
_6d[mod(_i6,6)]=var;
mean_var_3d=mean(of _3d[*]);
mean_var_6d=mean(of _6d[*]);
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 516 views
  • 0 likes
  • 3 in conversation