Solved: How to make a moving average [not simple task]

PatrykSAS · Posted 10-27-2020 01:45 PM

Hi. I would like to calculate how many days on average pass between contact with the client in terms of snapshots. I will use the following example tables (The DIFF_DAYS column means how many days have passed from the previous date in the given id)

Table 1

ID DATE DIFF_DAYS

1 2019-05-17 .

1 2019-10-06 142

1 2020-01-15 101

1 2020-02-23 39

2 2019-05-07 .

2 2019-08-20 105

2 2020-03-15 208

2 2020-04-10 26

And snapshot table (snapshots for each ID can be different)

Table 2

ID SNAPSHOT_DATE

1 2019-12-31

1 2020-06-30

2 2020-03-31

2 2020-06-30

All I want to achieve is a table simiar to this (AVG_DIFF_DAYS is always the average from the first date to the last included in the snapshot date)

ID SNAPSHOT_DATE AVG_DIFF_DAYS

1 2019-12-31 71

1 2020-06-30 70,5

2 2020-03-31 104,33

2 2020-06-30 84,75

The task is not very easy. How can I solve it?

novinosrin · Posted 10-27-2020 02:07 PM

Hi @PatrykSAS Can i assume all ID's in table1 are present in table2 ? if yes, it's pretty straight forward-


data table1;
 input ID      DATE :yymmdd10.   DIFF_DAYS;
 format date yymmdd10.;
 cards;
1       2019-05-17         .

1       2019-10-06        142

1       2020-01-15        101

1       2020-02-23         39

2       2019-05-07         .

2       2019-08-20        105

2       2020-03-15        208

2       2020-04-10        26
;

data table2;
 input ID  SNAPSHOT_DATE :yymmdd10.;
 format snapshot_Date yymmdd10.;
 cards; 
1    2019-12-31
1    2020-06-30
2    2020-03-31
2    2020-06-30
;


data want;
 if _n_=1 then do;
  dcl hash H (multidata:'y') ;
   h.definekey  ("id") ;
   h.definedata ("date", "DIFF_DAYS") ;
   h.definedone () ;
 end;
 do _n_=h.clear() by 0 until(last.id);
  set table1;
  by id;
  h.add();
 end;
 do until(last.id);
  set table2;
  by id;
  call missing(_s,_n);
  do _n_=h.find() by 0 while(_n_=0);
   if date<=SNAPSHOT_DATE  then do;
	 _n=sum(_n,1);
	 _s=sum(_s,diff_days,0);
   end;
   _n_=h.find_next();
  end;
  AVG_DIFF_DAYS=divide(_s,_n);
  output;
 end;
 format AVG_DIFF_DAYS 8.2;
 keep id SNAPSHOT_DATE AVG_DIFF_DAYS;
run;

proc print noobs;run;

ID	SNAPSHOT_DATE	AVG_DIFF_DAYS
1	2019-12-31	71.00
1	2020-06-30	70.50
2	2020-03-31	104.33
2	2020-06-30	84.75

View solution in original post

novinosrin · Posted 10-27-2020 02:07 PM

Hi @PatrykSAS Can i assume all ID's in table1 are present in table2 ? if yes, it's pretty straight forward-


data table1;
 input ID      DATE :yymmdd10.   DIFF_DAYS;
 format date yymmdd10.;
 cards;
1       2019-05-17         .

1       2019-10-06        142

1       2020-01-15        101

1       2020-02-23         39

2       2019-05-07         .

2       2019-08-20        105

2       2020-03-15        208

2       2020-04-10        26
;

data table2;
 input ID  SNAPSHOT_DATE :yymmdd10.;
 format snapshot_Date yymmdd10.;
 cards; 
1    2019-12-31
1    2020-06-30
2    2020-03-31
2    2020-06-30
;


data want;
 if _n_=1 then do;
  dcl hash H (multidata:'y') ;
   h.definekey  ("id") ;
   h.definedata ("date", "DIFF_DAYS") ;
   h.definedone () ;
 end;
 do _n_=h.clear() by 0 until(last.id);
  set table1;
  by id;
  h.add();
 end;
 do until(last.id);
  set table2;
  by id;
  call missing(_s,_n);
  do _n_=h.find() by 0 while(_n_=0);
   if date<=SNAPSHOT_DATE  then do;
	 _n=sum(_n,1);
	 _s=sum(_s,diff_days,0);
   end;
   _n_=h.find_next();
  end;
  AVG_DIFF_DAYS=divide(_s,_n);
  output;
 end;
 format AVG_DIFF_DAYS 8.2;
 keep id SNAPSHOT_DATE AVG_DIFF_DAYS;
run;

proc print noobs;run;

ID	SNAPSHOT_DATE	AVG_DIFF_DAYS
1	2019-12-31	71.00
1	2020-06-30	70.50
2	2020-03-31	104.33
2	2020-06-30	84.75

novinosrin · Posted 10-27-2020 02:24 PM

@PatrykSAS Easy and boring SQL-

data table1;
 input ID      DATE :yymmdd10.   DIFF_DAYS;
 format date yymmdd10.;
 cards;
1       2019-05-17         .

1       2019-10-06        142

1       2020-01-15        101

1       2020-02-23         39

2       2019-05-07         .

2       2019-08-20        105

2       2020-03-15        208

2       2020-04-10        26
;

data table2;
 input ID  SNAPSHOT_DATE :yymmdd10.;
 format snapshot_Date yymmdd10.;
 cards; 
1    2019-12-31
1    2020-06-30
2    2020-03-31
2    2020-06-30
;

proc sql;
 create table want as
 select a.*, divide(sum(DIFF_DAYS),count(SNAPSHOT_DATE)) as AVG_DIFF_DAYS format=8.2
 from table2 a left join table1 b
 on a.id=b.id and b.date<=SNAPSHOT_DATE
 group by a.id,SNAPSHOT_DATE;
quit;

PatrykSAS · Posted 10-28-2020 06:50 AM

Nice and easy. Thank you a lot!

How to make a moving average [not simple task]

Re: How to make a moving average [not simple task]

Re: How to make a moving average [not simple task]

Re: How to make a moving average [not simple task]

Re: How to make a moving average [not simple task]

How to make a moving average [not simple task]

Re: How to make a moving average [not simple task]

Re: How to make a moving average [not simple task]

Re: How to make a moving average [not simple task]

Re: How to make a moving average [not simple task]

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!