Hi. I would like to calculate how many days on average pass between contact with the client in terms of snapshots. I will use the following example tables (The DIFF_DAYS column means how many days have passed from the previous date in the given id)
Table 1
ID DATE DIFF_DAYS
1 2019-05-17 .
1 2019-10-06 142
1 2020-01-15 101
1 2020-02-23 39
2 2019-05-07 .
2 2019-08-20 105
2 2020-03-15 208
2 2020-04-10 26
And snapshot table (snapshots for each ID can be different)
Table 2
ID SNAPSHOT_DATE
1 2019-12-31
1 2020-06-30
2 2020-03-31
2 2020-06-30
All I want to achieve is a table simiar to this (AVG_DIFF_DAYS is always the average from the first date to the last included in the snapshot date)
ID SNAPSHOT_DATE AVG_DIFF_DAYS
1 2019-12-31 71
1 2020-06-30 70,5
2 2020-03-31 104,33
2 2020-06-30 84,75
The task is not very easy. How can I solve it?
Hi @PatrykSAS Can i assume all ID's in table1 are present in table2 ? if yes, it's pretty straight forward-
data table1;
input ID DATE :yymmdd10. DIFF_DAYS;
format date yymmdd10.;
cards;
1 2019-05-17 .
1 2019-10-06 142
1 2020-01-15 101
1 2020-02-23 39
2 2019-05-07 .
2 2019-08-20 105
2 2020-03-15 208
2 2020-04-10 26
;
data table2;
input ID SNAPSHOT_DATE :yymmdd10.;
format snapshot_Date yymmdd10.;
cards;
1 2019-12-31
1 2020-06-30
2 2020-03-31
2 2020-06-30
;
data want;
if _n_=1 then do;
dcl hash H (multidata:'y') ;
h.definekey ("id") ;
h.definedata ("date", "DIFF_DAYS") ;
h.definedone () ;
end;
do _n_=h.clear() by 0 until(last.id);
set table1;
by id;
h.add();
end;
do until(last.id);
set table2;
by id;
call missing(_s,_n);
do _n_=h.find() by 0 while(_n_=0);
if date<=SNAPSHOT_DATE then do;
_n=sum(_n,1);
_s=sum(_s,diff_days,0);
end;
_n_=h.find_next();
end;
AVG_DIFF_DAYS=divide(_s,_n);
output;
end;
format AVG_DIFF_DAYS 8.2;
keep id SNAPSHOT_DATE AVG_DIFF_DAYS;
run;
proc print noobs;run;
ID | SNAPSHOT_DATE | AVG_DIFF_DAYS |
---|---|---|
1 | 2019-12-31 | 71.00 |
1 | 2020-06-30 | 70.50 |
2 | 2020-03-31 | 104.33 |
2 | 2020-06-30 | 84.75 |
Hi @PatrykSAS Can i assume all ID's in table1 are present in table2 ? if yes, it's pretty straight forward-
data table1;
input ID DATE :yymmdd10. DIFF_DAYS;
format date yymmdd10.;
cards;
1 2019-05-17 .
1 2019-10-06 142
1 2020-01-15 101
1 2020-02-23 39
2 2019-05-07 .
2 2019-08-20 105
2 2020-03-15 208
2 2020-04-10 26
;
data table2;
input ID SNAPSHOT_DATE :yymmdd10.;
format snapshot_Date yymmdd10.;
cards;
1 2019-12-31
1 2020-06-30
2 2020-03-31
2 2020-06-30
;
data want;
if _n_=1 then do;
dcl hash H (multidata:'y') ;
h.definekey ("id") ;
h.definedata ("date", "DIFF_DAYS") ;
h.definedone () ;
end;
do _n_=h.clear() by 0 until(last.id);
set table1;
by id;
h.add();
end;
do until(last.id);
set table2;
by id;
call missing(_s,_n);
do _n_=h.find() by 0 while(_n_=0);
if date<=SNAPSHOT_DATE then do;
_n=sum(_n,1);
_s=sum(_s,diff_days,0);
end;
_n_=h.find_next();
end;
AVG_DIFF_DAYS=divide(_s,_n);
output;
end;
format AVG_DIFF_DAYS 8.2;
keep id SNAPSHOT_DATE AVG_DIFF_DAYS;
run;
proc print noobs;run;
ID | SNAPSHOT_DATE | AVG_DIFF_DAYS |
---|---|---|
1 | 2019-12-31 | 71.00 |
1 | 2020-06-30 | 70.50 |
2 | 2020-03-31 | 104.33 |
2 | 2020-06-30 | 84.75 |
@PatrykSAS Easy and boring SQL-
data table1;
input ID DATE :yymmdd10. DIFF_DAYS;
format date yymmdd10.;
cards;
1 2019-05-17 .
1 2019-10-06 142
1 2020-01-15 101
1 2020-02-23 39
2 2019-05-07 .
2 2019-08-20 105
2 2020-03-15 208
2 2020-04-10 26
;
data table2;
input ID SNAPSHOT_DATE :yymmdd10.;
format snapshot_Date yymmdd10.;
cards;
1 2019-12-31
1 2020-06-30
2 2020-03-31
2 2020-06-30
;
proc sql;
create table want as
select a.*, divide(sum(DIFF_DAYS),count(SNAPSHOT_DATE)) as AVG_DIFF_DAYS format=8.2
from table2 a left join table1 b
on a.id=b.id and b.date<=SNAPSHOT_DATE
group by a.id,SNAPSHOT_DATE;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.