Hi. I would like to calculate how many days on average pass between contact with the client in terms of snapshots. I will use the following example tables (The DIFF_DAYS column means how many days have passed from the previous date in the given id)
Table 1
ID DATE DIFF_DAYS
1 2019-05-17 .
1 2019-10-06 142
1 2020-01-15 101
1 2020-02-23 39
2 2019-05-07 .
2 2019-08-20 105
2 2020-03-15 208
2 2020-04-10 26
And snapshot table (snapshots for each ID can be different)
Table 2
ID SNAPSHOT_DATE
1 2019-12-31
1 2020-06-30
2 2020-03-31
2 2020-06-30
All I want to achieve is a table simiar to this (AVG_DIFF_DAYS is always the average from the first date to the last included in the snapshot date)
ID SNAPSHOT_DATE AVG_DIFF_DAYS
1 2019-12-31 71
1 2020-06-30 70,5
2 2020-03-31 104,33
2 2020-06-30 84,75
The task is not very easy. How can I solve it?
Hi @PatrykSAS Can i assume all ID's in table1 are present in table2 ? if yes, it's pretty straight forward-
data table1;
input ID DATE :yymmdd10. DIFF_DAYS;
format date yymmdd10.;
cards;
1 2019-05-17 .
1 2019-10-06 142
1 2020-01-15 101
1 2020-02-23 39
2 2019-05-07 .
2 2019-08-20 105
2 2020-03-15 208
2 2020-04-10 26
;
data table2;
input ID SNAPSHOT_DATE :yymmdd10.;
format snapshot_Date yymmdd10.;
cards;
1 2019-12-31
1 2020-06-30
2 2020-03-31
2 2020-06-30
;
data want;
if _n_=1 then do;
dcl hash H (multidata:'y') ;
h.definekey ("id") ;
h.definedata ("date", "DIFF_DAYS") ;
h.definedone () ;
end;
do _n_=h.clear() by 0 until(last.id);
set table1;
by id;
h.add();
end;
do until(last.id);
set table2;
by id;
call missing(_s,_n);
do _n_=h.find() by 0 while(_n_=0);
if date<=SNAPSHOT_DATE then do;
_n=sum(_n,1);
_s=sum(_s,diff_days,0);
end;
_n_=h.find_next();
end;
AVG_DIFF_DAYS=divide(_s,_n);
output;
end;
format AVG_DIFF_DAYS 8.2;
keep id SNAPSHOT_DATE AVG_DIFF_DAYS;
run;
proc print noobs;run;
ID | SNAPSHOT_DATE | AVG_DIFF_DAYS |
---|---|---|
1 | 2019-12-31 | 71.00 |
1 | 2020-06-30 | 70.50 |
2 | 2020-03-31 | 104.33 |
2 | 2020-06-30 | 84.75 |
Hi @PatrykSAS Can i assume all ID's in table1 are present in table2 ? if yes, it's pretty straight forward-
data table1;
input ID DATE :yymmdd10. DIFF_DAYS;
format date yymmdd10.;
cards;
1 2019-05-17 .
1 2019-10-06 142
1 2020-01-15 101
1 2020-02-23 39
2 2019-05-07 .
2 2019-08-20 105
2 2020-03-15 208
2 2020-04-10 26
;
data table2;
input ID SNAPSHOT_DATE :yymmdd10.;
format snapshot_Date yymmdd10.;
cards;
1 2019-12-31
1 2020-06-30
2 2020-03-31
2 2020-06-30
;
data want;
if _n_=1 then do;
dcl hash H (multidata:'y') ;
h.definekey ("id") ;
h.definedata ("date", "DIFF_DAYS") ;
h.definedone () ;
end;
do _n_=h.clear() by 0 until(last.id);
set table1;
by id;
h.add();
end;
do until(last.id);
set table2;
by id;
call missing(_s,_n);
do _n_=h.find() by 0 while(_n_=0);
if date<=SNAPSHOT_DATE then do;
_n=sum(_n,1);
_s=sum(_s,diff_days,0);
end;
_n_=h.find_next();
end;
AVG_DIFF_DAYS=divide(_s,_n);
output;
end;
format AVG_DIFF_DAYS 8.2;
keep id SNAPSHOT_DATE AVG_DIFF_DAYS;
run;
proc print noobs;run;
ID | SNAPSHOT_DATE | AVG_DIFF_DAYS |
---|---|---|
1 | 2019-12-31 | 71.00 |
1 | 2020-06-30 | 70.50 |
2 | 2020-03-31 | 104.33 |
2 | 2020-06-30 | 84.75 |
@PatrykSAS Easy and boring SQL-
data table1;
input ID DATE :yymmdd10. DIFF_DAYS;
format date yymmdd10.;
cards;
1 2019-05-17 .
1 2019-10-06 142
1 2020-01-15 101
1 2020-02-23 39
2 2019-05-07 .
2 2019-08-20 105
2 2020-03-15 208
2 2020-04-10 26
;
data table2;
input ID SNAPSHOT_DATE :yymmdd10.;
format snapshot_Date yymmdd10.;
cards;
1 2019-12-31
1 2020-06-30
2 2020-03-31
2 2020-06-30
;
proc sql;
create table want as
select a.*, divide(sum(DIFF_DAYS),count(SNAPSHOT_DATE)) as AVG_DIFF_DAYS format=8.2
from table2 a left join table1 b
on a.id=b.id and b.date<=SNAPSHOT_DATE
group by a.id,SNAPSHOT_DATE;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.