PatrykSAS
Obsidian | Level 7

Hi. I would like to calculate the average number of days between contacts with a client, as of each snapshot date. I will use the following example tables (the DIFF_DAYS column is the number of days since the previous date within the same ID).

 

Table 1

ID      DATE          DIFF_DAYS
1       2019-05-17    .
1       2019-10-06    142
1       2020-01-15    101
1       2020-02-23    39
2       2019-05-07    .
2       2019-08-20    105
2       2020-03-15    208
2       2020-04-10    26
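
For reference, DIFF_DAYS as defined above can be derived from DATE alone. This is a minimal sketch, assuming the rows are sorted by ID and DATE; the data set names `contacts` and `contacts_diff` are hypothetical and not part of the original post:

```sas
/* Hypothetical stand-in for Table 1, holding only ID and DATE */
data contacts;
  input ID DATE :yymmdd10.;
  format DATE yymmdd10.;
  cards;
1 2019-05-17
1 2019-10-06
1 2020-01-15
1 2020-02-23
2 2019-05-07
2 2019-08-20
2 2020-03-15
2 2020-04-10
;

/* Assumes contacts is sorted by ID and DATE */
data contacts_diff;
  set contacts;
  by ID;
  DIFF_DAYS = dif(DATE);            /* DATE minus the previous row's DATE */
  if first.ID then DIFF_DAYS = .;   /* first contact of an ID has no predecessor */
run;
```

DIF is called on every row and the result is blanked afterwards, because calling DIF or LAG conditionally would skip rows in the function's internal queue.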

 

And the snapshot table (snapshot dates can differ between IDs)

Table 2

ID  SNAPSHOT_DATE
1   2019-12-31
1   2020-06-30
2   2020-03-31
2   2020-06-30

 

All I want to achieve is a table similar to this (AVG_DIFF_DAYS is always the average from the first date up to the last date on or before the snapshot date)

ID  SNAPSHOT_DATE    AVG_DIFF_DAYS
1   2019-12-31       71
1   2020-06-30       70.5
2   2020-03-31       104.33
2   2020-06-30       84.75

 

The task is not very easy. How can I solve it?

1 ACCEPTED SOLUTION

novinosrin
Tourmaline | Level 20

Hi @PatrykSAS, can I assume that all IDs in table1 are also present in table2? If yes, it's pretty straightforward. Both tables also need to be sorted by ID, since the step below uses BY-group processing.

data table1;
 input ID      DATE :yymmdd10.   DIFF_DAYS;
 format date yymmdd10.;
 cards;
1       2019-05-17         .
1       2019-10-06        142
1       2020-01-15        101
1       2020-02-23         39
2       2019-05-07         .
2       2019-08-20        105
2       2020-03-15        208
2       2020-04-10        26
;

data table2;
 input ID  SNAPSHOT_DATE :yymmdd10.;
 format snapshot_Date yymmdd10.;
 cards;
1    2019-12-31
1    2020-06-30
2    2020-03-31
2    2020-06-30
;


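data want;
 /* Create the hash object once; MULTIDATA allows several rows per ID */
 if _n_=1 then do;
  dcl hash H (multidata:'y') ;
   h.definekey  ("id") ;
   h.definedata ("date", "DIFF_DAYS") ;
   h.definedone () ;
 end;
 /* Empty the hash, then load every table1 row of the current ID */
 do _n_=h.clear() by 0 until(last.id);
  set table1;
  by id;
  h.add();
 end;
 /* For each snapshot of the same ID, walk through the stored rows */
 do until(last.id);
  set table2;
  by id;
  call missing(_s,_n);
  do _n_=h.find() by 0 while(_n_=0);
   /* Count the contact and add its gap if it is on or before the snapshot */
   if date<=SNAPSHOT_DATE  then do;
    _n=sum(_n,1);
    _s=sum(_s,diff_days,0);
   end;
   _n_=h.find_next();
  end;
  AVG_DIFF_DAYS=divide(_s,_n);
  output;
 end;
 format AVG_DIFF_DAYS 8.2;
 keep id SNAPSHOT_DATE AVG_DIFF_DAYS;
run;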

proc print noobs;run;
ID SNAPSHOT_DATE AVG_DIFF_DAYS
1 2019-12-31 71.00
1 2020-06-30 70.50
2 2020-03-31 104.33
2 2020-06-30 84.75
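
As a check on the output above, each value is the sum of DIFF_DAYS over the contacts dated on or before the snapshot, divided by the number of those contacts. The first contact of an ID adds 0 to the sum but still counts in the denominator. Worked through for the sample data:

```latex
\begin{align*}
\text{ID 1, 2019-12-31:}\quad & \frac{0 + 142}{2} = 71.00 \\
\text{ID 1, 2020-06-30:}\quad & \frac{0 + 142 + 101 + 39}{4} = 70.50 \\
\text{ID 2, 2020-03-31:}\quad & \frac{0 + 105 + 208}{3} \approx 104.33 \\
\text{ID 2, 2020-06-30:}\quad & \frac{0 + 105 + 208 + 26}{4} = 84.75
\end{align*}
```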

 

novinosrin
Tourmaline | Level 20

@PatrykSAS Easy and boring SQL:

data table1;
 input ID      DATE :yymmdd10.   DIFF_DAYS;
 format date yymmdd10.;
 cards;
1       2019-05-17         .
1       2019-10-06        142
1       2020-01-15        101
1       2020-02-23         39
2       2019-05-07         .
2       2019-08-20        105
2       2020-03-15        208
2       2020-04-10        26
;

data table2;
 input ID  SNAPSHOT_DATE :yymmdd10.;
 format snapshot_Date yymmdd10.;
 cards;
1    2019-12-31
1    2020-06-30
2    2020-03-31
2    2020-06-30
;

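proc sql;
 create table want as
 /* count(SNAPSHOT_DATE) counts every joined table1 row, including the first contact whose DIFF_DAYS is missing */
 select a.*, divide(sum(b.DIFF_DAYS),count(a.SNAPSHOT_DATE)) as AVG_DIFF_DAYS format=8.2
 from table2 a left join table1 b
 on a.id=b.id and b.date<=a.SNAPSHOT_DATE
 group by a.id,a.SNAPSHOT_DATE;
quit;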
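
Since the DIFF_DAYS values are consecutive gaps, their sum up to a snapshot equals the last included DATE minus the first one, so the same averages can be reached without the DIFF_DAYS column. Below is a minimal sketch of that variation (not the query above), assuming DIFF_DAYS was computed from the dates in the table. The data sets `contacts`, `snapshots` and `want_dates` are hypothetical names that mirror table1 (minus DIFF_DAYS) and table2:

```sas
/* Hypothetical copy of table1 without DIFF_DAYS */
data contacts;
  input ID DATE :yymmdd10.;
  format DATE yymmdd10.;
  cards;
1 2019-05-17
1 2019-10-06
1 2020-01-15
1 2020-02-23
2 2019-05-07
2 2019-08-20
2 2020-03-15
2 2020-04-10
;

/* Hypothetical copy of table2 */
data snapshots;
  input ID SNAPSHOT_DATE :yymmdd10.;
  format SNAPSHOT_DATE yymmdd10.;
  cards;
1 2019-12-31
1 2020-06-30
2 2020-03-31
2 2020-06-30
;

proc sql;
  create table want_dates as
  select a.ID, a.SNAPSHOT_DATE,
         /* (last included date - first date) / number of included contacts */
         divide(max(b.DATE) - min(b.DATE), count(b.DATE)) as AVG_DIFF_DAYS format=8.2
  from snapshots a left join contacts b
    on a.ID = b.ID and b.DATE <= a.SNAPSHOT_DATE
  group by a.ID, a.SNAPSHOT_DATE;
quit;
```

For the sample data this gives 71.00, 70.50, 104.33 and 84.75. One difference to be aware of: a snapshot that covers only the first contact of an ID yields 0 here, whereas SUM(DIFF_DAYS) over only missing values is missing.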
PatrykSAS
Obsidian | Level 7
Nice and easy. Thank you a lot!
