I have two datasets, in dataset A I have 3 columns: ID, event ("Yes") and event date:
id event event_date
1 Yes 01SEP2020
1 Yes 03SEP2020
1 Yes 08SEP2020
1 Yes 10SEP2020
1 Yes 17SEP2020
1 Yes 20SEP2020
1 Yes 23SEP2020
1 Yes 26SEP2020
1 Yes 29SEP2020
1 Yes 02OCT2020
...
and in dataset B I have 3 columns: ID, visit number and visit date:
id visit visit_date
1 1 03SEP2020
1 2 17SEP2020
1 3 01OCT2020
...
I want to count how many events ("Yes") in dataset A between two visit dates in dataset B, i.e. for visit 1, number of events on or before visit 1 visit_date, for visit 2, number of event after visit 1 date and on/before visit 2 date; in the above example, I want:
id visit event_count
1 1 2
1 2 3
1 3 4
How should I count it? Merge by ID and retain visit_date?
Another way:
data WANT;
set EVENTS(in=E rename=(EVENT_DATE=DATE))
VISITS(in=V rename=(VISIT_DATE=DATE));
by DATE;
retain _V;
if VISIT then do;
output;
_V=VISIT;
N=0;
end;
else N+1;
keep ID VISIT N;
run;
ID | VISIT | N |
---|---|---|
1 | 1 | 2 |
1 | 2 | 3 |
1 | 3 | 4 |
Can you explicit what the 4 in the result is counting?
the 4 for visit #3 is counting events after visit 2 and before or on visit 3, i.e. in (17SEP2020, 01OCT2020]. In my example, there were 4 events which were after 17SEP2020 and before or on 01OCT2020:
1 Yes 20SEP2020
1 Yes 23SEP2020
1 Yes 26SEP2020
1 Yes 29SEP2020
Hello,
I had a couple of minutes spare time but could not solve quickly and need to start doing something else now.
@fengyuwuzu : people are more inclined to answer if you include your data with datalines.
Like I do below for other members that want to give it a try.
data work.A;
input id $ event $ event_date date9.;
format event_date date9.;
cards;
1 Yes 01SEP2020
1 Yes 03SEP2020
1 Yes 08SEP2020
1 Yes 10SEP2020
1 Yes 17SEP2020
1 Yes 20SEP2020
1 Yes 23SEP2020
1 Yes 26SEP2020
1 Yes 29SEP2020
1 Yes 02OCT2020
;
run;
data work.B;
input id $ visit visit_date date9.;
format visit_date date9.;
cards;
1 1 03SEP2020
1 2 17SEP2020
1 3 01OCT2020
;
run;
/* more to come */
/* end of program */
Koen
I believe this will do what you're looking for but someone else may have a more elegant solution.
data work.A;
input id $ event $ event_date date9.;
format event_date date9.;
cards;
1 Yes 01SEP2020
1 Yes 03SEP2020
1 Yes 08SEP2020
1 Yes 10SEP2020
1 Yes 17SEP2020
1 Yes 20SEP2020
1 Yes 23SEP2020
1 Yes 26SEP2020
1 Yes 29SEP2020
1 Yes 02OCT2020
;
run;
data work.B;
input id $ visit visit_date date9.;
format visit_date date9.;
cards;
1 1 03SEP2020
1 2 17SEP2020
1 3 01OCT2020
;
run;
proc transpose data=B out=B_t prefix=visit;
by ID;
var visit_date;
id visit;
run;
proc sql;
create table joined as select a.*, b.visit1, b.visit2, b.visit3
from a as a
left join b_t as b
on b.id=a.id;
quit;
data joined2;
set joined;
if event_date <= visit1 then flag1=1;
if visit1 < event_date <= visit2 then flag2=1;
if visit2 < event_date <= visit3 then flag3=1;
run;
proc sql;
create table summed as
select id, sum(flag1) as visit1, sum(flag2) as visit2, sum(flag3) as visit3
from joined2
group by id;
quit;
proc transpose data=summed out=summed_t(rename=(col1=event_count));
var visit1 visit2 visit3;
by id;
run;
data want;
set summed_t;
visit=input(substr(_name_,6),4.);
drop _name_;
run;
I figured out one way to do this:
data work.A;
input id $ event $ event_date date9.;
format event_date date9.;
cards;
1 Yes 01SEP2020
1 Yes 03SEP2020
1 Yes 08SEP2020
1 Yes 10SEP2020
1 Yes 17SEP2020
1 Yes 20SEP2020
1 Yes 23SEP2020
1 Yes 26SEP2020
1 Yes 29SEP2020
1 Yes 02OCT2020
;
run;
data work.B;
input id $ visit visit_date date9.;
format visit_date date9.;
cards;
1 1 03SEP2020
1 2 17SEP2020
1 3 01OCT2020
;
run;
proc sql;
create table test1 as
select b.id, b.visit, count(event) as count
from a as a, b as b
where a.id=b.id and event_date <=visit_date
group by b.id, visit
order by b.id, visit;
quit;
data want;
set test1;
by id visit;
count2=count-lag(count);
if first.id then count2=count;
run;
Another way:
data WANT;
set EVENTS(in=E rename=(EVENT_DATE=DATE))
VISITS(in=V rename=(VISIT_DATE=DATE));
by DATE;
retain _V;
if VISIT then do;
output;
_V=VISIT;
N=0;
end;
else N+1;
keep ID VISIT N;
run;
ID | VISIT | N |
---|---|---|
1 | 1 | 2 |
1 | 2 | 3 |
1 | 3 | 4 |
Oh I forgot to deal with IDs. Use by ID DATE; and reset all with each new ID by testing first.ID
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.