I have two datasets, in dataset A I have 3 columns: ID, event ("Yes") and event date:
id event event_date
1 Yes 01SEP2020
1 Yes 03SEP2020
1 Yes 08SEP2020
1 Yes 10SEP2020
1 Yes 17SEP2020
1 Yes 20SEP2020
1 Yes 23SEP2020
1 Yes 26SEP2020
1 Yes 29SEP2020
1 Yes 02OCT2020
...
and in dataset B I have 3 columns: ID, visit number and visit date:
id visit visit_date
1 1 03SEP2020
1 2 17SEP2020
1 3 01OCT2020
...
I want to count how many events ("Yes") in dataset A between two visit dates in dataset B, i.e. for visit 1, number of events on or before visit 1 visit_date, for visit 2, number of event after visit 1 date and on/before visit 2 date; in the above example, I want:
id visit event_count
1 1 2
1 2 3
1 3 4
How should I count it? Merge by ID and retain visit_date?
Another way:
data WANT;
set EVENTS(in=E rename=(EVENT_DATE=DATE))
VISITS(in=V rename=(VISIT_DATE=DATE));
by DATE;
retain _V;
if VISIT then do;
output;
_V=VISIT;
N=0;
end;
else N+1;
keep ID VISIT N;
run;
ID | VISIT | N |
---|---|---|
1 | 1 | 2 |
1 | 2 | 3 |
1 | 3 | 4 |
Can you explicit what the 4 in the result is counting?
the 4 for visit #3 is counting events after visit 2 and before or on visit 3, i.e. in (17SEP2020, 01OCT2020]. In my example, there were 4 events which were after 17SEP2020 and before or on 01OCT2020:
1 Yes 20SEP2020
1 Yes 23SEP2020
1 Yes 26SEP2020
1 Yes 29SEP2020
Hello,
I had a couple of minutes spare time but could not solve quickly and need to start doing something else now.
@fengyuwuzu : people are more inclined to answer if you include your data with datalines.
Like I do below for other members that want to give it a try.
data work.A;
input id $ event $ event_date date9.;
format event_date date9.;
cards;
1 Yes 01SEP2020
1 Yes 03SEP2020
1 Yes 08SEP2020
1 Yes 10SEP2020
1 Yes 17SEP2020
1 Yes 20SEP2020
1 Yes 23SEP2020
1 Yes 26SEP2020
1 Yes 29SEP2020
1 Yes 02OCT2020
;
run;
data work.B;
input id $ visit visit_date date9.;
format visit_date date9.;
cards;
1 1 03SEP2020
1 2 17SEP2020
1 3 01OCT2020
;
run;
/* more to come */
/* end of program */
Koen
I believe this will do what you're looking for but someone else may have a more elegant solution.
data work.A;
input id $ event $ event_date date9.;
format event_date date9.;
cards;
1 Yes 01SEP2020
1 Yes 03SEP2020
1 Yes 08SEP2020
1 Yes 10SEP2020
1 Yes 17SEP2020
1 Yes 20SEP2020
1 Yes 23SEP2020
1 Yes 26SEP2020
1 Yes 29SEP2020
1 Yes 02OCT2020
;
run;
data work.B;
input id $ visit visit_date date9.;
format visit_date date9.;
cards;
1 1 03SEP2020
1 2 17SEP2020
1 3 01OCT2020
;
run;
proc transpose data=B out=B_t prefix=visit;
by ID;
var visit_date;
id visit;
run;
proc sql;
create table joined as select a.*, b.visit1, b.visit2, b.visit3
from a as a
left join b_t as b
on b.id=a.id;
quit;
data joined2;
set joined;
if event_date <= visit1 then flag1=1;
if visit1 < event_date <= visit2 then flag2=1;
if visit2 < event_date <= visit3 then flag3=1;
run;
proc sql;
create table summed as
select id, sum(flag1) as visit1, sum(flag2) as visit2, sum(flag3) as visit3
from joined2
group by id;
quit;
proc transpose data=summed out=summed_t(rename=(col1=event_count));
var visit1 visit2 visit3;
by id;
run;
data want;
set summed_t;
visit=input(substr(_name_,6),4.);
drop _name_;
run;
I figured out one way to do this:
data work.A;
input id $ event $ event_date date9.;
format event_date date9.;
cards;
1 Yes 01SEP2020
1 Yes 03SEP2020
1 Yes 08SEP2020
1 Yes 10SEP2020
1 Yes 17SEP2020
1 Yes 20SEP2020
1 Yes 23SEP2020
1 Yes 26SEP2020
1 Yes 29SEP2020
1 Yes 02OCT2020
;
run;
data work.B;
input id $ visit visit_date date9.;
format visit_date date9.;
cards;
1 1 03SEP2020
1 2 17SEP2020
1 3 01OCT2020
;
run;
proc sql;
create table test1 as
select b.id, b.visit, count(event) as count
from a as a, b as b
where a.id=b.id and event_date <=visit_date
group by b.id, visit
order by b.id, visit;
quit;
data want;
set test1;
by id visit;
count2=count-lag(count);
if first.id then count2=count;
run;
Another way:
data WANT;
set EVENTS(in=E rename=(EVENT_DATE=DATE))
VISITS(in=V rename=(VISIT_DATE=DATE));
by DATE;
retain _V;
if VISIT then do;
output;
_V=VISIT;
N=0;
end;
else N+1;
keep ID VISIT N;
run;
ID | VISIT | N |
---|---|---|
1 | 1 | 2 |
1 | 2 | 3 |
1 | 3 | 4 |
Oh I forgot to deal with IDs. Use by ID DATE; and reset all with each new ID by testing first.ID
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.