In SAS I have two datasets.
Dataset1: age_event variable is the age at which each ID reported an event. One same ID can have one or more than one event. Not all IDs from the sample reported events. IDs that did not reported events were not included in this dataset.
data ds1;
input ID age_event;
datalines;
a1 32
b2 54
b2 67
c3 34
c3 45
c3 78
;
run;
Dataset2: All IDs of the sample are reported. This dataset contains one row per ID. variable "last_agerecorded" is the age at which each reported their last record for the whole study.
data main;
input ID last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
run;
We would like to create a count_event variable that counts the number of events per ID. However, if an ID reports more than one event, count_event variable will need to add the total of events up to the last "age_event" reported. See cases for ID=b2 and c3 below. Each event has one row and the "count_event" variable sums the number of event at the last age_event. If the ID did not report any event at all, then age_event should be equal to "last_agerecorded" from the main dataset and the count_event should be equal to zero.
This is the required output:
ID | age_event | count_event |
a1 | 32 | 1 |
a2 | 67 | 0 |
b1 | 68 | 0 |
b2 | 54 | 1 |
b2 | 67 | 2 |
b3 | 132 | 0 |
c2 | 121 | 0 |
c3 | 34 | 1 |
c3 | 45 | 2 |
c3 | 78 | 3 |
c4 | 58 | 0 |
d1 | 89 | 0 |
d2 | 95 | 0 |
e2 | 74 | 0 |
How can we get the required output? Thanks!
Assuming the data are sorted by ID (main) or ID/age_event (DS1), you can use the interleaving process of SET with BY, as in:
data ds1;
input ID $ age_event;
datalines;
a1 32
b2 54
b2 67
c3 34
c3 45
c3 78
;
run;
data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
data want;
set main (rename=last_agerecorded=age_event) ds1 (in=inds1);
by id;
if first.id then count=0;
count+inds1;
if inds1=1 or last.id;
run;
data ds1;
input ID $ age_event;
datalines;
a1 32
b2 54
b2 67
c3 34
c3 45
c3 78
;
run;
data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
run;
proc sort data=ds1;
by ID age_event;
run;
data ds1;
set ds1;
by ID;
if first.ID then count_event = 0;
count_event + 1;
run;
proc sql;
create table want as
select ID
, age_event
, count_event
from ds1
outer union corresponding
select ID
, last_agerecorded as age_event
, 0 as count_event
from main
where ID not in(select distinct ID from ds1)
order by ID, count_event;
quit;
Assuming the data are sorted by ID (main) or ID/age_event (DS1), you can use the interleaving process of SET with BY, as in:
data ds1;
input ID $ age_event;
datalines;
a1 32
b2 54
b2 67
c3 34
c3 45
c3 78
;
run;
data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
data want;
set main (rename=last_agerecorded=age_event) ds1 (in=inds1);
by id;
if first.id then count=0;
count+inds1;
if inds1=1 or last.id;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.