In SAS I have two datasets.
Dataset1: age_event variable is the age at which each ID reported an event. One same ID can have one or more than one event. Not all IDs from the sample reported events. IDs that did not reported events were not included in this dataset.
data ds1;
input ID age_event;
datalines;
a1	32
b2	54
b2	67
c3	34
c3	45
c3	78
;
run;
Dataset2: All IDs of the sample are reported. This dataset contains one row per ID. variable "last_agerecorded" is the age at which each reported their last record for the whole study.
data main;
input ID last_agerecorded;
datalines;
a1	56
a2	67
b1	68
b2	72
b3	132
c2	121
c3	124
c4	58
d1	89
d2	95
e2	74
;
run;We would like to create a count_event variable that counts the number of events per ID. However, if an ID reports more than one event, count_event variable will need to add the total of events up to the last "age_event" reported. See cases for ID=b2 and c3 below. Each event has one row and the "count_event" variable sums the number of event at the last age_event. If the ID did not report any event at all, then age_event should be equal to "last_agerecorded" from the main dataset and the count_event should be equal to zero.
This is the required output:
| ID | age_event | count_event | 
| a1 | 32 | 1 | 
| a2 | 67 | 0 | 
| b1 | 68 | 0 | 
| b2 | 54 | 1 | 
| b2 | 67 | 2 | 
| b3 | 132 | 0 | 
| c2 | 121 | 0 | 
| c3 | 34 | 1 | 
| c3 | 45 | 2 | 
| c3 | 78 | 3 | 
| c4 | 58 | 0 | 
| d1 | 89 | 0 | 
| d2 | 95 | 0 | 
| e2 | 74 | 0 | 
How can we get the required output? Thanks!
Assuming the data are sorted by ID (main) or ID/age_event (DS1), you can use the interleaving process of SET with BY, as in:
data ds1;
input ID $ age_event;
datalines;
a1 32
b2 54
b2 67
c3 34
c3 45
c3 78
;
run;
data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
data want;
  set main (rename=last_agerecorded=age_event) ds1 (in=inds1);
  by id;
  if first.id then count=0;
  count+inds1;
  if inds1=1 or last.id;
run;data ds1;
input ID $ age_event;
datalines;
a1 32
b2 54
b2 67
c3 34
c3 45
c3 78
;
run;
data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
run;
proc sort data=ds1;
 by ID age_event;
run;
data ds1;
 set ds1; 
  by ID;
   if first.ID then count_event = 0;
    count_event + 1;
run;
proc sql;
 create table want as 
  select ID
  ,	     age_event
  ,	     count_event
  from ds1
   outer union corresponding
  select ID
  ,	     last_agerecorded as age_event
  ,      0 as count_event
  from main
  where ID not in(select distinct ID from ds1)
 order by ID, count_event;
quit;
Assuming the data are sorted by ID (main) or ID/age_event (DS1), you can use the interleaving process of SET with BY, as in:
data ds1;
input ID $ age_event;
datalines;
a1 32
b2 54
b2 67
c3 34
c3 45
c3 78
;
run;
data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
data want;
  set main (rename=last_agerecorded=age_event) ds1 (in=inds1);
  by id;
  if first.id then count=0;
  count+inds1;
  if inds1=1 or last.id;
run;It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
