I have two datasets.
Dataset1: age_event variable is the age at which each ID reported an event. IDs that did not reported events were not included in this dataset.
data ds1;
input ID age_event;
datalines;
a1 67
b2 89
b3 3
d2 0
;
run;
Dataset2: All IDs of the sample are reported. this dataset contains one row per ID. variable "last_agerecorded" is the at which each reported their last record.
data main;
input ID last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
run;
We would like to create an "event" variable that equals to 1 if the event was reported and the corresponding age reported on "age_event". This is from dataset ds1. If an ID did not report an event, then event=0, and "age_event" should be equal to the age under "age_lastrecorded" from the dataset main. How can we get the required output below? Thank you.
ID | age_event | event |
a1 | 67 | 1 |
a2 | 67 | 0 |
b1 | 68 | 0 |
b2 | 89 | 1 |
b3 | 3 | 1 |
c2 | 121 | 0 |
c3 | 124 | 0 |
c4 | 58 | 0 |
d1 | 89 | 0 |
d2 | 0 | 1 |
e2 | 74 | 0 |
data ds1;
input ID $ age_event;
datalines;
a1 67
b2 89
b3 3
d2 0
;
run;
data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
run;
proc sort data=ds1;
by ID;
run;
proc sort data=main;
by ID;
run;
data want;
merge main ds1(in=e);
by ID;
event=e;
age_event=coalesce(age_event,last_agerecorded);
drop last_agerecorded;
run;
proc print;run;
Bart
data ds1;
input ID $ age_event;
datalines;
a1 67
b2 89
b3 3
d2 0
;
run;
data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
run;
proc sort data=ds1;
by ID;
run;
proc sort data=main;
by ID;
run;
data want;
merge main ds1(in=e);
by ID;
event=e;
age_event=coalesce(age_event,last_agerecorded);
drop last_agerecorded;
run;
proc print;run;
Bart
Thank you so much!
This works with corrected data for your Main and Ds1 data sets. You did not indicate that ID should be read as a character value so would be all missing.
data ds1; input ID $ age_event; datalines; a1 67 b2 89 b3 3 d2 0 ; run; data main; input ID $ last_agerecorded; datalines; a1 56 a2 67 b1 68 b2 72 b3 132 c2 121 c3 124 c4 58 d1 89 d2 95 e2 74 ; run; /* requires both sets to be sorted by ID*/ data want; merge main (in=inmain) ds1 (in=inds1) ; by id; event=inds1; if inmain and not inds1 then age_event= last_agerecorded; drop last_agerecorded; run;
The Merge will align two data sets one a BY variable. However the data step doesn't work well if both sets have repeated values for the by variable in both sets.
The IN= option creates a true/false (1/0) temporary variable that indicates whether the current observation has values from that set. So you can test conditionally for the case of observations in Main but not Ds1 as shown.
Sine the Event variable is basically that it came from DS1 that is using the Inds1 to set the Event variable.
data ds1;
input ID $ age_event;
datalines;
a1 67
b2 89
b3 3
d2 0
;
run;
data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
run;
proc sql;
create table want as
select ID
, age_event
, 1 as event
from ds1
outer union corresponding
select ID
, last_agerecorded as age_event
, 0 as event
from main
where ID not in(select distinct ID from ds1)
order by ID;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.