- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have two datasets.
Dataset1: age_event variable is the age at which each ID reported an event. IDs that did not reported events were not included in this dataset.
data ds1;
input ID age_event;
datalines;
a1 67
b2 89
b3 3
d2 0
;
run;
Dataset2: All IDs of the sample are reported. this dataset contains one row per ID. variable "last_agerecorded" is the at which each reported their last record.
data main;
input ID last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
run;
We would like to create an "event" variable that equals to 1 if the event was reported and the corresponding age reported on "age_event". This is from dataset ds1. If an ID did not report an event, then event=0, and "age_event" should be equal to the age under "age_lastrecorded" from the dataset main. How can we get the required output below? Thank you.
ID | age_event | event |
a1 | 67 | 1 |
a2 | 67 | 0 |
b1 | 68 | 0 |
b2 | 89 | 1 |
b3 | 3 | 1 |
c2 | 121 | 0 |
c3 | 124 | 0 |
c4 | 58 | 0 |
d1 | 89 | 0 |
d2 | 0 | 1 |
e2 | 74 | 0 |
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data ds1;
input ID $ age_event;
datalines;
a1 67
b2 89
b3 3
d2 0
;
run;
data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
run;
proc sort data=ds1;
by ID;
run;
proc sort data=main;
by ID;
run;
data want;
merge main ds1(in=e);
by ID;
event=e;
age_event=coalesce(age_event,last_agerecorded);
drop last_agerecorded;
run;
proc print;run;
Bart
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug
"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings
SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data ds1;
input ID $ age_event;
datalines;
a1 67
b2 89
b3 3
d2 0
;
run;
data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
run;
proc sort data=ds1;
by ID;
run;
proc sort data=main;
by ID;
run;
data want;
merge main ds1(in=e);
by ID;
event=e;
age_event=coalesce(age_event,last_agerecorded);
drop last_agerecorded;
run;
proc print;run;
Bart
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug
"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings
SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you so much!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This works with corrected data for your Main and Ds1 data sets. You did not indicate that ID should be read as a character value so would be all missing.
data ds1; input ID $ age_event; datalines; a1 67 b2 89 b3 3 d2 0 ; run; data main; input ID $ last_agerecorded; datalines; a1 56 a2 67 b1 68 b2 72 b3 132 c2 121 c3 124 c4 58 d1 89 d2 95 e2 74 ; run; /* requires both sets to be sorted by ID*/ data want; merge main (in=inmain) ds1 (in=inds1) ; by id; event=inds1; if inmain and not inds1 then age_event= last_agerecorded; drop last_agerecorded; run;
The Merge will align two data sets one a BY variable. However the data step doesn't work well if both sets have repeated values for the by variable in both sets.
The IN= option creates a true/false (1/0) temporary variable that indicates whether the current observation has values from that set. So you can test conditionally for the case of observations in Main but not Ds1 as shown.
Sine the Event variable is basically that it came from DS1 that is using the Inds1 to set the Event variable.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data ds1;
input ID $ age_event;
datalines;
a1 67
b2 89
b3 3
d2 0
;
run;
data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
run;
proc sql;
create table want as
select ID
, age_event
, 1 as event
from ds1
outer union corresponding
select ID
, last_agerecorded as age_event
, 0 as event
from main
where ID not in(select distinct ID from ds1)
order by ID;
quit;