data have1;
input id year death reg;
datalines;
1 2000 0 11
1 2001 0 12
1 2002 0 11
1 2003 0 11
2 2000 1 11
2 2001 1 12
2 2002 1 13
2 2003 1 14
3 2000 0 .
3 2001 0 11
3 2002 0 11
3 2003 0 11
5 2000 1 .
5 2001 1 .
5 2002 1 11
5 2003 1 11
;
data have2;
input id year reg sex incs;
datalines;
1 2000 11 1 1000
2 2000 11 1 2000
4 2000 12 1 4000
;data want;
input id year reg death sex incs;
datalines;
1 2000 11 0 1 1000
1 2001 12 0 1 1000
1 2002 11 0 1 1000
1 2003 11 0 1 1000
2 2000 11 1 1 2000
2 2001 12 1 1 2000
2 2002 13 1 1 2000
2 2003 14 1 1 2000
4 2000 12 . 1 4000
;I'd like to have a dataset like want.
The logic is that I want to keep IDs only from have2, but want to keep all person-year information for the IDs in have1. Except for the reg variable, sex and incs are time-invariant information, so I want to have the same values for all years, but in case of reg it's all changing over years, so I want to keep information from have1. Meanwhile, there may be some IDs only shown in have2, in that case, I want to keep only the first row (as shown in have2) not making other years.
Thank you so much in advance.
According to ID=4 wanted output it seems that you select DEATH only from have1 and neglect it from have2. Try next code:
proc sql;
create table want as
select a.ID, a.year, a.reg, a. sex, a.incs,
b.death
from have2 as a
left join have1 as b
on a.ID = b.ID;
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.