data have1;
infile datalines;
input id year death reg;
datalines;
1 2000 0 11
1 2001 0 12
1 2002 0 11
1 2003 0 11
3 2000 0 .
3 2001 0 11
3 2002 0 11
3 2003 0 11
5 2000 1 .
5 2001 1 .
5 2002 1 11
5 2003 1 11
;run;
proc print data=have1;
run;
data have2;
infile datalines;
input id year reg death sex inc;
datalines;
1 2000 11 0 1 1000
2 2000 12 0 1 2000
3 2000 13 0 2 3000
4 2000 12 0 1 4000
5 2000 13 1 2 3900
; run;
proc print data=have2;
run;
data want;
infile datalines;
input id year death residence sex inc
1 2000 0 11 1 1000
1 2001 0 12 1 1000
1 2002 0 11 1 1000
1 2003 0 11 1 1000
2 2000 0 12 1 2000
2 2001 0 . 1 2000
2 2002 0 . 1 2000
2 2003 0 . 1 2000
3 2000 0 13 2 3000
3 2001 0 11 2 3000
3 2002 0 11 2 3000
3 2003 0 11 2 3000
4 2000 0 12 1 4000
4 2001 0 . 1 4000
4 2002 0 . 1 4000
4 2003 0 . 1 4000
5 2000 1 13 2 3900
5 2001 1 . 2 3900
5 2002 1 11 2 3900
5 2003 1 11 2 3900
;
run;
proc print data=want;
run;
I'd like to merge have1 and have 2 datasets into a file that looks like want.
How can I do that?
Thank you so much in advance.
data have1;
input id year death reg;
datalines;
1 2000 0 11
1 2001 0 12
1 2002 0 11
1 2003 0 11
3 2000 0 .
3 2001 0 11
3 2002 0 11
3 2003 0 11
5 2000 1 .
5 2001 1 .
5 2002 1 11
5 2003 1 11
;
data have2;
infile datalines;
input id year reg death sex inc;
datalines;
1 2000 11 0 1 1000
2 2000 12 0 1 2000
3 2000 13 0 2 3000
4 2000 12 0 1 4000
5 2000 13 1 2 3900
;
data want;
merge have1 have2;
by id;
run;
DATA step merge would work if the key variable isn't unique, SQL procedure will:
data have1;
infile datalines;
input id year death reg;
datalines;
1 2000 0 11
1 2001 0 12
1 2002 0 11
1 2003 0 11
3 2000 0 .
3 2001 0 11
3 2002 0 11
3 2003 0 11
5 2000 1 .
5 2001 1 .
5 2002 1 11
5 2003 1 11
;
run;
data have2;
infile datalines;
input id year reg death sex inc;
datalines;
1 2000 11 0 1 1000
2 2000 12 0 1 2000
3 2000 13 0 2 3000
4 2000 12 0 1 4000
5 2000 13 1 2 3900
;
run;
proc sql;
create table want as
select distinct coalesce(have1.id,have2.id) as id,coalesce(have1.year,have2.year) as year,coalesce(have1.death,have2.death) as death,have1.reg,have2.sex,have2.inc
from have1 full join have2 on have1.id = have2.id;
quit;
Please explain the logic!
It seems that you want each id four times in "want". What should happen, if more than four obs exist for one id in "have1"?
EDIT: You need to check the code you have posted, there is an error in the step creating "have1": Five variables in the input statement, but only four in the datalines.
@andreas_lds agree. I should have pointed that out. I think the data should be as in my snippet
Thanks for your question.
It's not the case that I have more than 4 obs in have 1 since I made sure that there are no more than four obs. (I eliminated years out of the range).
I want to create all the four years (four rows) for each id in have 2. Also, unless there is information in death and residence in have 1, I want to use information for deaths, sex, and inc, but not residence.
I hope that my explanation helps to figure out the logic. Please let me know if it's still unclear.
Thank you very much.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.