Dear All,
I have the following two datasets (both sorted with respect to the variables id and date):
data db1;
input id date V1 V2;
datalines;
1 20110902 0.5 2.1
2 20110903 0.7 2.8
;
data db2;
input id date V3;
datalines;
1 20110901 10
1 20110902 20
1 20110903 30
2 20110901 60
2 20110902 50
2 20110903 40
;
I would like to obtain the following dataset db3:
data db3;
input id date V1 V2 V3;
datalines;
1 20110902 0.5 2.1 20
2 20110903 0.7 2.8 40
;
That is, I would like to add the variable V3 to the first dataset db1 when the values of the variables ID and DATE in db1 coincide with those in db2. Hence, the third dataset must have the size of the first one.
I have done two attempts. The first one involves PROC SQL
proc sql;
create table db3
as select db1.*, db2.V3
from db1, db2
where db1.id = db2.id and db1.date = db2.date;
quit;
The second one involves a simple merge with respect to both key variables
data db3;
merge db1 db2;
by id date;
run;
Any help would be highly appreciated.
Not sure, may be not understood properly. Your own syntax for sql can generate what you are trying to get.
proc sql;
create table db3
as select db1.*, db2.V3
from db1, db2
where db1.id = db2.id and db1.date = db2.date;
quit;
Please try
data db3;
merge db1(in=a) db2(in=b);
by id date;
if a;
run;
Thanks,
Jag
And an SQL
proc sql;
create table db3
as select db1.*, db2.V3
from db1 left join db2
on db1.id = db2.id and db1.date = db2.date;
quit;
Not sure, may be not understood properly. Your own syntax for sql can generate what you are trying to get.
proc sql;
create table db3
as select db1.*, db2.V3
from db1, db2
where db1.id = db2.id and db1.date = db2.date;
quit;
Thank you guys for all your kind replies.
Unfortunately, your solution doesn't allow me to get the desired result. I have found a potential solution in this discussion where you replied years ago:
data db3;
merge db1 db2;
by id date;
if not missing (V1) then output;
call missing(V1);
run;
, I obtain the number of rows of db2, instead of db1.
@stat@sas You are right. This solution works for my reproducible example. However, it does not for my real dataset.
Hi Mark,
Thanks for your response
i just checked the code i suggested to you and i see it is producing your expected output. like it is generating the same number of observations similar to db1 by matching on id and date variables. could you please again check and correct me.
data db1;
input id date V1 V2;
datalines;
1 20110902 0.5 2.1
2 20110903 0.7 2.8
;
data db2;
input id date V3;
datalines;
1 20110901 10
1 20110902 20
1 20110903 30
2 20110901 60
2 20110902 50
2 20110903 40
;
proc sort data=db1;
by id date;
run;
proc sort data=db2;
by id date;
run;
data db3;
merge db1(in=a) db2(in=b);
by id date;
if a;
run;
Thanks,
Jag
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.