I'm using SAS 9.3 and trying to merge two datasets. In dataset1, my merge key variable called "id" is originally formatted as a character $200. In dataset2, the merge key variable also called "id" is formatted as a character $6. The values of id are actually numbers (for example, 001004 or 134875).
I have tried the following code that reformats the id in dataset1 and tries to perform the merge:
data dataset1; set dataset1;
newid=input(id,BEST12.);
newid2=put(newid,z6.);
run;
proc sql;
create table combo as
select a.*, b.var1, b.var2
from dataset2 as a left join dataset1 as b
on a.id = b.newid2;
quit;
run;
The above code appears to work at first, but fails to fully merge all the records in dataset1. The merge works correctly for low values of id, but fails when there are no longer any leading zeros. As a specific example, the merge works for id=099999 but not for id=100000. How can I format my id variable to make sure all records are appropriately merged?
Why not convert the two variables to numeric in the SQL call?
proc sql;
create table combo as
select
dataset2.*,
dataset1.var1,
dataset1.var2
from
dataset2 left join dataset1
on input(dataset2.id,best6.) = input(left(dataset1.id),best12.)
;
quit;
run;
And be careful with those SQL aliases. Everybody uses them, but they very often make the code harder to read - in this case dataset1 was mapped to "b" and dataset2 to "a", very confusing, IMO. It is not that hard to use a change command to get rid of the aliases in the final code, and it normally makes everything much easier to read.
Check the lengths on your variables to see if they’re the same and use STRIP to remove any leading/trailing spaces.
If if the other variable had a length of $200, perhaps the following may work:
Length newID $6.;
NewID = ID;
You’ll have to make the names match but hopefully that gives you the idea.
Check the lengths on your variables to see if they’re the same and use STRIP to remove any leading/trailing spaces.
If if the other variable had a length of $200, perhaps the following may work:
Length newID $6.;
NewID = ID;
You’ll have to make the names match but hopefully that gives you the idea.
If both ID's are character then you don't need a transformation. Removing leading blanks should be enough:
proc sql;
create table combo as
select a.*, b.var1, b.var2
from dataset2 as a left join dataset1 as b
on left(a.id) = left(b.id);
quit;
Why not convert the two variables to numeric in the SQL call?
proc sql;
create table combo as
select
dataset2.*,
dataset1.var1,
dataset1.var2
from
dataset2 left join dataset1
on input(dataset2.id,best6.) = input(left(dataset1.id),best12.)
;
quit;
run;
And be careful with those SQL aliases. Everybody uses them, but they very often make the code harder to read - in this case dataset1 was mapped to "b" and dataset2 to "a", very confusing, IMO. It is not that hard to use a change command to get rid of the aliases in the final code, and it normally makes everything much easier to read.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.