Is there something I haven't tried?
This is a super common scenario. You've got to join two different data sources by some primary key field, and the two sources store the field in different ways. The most common case is an ID field (like a customer ID) that is stored as a number in one table and as a character in another. Or another common case: ZIP code. (We can all agree that ZIP codes should be character, right? After all, some ZIP codes have leading zeros! I'm looking at you, Boston!)
Very often, one of the data sources is an Excel spreadsheet -- where "data type" is a nebulous concept and a field is imported into SAS using the wrong type.
It's not always practical to change one of the data sources to match the other, or to create a new column in one table so that it can match the key field you want to join. But there is a way to accomplish this without modifying either of your source tables: create a third table as a key lookup. You can accomplish this step, plus the subsequent join, in the Query Builder.
Let's take an example. First, I need to create some data that features the mismatched keys:
data classW;
length ID 8;
set sashelp.class(keep=name weight);
ID=_N_;
run;
data classH;
length ID $ 3;
set sashelp.class(keep=sex height);
ID=put(_N_,z3.);
run;
This results in two tables that contain different data details, and a "common" (but differently typed) key of ID:
CLASSW uses a numeric ID. CLASSH uses a 3-character ID that contains digits, formatted with leading zeros. In my opinion, the better practice is a character-based ID.
To solve, let's use the Query Builder to create a lookup table. Starting with the CLASSW data set, we'll create a two-column lookup. One column is the numeric ID in CLASSW, and the second column is a character version (called ID_Char), formatted with the leading zeros. I used put(t1.ID, Z3.) to create a Computed Column for character version. Remember, the put() function always results in a character value.
The result? A narrow data set with just the two columns:
If your data contains a lot of unique keys, you might be concerned that this new table can get really large. But I think that you'll find that in most cases, the data set won't actually occupy much storage even if you have millions of records. In my example case, the columns add up to only 8 bytes per record.
Now we can create another query in the Query Builder to join all three of these tables: the two source tables plus the lookup table. This 3-way inner join uses the lookup table as the go-between. Notice the INNER JOIN conditions listed in the Join Order section.
We'll select the fields we want to keep from each table. We don't need to keep anything from the lookup table:
And here's our successful merged result:
What does this look like in the process flow? Here's my example. Note that the only "program" code is the bit I used to produce the messy data scenario in the first place.
When you get this type of message:
The put function always fails when redifining a new variable saying a string format needs a string argument
it sounds like you are attempting to apply a format defined for character to a numeric variable. If this is a custom format and you want to format numerics then make a version of the format for use with numbers.
It may indicate you have a variable you think is character that isn't. You might want to provide more details on what you are attempting.
I didn't create the datasets, I am trying to match one database that has numbers in numeric variable types, and another that uses the same numbers in an alpha-numeric type. SAS-EG has functionality to join the tables, but only if the type matches. I want to know if there is something I can do to change variable types, without coding.
In short no. Once a variable is created in a dataset it's type is fixed.
To create a NEW dataset with a variable of the same name but different type (numeric to character);
data new;
set old (rename=(variablename=oldvar));
length variablename $ 10; /* replace 10 with the length of the variable you are going to match on in the other dataset*/
variablename = put(oldvar,format.);
drop oldvar;
run;
And use New for the match.
Or using SQL to join tables you can use something like this if the variable is being used to match records:
proc sql;
create table matched as
select dataset1,var1, dataset1.var2, dataset2.othervar1, dataset2.othervar2
from dataset1 join dataset2
on put(dataset1.variable,format.) = dataset2.variable;
run;
Yeah, this is exactly what I was doing. I was hoping for some EG functionality for this, oh well.
Thanks.
This is a super common scenario. You've got to join two different data sources by some primary key field, and the two sources store the field in different ways. The most common case is an ID field (like a customer ID) that is stored as a number in one table and as a character in another. Or another common case: ZIP code. (We can all agree that ZIP codes should be character, right? After all, some ZIP codes have leading zeros! I'm looking at you, Boston!)
Very often, one of the data sources is an Excel spreadsheet -- where "data type" is a nebulous concept and a field is imported into SAS using the wrong type.
It's not always practical to change one of the data sources to match the other, or to create a new column in one table so that it can match the key field you want to join. But there is a way to accomplish this without modifying either of your source tables: create a third table as a key lookup. You can accomplish this step, plus the subsequent join, in the Query Builder.
Let's take an example. First, I need to create some data that features the mismatched keys:
data classW;
length ID 8;
set sashelp.class(keep=name weight);
ID=_N_;
run;
data classH;
length ID $ 3;
set sashelp.class(keep=sex height);
ID=put(_N_,z3.);
run;
This results in two tables that contain different data details, and a "common" (but differently typed) key of ID:
CLASSW uses a numeric ID. CLASSH uses a 3-character ID that contains digits, formatted with leading zeros. In my opinion, the better practice is a character-based ID.
To solve, let's use the Query Builder to create a lookup table. Starting with the CLASSW data set, we'll create a two-column lookup. One column is the numeric ID in CLASSW, and the second column is a character version (called ID_Char), formatted with the leading zeros. I used put(t1.ID, Z3.) to create a Computed Column for character version. Remember, the put() function always results in a character value.
The result? A narrow data set with just the two columns:
If your data contains a lot of unique keys, you might be concerned that this new table can get really large. But I think that you'll find that in most cases, the data set won't actually occupy much storage even if you have millions of records. In my example case, the columns add up to only 8 bytes per record.
Now we can create another query in the Query Builder to join all three of these tables: the two source tables plus the lookup table. This 3-way inner join uses the lookup table as the go-between. Notice the INNER JOIN conditions listed in the Join Order section.
We'll select the fields we want to keep from each table. We don't need to keep anything from the lookup table:
And here's our successful merged result:
What does this look like in the process flow? Here's my example. Note that the only "program" code is the bit I used to produce the messy data scenario in the first place.
Thanks Chris, this works well. But what if the two tables have different id's? Won't the inner joins miss those?
Also, lets say I do a full outer join for tables that have columns of the same name. Joined by ID.
ID TYPE
001 Shirt
ID TYPE
002 PANTS
My new set will have three columns. ID t1.TYPE t2.TYPE
I just want the one column Type.
any ideas?
Check the Data->Append Table task to see if it does what you want. Append Table uses SQL OUTER UNION CORR to do its work.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.