Desktop productivity for business analysts and programmers

SAS-EG 6.1 useless for variable type changes?

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

SAS-EG 6.1 useless for variable type changes?

[ Edited ]
  • Whenever I try to match an alpha-numeric variable with a numeric variable from other datasets, there seems to be no work around. There is no easy interface for changing the variable type. The put function always fails when redifining a new variable saying a string format needs a string argument. I simply must revert back to coding to get these things done.

 

Is there something I haven't tried?


Accepted Solutions
Solution
‎02-09-2016 12:59 PM
Community Manager
Posts: 2,891

Re: SAS-EG 6.1 useless for variable type changes?

This is a super common scenario.  You've got to join two different data sources by some primary key field, and the two sources store the field in different ways.  The most common case is an ID field (like a customer ID) that is stored as a number in one table and as a character in another.  Or another common case: ZIP code.  (We can all agree that ZIP codes should be character, right? After all, some ZIP codes have leading zeros! I'm looking at you, Boston!)

 

Very often, one of the data sources is an Excel spreadsheet -- where "data type" is a nebulous concept and a field is imported into SAS using the wrong type.

 

It's not always practical to change one of the data sources to match the other, or to create a new column in one table so that it can match the key field you want to join.  But there is a way to accomplish this without modifying either of your source tables: create a third table as a key lookup.  You can accomplish this step, plus the subsequent join, in the Query Builder.

 

Let's take an example.  First, I need to create some data that features the mismatched keys:

 

data classW;
 length ID 8;
 set sashelp.class(keep=name weight);
 ID=_N_;
run;

data classH;
 length ID $ 3;
 set sashelp.class(keep=sex height);
 ID=put(_N_,z3.);
run;

 

This results in two tables that contain different data details, and a "common" (but differently typed) key of ID:

baddata.png

 

CLASSW uses a numeric ID.  CLASSH uses a 3-character ID that contains digits, formatted with leading zeros. In my opinion, the better practice is a character-based ID.

 

To solve, let's use the Query Builder to create a lookup table.  Starting with the CLASSW data set, we'll create a two-column lookup.  One column is the numeric ID in CLASSW, and the second column is a character version (called ID_Char), formatted with the leading zeros.  I used put(t1.ID, Z3.) to create a Computed Column for character version.  Remember, the put() function always results in a character value.

 

computed_char.png

 

The result? A narrow data set with just the two columns:

lookup.png

 

If your data contains a lot of unique keys, you might be concerned that this new table can get really large.  But I think that you'll find that in most cases, the data set won't actually occupy much storage even if you have millions of records.  In my example case, the columns add up to only 8 bytes per record.

 

Now we can create another query in the Query Builder to join all three of these tables: the two source tables plus the lookup table.  This 3-way inner join uses the lookup table as the go-between.  Notice the INNER JOIN conditions listed in the Join Order section.

 

threeway.png

 

We'll select the fields we want to keep from each table.  We don't need to keep anything from the lookup table:

qselect.png

 

And here's our successful merged result:

result.png

 

What does this look like in the process flow?  Here's my example.  Note that the only "program" code is the bit I used to produce the messy data scenario in the first place.

 

pf.png

 

 

View solution in original post


All Replies
Super User
Posts: 11,144

Re: SAS-EG 6.1 useless for variable type changes?

When you get this type of message:

 

The put function always fails when redifining a new variable saying a string format needs a string argument

 

it sounds like you are attempting to apply a format defined for character to a numeric variable. If this is a custom format and you want to format numerics then make a version of the format for use with numbers.

 

It may indicate you have a variable you think is character that isn't. You might want to provide more details on what you are attempting.

New Contributor
Posts: 4

Re: SAS-EG 6.1 useless for variable type changes?

I didn't create the datasets, I am trying to match one database that has numbers in numeric variable types, and another that uses the same numbers in an alpha-numeric type. SAS-EG has functionality to join the tables, but only if the type matches. I want to know if there is something I can do to change variable types, without coding.

Super User
Posts: 11,144

Re: SAS-EG 6.1 useless for variable type changes?

In short no. Once a variable is created in a dataset it's type is fixed.

To create a NEW dataset with a variable of the same name but different type (numeric to character);

 

data new;

   set old (rename=(variablename=oldvar));

   length variablename $ 10; /* replace 10 with the length of the variable you are going to match on in the other dataset*/

   variablename = put(oldvar,format.);

   drop oldvar;

run;

 

And use New for the match.

 

Or using SQL to join tables you can use something like this if the variable is being used to match records:

 

proc sql;

   create table matched as

   select dataset1,var1, dataset1.var2, dataset2.othervar1, dataset2.othervar2

   from dataset1 join dataset2

         on put(dataset1.variable,format.) = dataset2.variable;

run;

New Contributor
Posts: 4

Re: SAS-EG 6.1 useless for variable type changes?

Yeah, this is exactly what I was doing. I was hoping for some EG functionality for this, oh well.

 

Thanks.

Solution
‎02-09-2016 12:59 PM
Community Manager
Posts: 2,891

Re: SAS-EG 6.1 useless for variable type changes?

This is a super common scenario.  You've got to join two different data sources by some primary key field, and the two sources store the field in different ways.  The most common case is an ID field (like a customer ID) that is stored as a number in one table and as a character in another.  Or another common case: ZIP code.  (We can all agree that ZIP codes should be character, right? After all, some ZIP codes have leading zeros! I'm looking at you, Boston!)

 

Very often, one of the data sources is an Excel spreadsheet -- where "data type" is a nebulous concept and a field is imported into SAS using the wrong type.

 

It's not always practical to change one of the data sources to match the other, or to create a new column in one table so that it can match the key field you want to join.  But there is a way to accomplish this without modifying either of your source tables: create a third table as a key lookup.  You can accomplish this step, plus the subsequent join, in the Query Builder.

 

Let's take an example.  First, I need to create some data that features the mismatched keys:

 

data classW;
 length ID 8;
 set sashelp.class(keep=name weight);
 ID=_N_;
run;

data classH;
 length ID $ 3;
 set sashelp.class(keep=sex height);
 ID=put(_N_,z3.);
run;

 

This results in two tables that contain different data details, and a "common" (but differently typed) key of ID:

baddata.png

 

CLASSW uses a numeric ID.  CLASSH uses a 3-character ID that contains digits, formatted with leading zeros. In my opinion, the better practice is a character-based ID.

 

To solve, let's use the Query Builder to create a lookup table.  Starting with the CLASSW data set, we'll create a two-column lookup.  One column is the numeric ID in CLASSW, and the second column is a character version (called ID_Char), formatted with the leading zeros.  I used put(t1.ID, Z3.) to create a Computed Column for character version.  Remember, the put() function always results in a character value.

 

computed_char.png

 

The result? A narrow data set with just the two columns:

lookup.png

 

If your data contains a lot of unique keys, you might be concerned that this new table can get really large.  But I think that you'll find that in most cases, the data set won't actually occupy much storage even if you have millions of records.  In my example case, the columns add up to only 8 bytes per record.

 

Now we can create another query in the Query Builder to join all three of these tables: the two source tables plus the lookup table.  This 3-way inner join uses the lookup table as the go-between.  Notice the INNER JOIN conditions listed in the Join Order section.

 

threeway.png

 

We'll select the fields we want to keep from each table.  We don't need to keep anything from the lookup table:

qselect.png

 

And here's our successful merged result:

result.png

 

What does this look like in the process flow?  Here's my example.  Note that the only "program" code is the bit I used to produce the messy data scenario in the first place.

 

pf.png

 

 

New Contributor
Posts: 4

Re: SAS-EG 6.1 useless for variable type changes?

[ Edited ]

Thanks Chris, this works well. But what if the two tables have different id's? Won't the inner joins miss those?

 

Also, lets say I do a full outer join for tables that have columns of the same name. Joined by ID.

 

ID          TYPE

001        Shirt

 

ID          TYPE

002        PANTS

 

My new set will have three columns. ID      t1.TYPE     t2.TYPE

I just want the one column Type.

 

any ideas?

Community Manager
Posts: 2,891

Re: SAS-EG 6.1 useless for variable type changes?

[ Edited ]

Check the Data->Append Table task to see if it does what you want.  Append Table uses SQL OUTER UNION CORR to do its work.

 

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 937 views
  • 3 likes
  • 3 in conversation