I have two separate Excel files (Data1.xlsx and Data2.xlsx) that both have fields ID and last_name. I am exporting both of those data sets to separate permanent SAS datasets, needing the length and format of each field to be exported the same. In Data1.xlsx, the longest last_name I have is "Gooligan" and in Data2.xlsx, the longest last_name I have is "Van Driverson." I am able to export them, but when I go to append the 2 SAS datasets, the last_name in the second dataset gets truncated to 8 characters (since Gooligan is the max of 8 characters in first dataset.) How do I expand last_name in Data1.xlsx to 13 characters prior to exporting it to a permanent SAS dataset so that the files will append without truncation?
I realize I can append the files initially in Excel, but the SAS datasets need to be used later on for other reasons and need them to be the same structure. Thanks!
@Bluekeys49 wrote:
I have two separate Excel files (Data1.xlsx and Data2.xlsx) that both have fields ID and last_name. I am exporting both of those data sets to separate permanent SAS datasets, needing the length and format of each field to be exported the same. In Data1.xlsx, the longest last_name I have is "Gooligan" and in Data2.xlsx, the longest last_name I have is "Van Driverson." I am able to export them, but when I go to append the 2 SAS datasets, the last_name in the second dataset gets truncated to 8 characters (since Gooligan is the max of 8 characters in first dataset.) How do I expand last_name in Data1.xlsx to 13 characters prior to exporting it to a permanent SAS dataset so that the files will append without truncation?
I realize I can append the files initially in Excel, but the SAS datasets need to be used later on for other reasons and need them to be the same structure. Thanks!
To get consistent lengths in the first step data sets about your only shot starting with Excel is to save the file as CSV (or other delimited text) and use a data step to READ the data into SAS with a data step.
You can combine, with a limitation that all the like named variables have the same data type (not a given with most methods for getting Excel to SAS data sets) but setting a common property before using as data step code with set statements to append the data.
Data example; length lastname $ 25 firstname $15 commenttext $ 200; set datasetone datasettwo ; run;
Since the Length statement appears before the Set to read the files that becomes the length of the variables. So if datasetone has maximum length of 8 for lastname and datasettwo has maximum length of 25 the 25 is used for combining the values.
@Bluekeys49 wrote:
I have two separate Excel files (Data1.xlsx and Data2.xlsx) that both have fields ID and last_name. I am exporting both of those data sets to separate permanent SAS datasets, needing the length and format of each field to be exported the same. In Data1.xlsx, the longest last_name I have is "Gooligan" and in Data2.xlsx, the longest last_name I have is "Van Driverson." I am able to export them, but when I go to append the 2 SAS datasets, the last_name in the second dataset gets truncated to 8 characters (since Gooligan is the max of 8 characters in first dataset.) How do I expand last_name in Data1.xlsx to 13 characters prior to exporting it to a permanent SAS dataset so that the files will append without truncation?
I realize I can append the files initially in Excel, but the SAS datasets need to be used later on for other reasons and need them to be the same structure. Thanks!
To get consistent lengths in the first step data sets about your only shot starting with Excel is to save the file as CSV (or other delimited text) and use a data step to READ the data into SAS with a data step.
You can combine, with a limitation that all the like named variables have the same data type (not a given with most methods for getting Excel to SAS data sets) but setting a common property before using as data step code with set statements to append the data.
Data example; length lastname $ 25 firstname $15 commenttext $ 200; set datasetone datasettwo ; run;
Since the Length statement appears before the Set to read the files that becomes the length of the variables. So if datasetone has maximum length of 8 for lastname and datasettwo has maximum length of 25 the 25 is used for combining the values.
If you make an empty dataset with the desired attributes (i.e. length), then you can use it to guide the length of appending.
data dummy_data;
length last_name $30 first_name $20;
stop;
run;
data mydata;
set dummy_data data1 data2;
run;
In the "data mydata" step, the length of a variable will be determined by its first reference - in this case dummy_data (with zero observations but also with metadata for each variable), which had lengths assigned in the prior step. The advantage of this is that you can make dummy_data a permanent data set, but can subsequently use it to process any update files, (i.e. run the second step only) with no further specific length assignments. The more variables you have to deal with, the more beneficial this technique will be.
This behavior means that
data mydata;
set data1 data2;
run;
will assign lengths determined by DATA1, as you have discovered.
Of course, you could switch the order of the data set names in the SET statement (set data2 data1). But if you really want all the data1 cases preceding all the data2 cases, you could also
data mydata;
if 0 then set data2;
set data1 data2;
run;
which forces the compiler to encounter the lengths in data2 first even though the data in data2 follows data1. And of course there is a risk that data2 doesn't always have the longer lengths.
A long-winded way to suggest the dummy_data approach for any work that will be repeated.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.