BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Bluekeys49
Obsidian | Level 7

I have two separate Excel files (Data1.xlsx and Data2.xlsx) that both have fields ID and last_name.  I am exporting both of those data sets to separate permanent SAS datasets, needing the length and format of each field to be exported the same.  In Data1.xlsx, the longest last_name I have is "Gooligan" and in Data2.xlsx, the longest last_name I have is "Van Driverson." I am able to export them, but when I go to append the 2 SAS datasets, the last_name in the second dataset gets truncated to 8 characters (since Gooligan is the max of 8 characters in first dataset.)  How do I expand last_name in Data1.xlsx to 13 characters prior to exporting it to a permanent SAS dataset so that the files will append without truncation?

 

I realize I can append the files initially in Excel, but the SAS datasets need to be used later on for other reasons and need them to be the same structure.  Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@Bluekeys49 wrote:

I have two separate Excel files (Data1.xlsx and Data2.xlsx) that both have fields ID and last_name.  I am exporting both of those data sets to separate permanent SAS datasets, needing the length and format of each field to be exported the same.  In Data1.xlsx, the longest last_name I have is "Gooligan" and in Data2.xlsx, the longest last_name I have is "Van Driverson." I am able to export them, but when I go to append the 2 SAS datasets, the last_name in the second dataset gets truncated to 8 characters (since Gooligan is the max of 8 characters in first dataset.)  How do I expand last_name in Data1.xlsx to 13 characters prior to exporting it to a permanent SAS dataset so that the files will append without truncation?

 

I realize I can append the files initially in Excel, but the SAS datasets need to be used later on for other reasons and need them to be the same structure.  Thanks!


To get consistent lengths in the first step data sets about your only shot starting with Excel is to save the file as CSV (or other delimited text) and use a data step to READ the data into SAS with a data step.

You can combine, with a limitation that all the like named variables have the same data type (not a given with most methods for getting Excel to SAS data sets) but setting a common property before using as data step code with set statements to append the data.

Data example;
   length lastname $ 25  firstname $15 commenttext $ 200;
   set datasetone
       datasettwo
   ;
run;

Since the Length statement appears before the Set to read the files that becomes the length of the variables. So if datasetone has maximum length of 8 for lastname and datasettwo has maximum length of 25 the 25 is used for combining the values.

 

 

View solution in original post

2 REPLIES 2
ballardw
Super User

@Bluekeys49 wrote:

I have two separate Excel files (Data1.xlsx and Data2.xlsx) that both have fields ID and last_name.  I am exporting both of those data sets to separate permanent SAS datasets, needing the length and format of each field to be exported the same.  In Data1.xlsx, the longest last_name I have is "Gooligan" and in Data2.xlsx, the longest last_name I have is "Van Driverson." I am able to export them, but when I go to append the 2 SAS datasets, the last_name in the second dataset gets truncated to 8 characters (since Gooligan is the max of 8 characters in first dataset.)  How do I expand last_name in Data1.xlsx to 13 characters prior to exporting it to a permanent SAS dataset so that the files will append without truncation?

 

I realize I can append the files initially in Excel, but the SAS datasets need to be used later on for other reasons and need them to be the same structure.  Thanks!


To get consistent lengths in the first step data sets about your only shot starting with Excel is to save the file as CSV (or other delimited text) and use a data step to READ the data into SAS with a data step.

You can combine, with a limitation that all the like named variables have the same data type (not a given with most methods for getting Excel to SAS data sets) but setting a common property before using as data step code with set statements to append the data.

Data example;
   length lastname $ 25  firstname $15 commenttext $ 200;
   set datasetone
       datasettwo
   ;
run;

Since the Length statement appears before the Set to read the files that becomes the length of the variables. So if datasetone has maximum length of 8 for lastname and datasettwo has maximum length of 25 the 25 is used for combining the values.

 

 

mkeintz
PROC Star

If you make an empty dataset with the desired attributes (i.e. length), then you can use it to guide the length of appending.

 

data dummy_data;
  length last_name $30 first_name $20;
  stop;
run;
data mydata;
  set dummy_data data1 data2;
run;

In the "data mydata" step, the length of a variable will be determined by its first reference - in this case dummy_data (with zero observations but also with metadata for each variable), which had lengths assigned in the prior step.  The advantage of this is that you can make dummy_data a permanent data set, but can subsequently use it to process any update files, (i.e. run the second step only) with no further specific length assignments.  The more variables you have to deal with, the more beneficial this technique will be.

 

This behavior means that

data mydata;
  set data1 data2;
run;

will assign lengths determined by DATA1, as you have discovered.

 

Of course, you could switch the order of the data set names in the SET statement (set data2 data1).  But if you really want all the data1 cases preceding all the data2 cases, you could also

data mydata;
  if 0 then set data2;
  set data1 data2;
run;

which forces the compiler to encounter the lengths in data2 first even though the data in data2 follows data1. And of course there is a risk that data2 doesn't always have the longer lengths.

 

A long-winded way to suggest the dummy_data approach for any work that will be repeated.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 399 views
  • 2 likes
  • 3 in conversation