BookmarkSubscribeRSS Feed
icecube
Calcite | Level 5

Hello everyone!

This is my very first post (after reading actively on the forum for a while now) since I couldn't find a similar issue on this board already.

 

I am testing a piece of code on different servers, one running 9.3 and the other one 9.4, and there is a different behavior when importing XLSX files, in particular when dealing with character variables. In short, the import procedure in SAS 9.3 will assign a minimum length of 3 times the observed length to a variable containing strings, whereas the SAS 9.4 version only uses a minimum length needed to fit the observed values into the variable. I attached a minimum example of the code that reproduces this behavior on the SAS servers I am running on.

 

proc import out = work.long_id datafile = "PATH_TO_THE_EXCEL\example_long_short_id.xlsx"
    DBMS=xlsx  REPLACE;
    SHEET="Long ID"; 
    GETNAMES=YES;
run;

proc import out = work.short_id datafile = "PATH_TO_THE_EXCEL\example_long_short_id.xlsx"
    DBMS=xlsx  REPLACE;
    SHEET="Short ID"; 
    GETNAMES=YES;
run;

/* Get the information about the length of the variables */
title "Observations with short id";
proc contents data = work.short_id;
run;
title "Observations with long ID";
proc contents data = work.long_id;
run;

/* Concatenate data */
data work.concatenate;
    set work.short_id work.long_id;
run;

title "Concatenated table (short id table comes first, long id is truncated under SAS 9.4)";
proc contents data = work.concatenate;
run;
proc print data = work.concatenate;
run;

 

Screenshots of the Excel input are attached, they are very simple.

First Excel SheetFirst Excel SheetSecond Excel SheetSecond Excel Sheet

 

The "short id" Excel-Sheet has variables with length of 9 and under SAS 9.3 and 9.4 it is read in with length 27 and 9, respectively. For the "long id" the length is 13 and it is imported with length of 39 (SAS 9.3) or 13 (SAS 9.4). As stated above, the SAS 9.3 server seems to choose three times the length of the maximum length it observed in the first X rows, whereas SAS 9.4 chooses only the maximum length.

 

Alhtough I can reproduce and identify this difference in behaviour, I don't know how to adjust this behaviour in the settings, because the behavior of SAS 9.3 is the one needed. This brings me to my question(s):

  1. Which settings of the SAS 9.4 server need to be adjusted so that the length assigned to a column when importing an xlsx is three times the length of the variables observed?
  2. Where can I find the documentation that the settings were changed when switching from SAS 9.3 to SAS 9.4?

I attached some screenshots of the output as well, please excuse the German output of the server 😉 It's the proc contents output of the sheet with long IDs. You can see the different lengths assigned by SAS when running on different servers.

 

Proc Contents Output under SAS 9.3Proc Contents Output under SAS 9.3Proc Contents Output under SAS 9.4Proc Contents Output under SAS 9.4

 

Thanks a lot, I hope I adhered to the rules of posting and look forward to your suggestions.

 

Cheers,

icecube

5 REPLIES 5
Kurt_Bremser
Super User

As long as you use Excel files, DO NEVER expect consistent results. NEVER.

Not across Windows versions, Office versions, SAS versions, not even across days.

If you want consistency, use csv files or similar and read with a data step where you have control over the process.

Reading Excel files involves guessing on the side of proc import (or libname excel), and if the change is as you report, I even consider it to be on the good side, as it is designed to avoid waste of space.

 

You can take from the tone of my answer how dismal my experiences with using Excel files have been. I avoid them like the plague.

 

 

icecube
Calcite | Level 5

Hello Kurt! I share your opinion and strongly try to avoid it going forward and use csv files etc to be safe. Anyways, having dependencies on older code as well I still want to make the differences when switching SAS versions as small as possible. You don't happen to know a setting in SAS that takes the longest observed length in the first X rows and multiplies it by three?

 

Thanks,

icecube

Kurt_Bremser
Super User

I know of no such setting. Import is a procedure that does no really have lots of options like others, but is designed to be a quick-shot black box with as little user-interaction as possible.

 

IMO, before you invest lots of time trying to fix that issue, you're much better off converting the whole process to a csv-based operation. AFAIK, at least the open-source alternatives for Excel allow command-line driven file conversions, if you can't get the source to deliver data in text files in the first place.

Tom
Super User Tom
Super User

@icecube wrote:

Hello Kurt! I share your opinion and strongly try to avoid it going forward and use csv files etc to be safe. Anyways, having dependencies on older code as well I still want to make the differences when switching SAS versions as small as possible. You don't happen to know a setting in SAS that takes the longest observed length in the first X rows and multiplies it by three?

 

Thanks,

icecube


There is a setting for handling the transcoding of data from single byte encodings into UTF-8.  Since a single byte could be transcoded into a 3 byte sequence.  The option basically says what factor to use to increase the length of character variables to avoid loss of data.  Is that what is happening in your case?

icecube
Calcite | Level 5

Hello Tom!

This sounds very promising since I see that for SAS 9.3 the length is always three times the longest entry (by number of characters). Where can I find the setting that handles the transcoding? This way, I could check whether this is configured differently for the two SAS versions I am using now.

 

Best

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 687 views
  • 3 likes
  • 3 in conversation