- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello everyone!
This is my very first post (after reading actively on the forum for a while now) since I couldn't find a similar issue on this board already.
I am testing a piece of code on different servers, one running 9.3 and the other one 9.4, and there is a different behavior when importing XLSX files, in particular when dealing with character variables. In short, the import procedure in SAS 9.3 will assign a minimum length of 3 times the observed length to a variable containing strings, whereas the SAS 9.4 version only uses a minimum length needed to fit the observed values into the variable. I attached a minimum example of the code that reproduces this behavior on the SAS servers I am running on.
proc import out = work.long_id datafile = "PATH_TO_THE_EXCEL\example_long_short_id.xlsx" DBMS=xlsx REPLACE; SHEET="Long ID"; GETNAMES=YES; run; proc import out = work.short_id datafile = "PATH_TO_THE_EXCEL\example_long_short_id.xlsx" DBMS=xlsx REPLACE; SHEET="Short ID"; GETNAMES=YES; run; /* Get the information about the length of the variables */ title "Observations with short id"; proc contents data = work.short_id; run; title "Observations with long ID"; proc contents data = work.long_id; run; /* Concatenate data */ data work.concatenate; set work.short_id work.long_id; run; title "Concatenated table (short id table comes first, long id is truncated under SAS 9.4)"; proc contents data = work.concatenate; run; proc print data = work.concatenate; run;
Screenshots of the Excel input are attached, they are very simple.
First Excel Sheet
Second Excel Sheet
The "short id" Excel-Sheet has variables with length of 9 and under SAS 9.3 and 9.4 it is read in with length 27 and 9, respectively. For the "long id" the length is 13 and it is imported with length of 39 (SAS 9.3) or 13 (SAS 9.4). As stated above, the SAS 9.3 server seems to choose three times the length of the maximum length it observed in the first X rows, whereas SAS 9.4 chooses only the maximum length.
Alhtough I can reproduce and identify this difference in behaviour, I don't know how to adjust this behaviour in the settings, because the behavior of SAS 9.3 is the one needed. This brings me to my question(s):
- Which settings of the SAS 9.4 server need to be adjusted so that the length assigned to a column when importing an xlsx is three times the length of the variables observed?
- Where can I find the documentation that the settings were changed when switching from SAS 9.3 to SAS 9.4?
I attached some screenshots of the output as well, please excuse the German output of the server 😉 It's the proc contents output of the sheet with long IDs. You can see the different lengths assigned by SAS when running on different servers.
Proc Contents Output under SAS 9.3
Proc Contents Output under SAS 9.4
Thanks a lot, I hope I adhered to the rules of posting and look forward to your suggestions.
Cheers,
icecube
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As long as you use Excel files, DO NEVER expect consistent results. NEVER.
Not across Windows versions, Office versions, SAS versions, not even across days.
If you want consistency, use csv files or similar and read with a data step where you have control over the process.
Reading Excel files involves guessing on the side of proc import (or libname excel), and if the change is as you report, I even consider it to be on the good side, as it is designed to avoid waste of space.
You can take from the tone of my answer how dismal my experiences with using Excel files have been. I avoid them like the plague.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello Kurt! I share your opinion and strongly try to avoid it going forward and use csv files etc to be safe. Anyways, having dependencies on older code as well I still want to make the differences when switching SAS versions as small as possible. You don't happen to know a setting in SAS that takes the longest observed length in the first X rows and multiplies it by three?
Thanks,
icecube
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I know of no such setting. Import is a procedure that does no really have lots of options like others, but is designed to be a quick-shot black box with as little user-interaction as possible.
IMO, before you invest lots of time trying to fix that issue, you're much better off converting the whole process to a csv-based operation. AFAIK, at least the open-source alternatives for Excel allow command-line driven file conversions, if you can't get the source to deliver data in text files in the first place.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@icecube wrote:
Hello Kurt! I share your opinion and strongly try to avoid it going forward and use csv files etc to be safe. Anyways, having dependencies on older code as well I still want to make the differences when switching SAS versions as small as possible. You don't happen to know a setting in SAS that takes the longest observed length in the first X rows and multiplies it by three?
Thanks,
icecube
There is a setting for handling the transcoding of data from single byte encodings into UTF-8. Since a single byte could be transcoded into a 3 byte sequence. The option basically says what factor to use to increase the length of character variables to avoid loss of data. Is that what is happening in your case?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello Tom!
This sounds very promising since I see that for SAS 9.3 the length is always three times the longest entry (by number of characters). Where can I find the setting that handles the transcoding? This way, I could check whether this is configured differently for the two SAS versions I am using now.
Best