Hi,
When importing an Excel file, the excel engine multiplies by 3 the size of character variables. If we attempt to reduce the length using dbsastype= option, values get truncated.
Do you know how to avoid id or at least how to reduce the length afterwards without altering the data ?
proc export data=sashelp.class
outfile="&test./reporting/test.xlsx"
dbms=xlsx
replace;
sheet=class;
run;
libname demo xlsx "&test./reporting/test.xlsx";
*libname demo excel "&test./reporting/test.xlsx";
ods select position;
proc contents data=demo.class varnum;
run;
proc print data=demo.class;
run;
libname demo clear;
It is ok with xlsx engine (see below). I'm just trying to understand excel engine here.
The Excel engine uses a module provided by Microsoft, so don't be surprised there (whenever you're involved with stuff coming from Redmond, WA, don't be surprised). It probably suspects the existence of UTF characters and makes room for them.
The XLSX engine is created by following the documentation of the .xlsx file format, and programmed by the SAS people.
The Excel engine uses a module provided by Microsoft, so don't be surprised there (whenever you're involved with stuff coming from Redmond, WA, don't be surprised). It probably suspects the existence of UTF characters and makes room for them.
The XLSX engine is created by following the documentation of the .xlsx file format, and programmed by the SAS people.
Just to make it clear: since the introduction of the XLSX engine, I wouldn't touch Excel or XLS with a ten-foot pole. Which I couldn't touch anyway, since "my" SAS ran on UNIX.
@Kurt_Bremser Yes I agree. There are just a few situations where excel dbms/engine is able to do things xlsx dbms/engine is not able to do like appending data in an existing dataset available in a given Excel worksheet.
A reliable fix for this is to save your Excel spreadsheet as a CSV then change your PROC IMPORT to read the CSV.
What is the encoding of your sas session?
%put &=sysencoding ;
I can replicate your results if I use a multi-byte encoding such as UTF-8, but if I use a single-byte encoding such as WLATIN1 I get a length of 1 from the EXCEL engine.
If your SAS session is multi-byte, I think the EXCEL engine is actually doing the right thing, it will allow you to import multi-byte character values that might appear in your excel file. I suppose if you don't like this, you could always write a macro that would look up the length of each character variable, then use a DATA step with a LENGTH statement to set the lengths to 1/3 of the imported length.
Yes, I'm using UTF-8.
What was surprising me is that if I reduce the length in a data step to 10, the value with 10 caracters gets truncated.
I've tested it with sashelp.class which has no multibytes characters tho.
@Ksharp wrote:
But you are using UNICODE SAS.
Any character (even a English character) in UNICODE encoding would have stored length 3.
Not true. All printable 7-bit ASCII characters use only one byte in UTF-8. That is one of the advantages of it over double byte encodings (commonly used for Chinese and Japanese) that always use two bytes for every character.
Other UTF-8 characters require 2,3 or 4 bytes.
@xxformat_com wrote:
I've tested it with sashelp.class which has no multibytes characters tho.
Can you show the code and log you used to test?
When I test with sashelp.class using the EXCEL engine to import, it imports NAME with a length of $21, but if I then use a DATA step to cut it back to a length of $7, nothing is truncated. There is a warning in the log about possible truncation, but that warning can be turned off by setting options varlenchk=nowarn.
Code:
%let test=Q:\Junk ;
%put &=sysencoding ;
proc export data=sashelp.class
outfile="&test./test.xlsx"
dbms=xlsx
replace;
sheet=class;
run;
libname demo excel "&test./test.xlsx";
proc contents data=demo.class varnum;
run;
***options varlenchk=nowarn ;
data class ;
length name $7 ;
set demo.class ;
put name=;
run ;
proc contents data=class ;
run ;
libname demo clear ;
log:
1 %let test=Q:\Junk ; 2 3 %put &=sysencoding ; SYSENCODING=utf-8 4 5 proc export data=sashelp.class 6 outfile="&test./test.xlsx" 7 dbms=xlsx 8 replace; 9 sheet=class; 10 run; NOTE: The export data set has 19 observations and 5 variables. NOTE: "Q:\Junk\test.xlsx" file was successfully created. NOTE: PROCEDURE EXPORT used (Total process time): real time 0.11 seconds cpu time 0.01 seconds 11 12 libname demo excel "&test./test.xlsx"; NOTE: Libref DEMO was successfully assigned as follows: Engine: EXCEL Physical Name: Q:\Junk/test.xlsx 13 14 proc contents data=demo.class varnum; 15 run; NOTE: PROCEDURE CONTENTS used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 16 17 ***options varlenchk=nowarn ; 18 data class ; 19 length name $7 ; 20 set demo.class ; 21 22 put name=; 23 run ; WARNING: Multiple lengths were specified for the variable name by input data set(s). This can cause truncation of data. name=Alfred name=Alice name=Barbara name=Carol name=Henry name=James name=Jane name=Janet name=Jeffrey name=John name=Joyce name=Judy name=Louise name=Mary name=Philip name=Robert name=Ronald name=Thomas name=William NOTE: There were 19 observations read from the data set DEMO.class. NOTE: The data set WORK.CLASS has 19 observations and 5 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 24 25 proc contents data=class ; 26 run ; NOTE: PROCEDURE CONTENTS used (Total process time): real time 0.01 seconds cpu time 0.00 seconds
I played a bit more with this, in a utf-8 session with the string str="€1". In a UTF-8 session the euro symbol gets 3 bytes, so SAS makes str $4. Three bytes for the euro symbol, one byte for the 1.
If I write it out to .xlsx and then read it in with the EXCEL engine, it comes in as $6. So the EXCEL engine is giving each character 3 bytes, regardless of whether or not it needs it.
If I read it with the XLSX engine it comes in as $4. So apparently the XLSX engine is either assessing each character to figure out how many bytes it will need, or it is using some other metadata in the xlsx file to determine the number of bytes that SAS will need to store the data.
Test code like:
%let test=Q:\Junk ;
%put &=sysencoding ;
data euro ;
str="€1" ; *Euro symbol takes 3 bytes, so in utf-8 this string is 4 bytes long;
put str= $hex12. ;
run ;
proc contents data=euro ;
run ;
proc export data=euro
outfile="&test./test.xlsx"
dbms=xlsx
replace;
sheet=euro;
run;
*XLSX engine reads str as $4 . It must look at each character and assess how many bytes it needs, or use excel metadata to determine length;
libname xlsx xlsx "&test./test.xlsx";
proc contents data=xlsx.euro varnum;
run;
data _null_ ;
set xlsx.euro ;
put str= $hex12. ;
run ;
libname xlsx clear ;
*Excel engine reads str as $6, because it gives 3 bytes per character ;
libname excel excel "&test./test.xlsx";
proc contents data=excel.euro varnum;
run;
data _null_ ;
set excel.euro ;
put str= $hex12. ;
run ;
libname excel clear ;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.