I played a bit more with this, in a utf-8 session with the string str="€1". In a UTF-8 session the euro symbol gets 3 bytes, so SAS makes str $4. Three bytes for the euro symbol, one byte for the 1.
If I write it out to .xlsx and then read it in with the EXCEL engine, it comes in as $6. So the EXCEL engine is giving each character 3 bytes, regardless of whether or not it needs it.
If I read it with the XLSX engine it comes in as $4. So apparently the XLSX engine is either assessing each character to figure out how many bytes it will need, or it is using some other metadata in the xlsx file to determine the number of bytes that SAS will need to store the data.
Test code like:
%let test=Q:\Junk ;
%put &=sysencoding ;
data euro ;
str="€1" ; *Euro symbol takes 3 bytes, so in utf-8 this string is 4 bytes long;
put str= $hex12. ;
run ;
proc contents data=euro ;
run ;
proc export data=euro
outfile="&test./test.xlsx"
dbms=xlsx
replace;
sheet=euro;
run;
*XLSX engine reads str as $4 . It must look at each character and assess how many bytes it needs, or use excel metadata to determine length;
libname xlsx xlsx "&test./test.xlsx";
proc contents data=xlsx.euro varnum;
run;
data _null_ ;
set xlsx.euro ;
put str= $hex12. ;
run ;
libname xlsx clear ;
*Excel engine reads str as $6, because it gives 3 bytes per character ;
libname excel excel "&test./test.xlsx";
proc contents data=excel.euro varnum;
run;
data _null_ ;
set excel.euro ;
put str= $hex12. ;
run ;
libname excel clear ;
... View more