Hi All,
When importing an Excel column, the XLS engine automatically handles a "mixed" column with some numeric values and some character values, by converting the numeric values to character strings. Are the rules for this numeric -> character conversion documented somewhere?
Suppose I have an Excel file with a mixed column. So first value is character 'foo'. Second value is numeric 12345678901234. I had hoped SAS might convert that numeric value to character string '12345678901234'. In fact I get the string: '1.2345679e+013' . I had thought SAS might be applying a best format to convert the value to a string, but that doesn't look like a best. It's closer to the E format, but the e is lowercase. It's weird.
Interestingly, if I consider a slightly shorter number, 1234567890123, if the excel cell is type numeric, it is imported to SAS as I would like '1234567890123'. But if the Excel cell is type general, it's imported as '1.234568e+012'.
Attached is a sample Excel file, read with PROC IMPORT. But if you don't want to download an Excel file, you can just test by typing 12345678901234 into a new excel file.
proc import file="Q:\BigNumber.xls" out=BigNumber dbms=xls replace ;
getnames=yes ;
sheet="Sheet1" ;
run;
Yes this problem could be avoided by users entering the data into text cells (rather than numeric or general), or saving to CSV, or using database.... But for those times that I'm forced to use Excel against my will, just trying to understand the rules used by the XLS engine.
Thx,
-Q.
I heard back from tech support. They replicated the problem, and said they would file a bug report for the XLS engine. I was not able to obtain any more information about the logic used by the XLS engine to decide how it will convert numeric values to character values.
They pointed out that if you convert the .xls file to .xlsx and then use the XLSX engine, you do not have this problem. The XLSX engine correctly imports all of the columns as $14:
I believe there's an option within the xls IMPORT for MIXED types, and it goes along with the GUESSINGROWS option. If after scanning the number of records it's defined as MIXED it forces it to character, otherwise assigns it as numeric.
This is older but probably still applies to the XLS engine. I don't think it quite answers your full question but hope it helps.
https://support.sas.com/documentation/cdl/en/acpcref/63184/HTML/default/viewer.htm#a003103761.htm
I believe DBSASTYPE can be used to force the type, but you need to use a different method to import in that case.
They really need to add an option to read everything as character and allow conversion after the fact for Excel files.
@Quentin wrote:
Hi All,
When importing an Excel column, the XLS engine automatically handles a "mixed" column with some numeric values and some character values, by converting the numeric values to character strings. Are the rules for this numeric -> character conversion documented somewhere?
Suppose I have an Excel file with a mixed column. So first value is character 'foo'. Second value is numeric 12345678901234. I had hoped SAS might convert that numeric value to character string '12345678901234'. In fact I get the string: '1.2345679e+013' . I had thought SAS might be applying a best format to convert the value to a string, but that doesn't look like a best. It's closer to the E format, but the e is lowercase. It's weird.
Interestingly, if I consider a slightly shorter number, 1234567890123, if the excel cell is type numeric, it is imported to SAS as I would like '1234567890123'. But if the Excel cell is type general, it's imported as '1.234568e+012'.
Attached is a sample Excel file, read with PROC IMPORT. But if you don't want to download an Excel file, you can just test by typing 12345678901234 into a new excel file.
proc import file="Q:\BigNumber.xls" out=BigNumber dbms=xls replace ; getnames=yes ; sheet="Sheet1" ; run;
Yes this problem could be avoided by users entering the data into text cells (rather than numeric or general), or saving to CSV, or using database.... But for those times that I'm forced to use Excel against my will, just trying to understand the rules used by the XLS engine.
Thx,
-Q.
Thanks @Reeza.
The great thing about the XLS and XLSX engines is that you don't need to use the MIXED option, they do it by default. And they don't even rely on the TypeGuessRows registry key. They always scan the entire column, and import as character if any values is text.
They're documented at the end of:
The XLSX engine only supports three options: GETNAMES RANGE and SHEET.
In this case, the XLS engine is doing the right thing because it imports the column as character. It's just that I don't like the way what it converts the numeric value 12345678901234 to '1.2345679e+013' rather than '12345678901234'. To be fair, the EXCEL engine also does similar conversion to scientific notation when it does numeric to character conversion of big numbers (when you have the MIXED option).
Since I'm using 64 bit SAS reading 32 bit Excel file, I think my only hope for DBSASTYPE would be to switch to using the PCFILES engine, which I don't love.
Totally agree, a way to force columns to be read as character would be great.
The xls engine depends on a Microsoft-supplied module. (Legally) freely available documentation for that might be hard to find.
Well, day 2 and this feels more bizarre. I'm going to send it to tech support and see what they say.
The problem goes away if I simply expand the Excel column widths. I would never have thought that column widths would effect how data are imported.
My new test Excel file (attached) looks like:
The columns NumWide and GeneralWide are copies of Num and General, but then the column widths are expanded.
After PROC IMPORT with with XLS engine (9.4M4, 64-bit PC SAS). The output is:
Obs Num General Text NumWide GeneralWide 1 foo foo foo foo foo 2 1.2345679e+013 1.234568e+013 12345678901234 12345678901234 12345678901234 3 1234567890123 1.234568e+012 1234567890123 1234567890123 1234567890123 4 123456789012 123456789012 123456789012 123456789012 123456789012 5 12345678901 12345678901 12345678901 12345678901 12345678901 6 1234567890 1234567890 1234567890 1234567890 1234567890
PROC CONTENTS shows that the length (and informat and format) of NumWide and GenWide are longer than Num and General, just because of the column widths?!?
Variable Type Len Format Informat Num Char 16 $16. $16. General Char 15 $15. $15. Text Char 18 $18. $18. NumWide Char 29 $29. $29. GeneralWide Char 27 $27. $27.
I heard back from tech support. They replicated the problem, and said they would file a bug report for the XLS engine. I was not able to obtain any more information about the logic used by the XLS engine to decide how it will convert numeric values to character values.
They pointed out that if you convert the .xls file to .xlsx and then use the XLSX engine, you do not have this problem. The XLSX engine correctly imports all of the columns as $14:
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.