BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

Hi All,

 

When importing an Excel column, the XLS engine automatically handles a "mixed" column with some numeric values and some character values, by converting the numeric values to character strings.  Are the rules for this numeric -> character conversion documented somewhere?

 

Suppose I have an Excel file with a mixed column.  So first value is character 'foo'.  Second value is numeric 12345678901234.  I had hoped SAS might convert that numeric value to character string '12345678901234'.  In fact I get the string: '1.2345679e+013'  .   I had thought SAS might be applying a best format to convert the value to a string, but that doesn't look like a best.  It's closer to the E format, but the e is lowercase.  It's weird.

 

Interestingly, if I consider a slightly shorter number, 1234567890123, if the excel cell is type numeric, it is imported to SAS as I would like '1234567890123'.  But if the Excel cell is type general, it's imported as '1.234568e+012'.  

 

Attached is a sample Excel file, read with PROC IMPORT.  But if you don't want to download an Excel file, you can just test by typing 12345678901234 into a new excel file.

 

proc import file="Q:\BigNumber.xls" out=BigNumber dbms=xls replace ;
  getnames=yes ;
  sheet="Sheet1" ;
run;

 

Yes this problem could be avoided by users entering the data into text cells (rather than numeric or general), or saving to CSV, or using database.... But for those times that I'm forced to use Excel against my will, just trying to understand the rules used by the XLS engine.

 

Thx,

-Q.

 

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

I heard back from tech support.  They replicated the problem, and said they would file a bug report for the XLS engine.  I was not able to obtain any more information about the logic used by the XLS engine to decide how it will convert numeric values to character values. 

 

They pointed out that if you convert the .xls file to .xlsx and then use the XLSX engine, you do not have this problem.  The XLSX engine correctly imports all of the columns as $14:

 

ExcelPic2.PNG

 

 

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

6 REPLIES 6
Reeza
Super User

I believe there's an option within the xls IMPORT for MIXED types, and it goes along with the GUESSINGROWS option. If after scanning the number of records it's defined as MIXED it forces it to character, otherwise assigns it as numeric.

 

This is older but probably still applies to the XLS engine. I don't think it quite answers your full question but hope it helps.

https://support.sas.com/documentation/cdl/en/acpcref/63184/HTML/default/viewer.htm#a003103761.htm

 

I believe DBSASTYPE can be used to force the type, but you need to use a different method to import in that case. 

 

They really need to add an option to read everything as character and allow conversion after the fact for Excel files.


@Quentin wrote:

Hi All,

 

When importing an Excel column, the XLS engine automatically handles a "mixed" column with some numeric values and some character values, by converting the numeric values to character strings.  Are the rules for this numeric -> character conversion documented somewhere?

 

Suppose I have an Excel file with a mixed column.  So first value is character 'foo'.  Second value is numeric 12345678901234.  I had hoped SAS might convert that numeric value to character string '12345678901234'.  In fact I get the string: '1.2345679e+013'  .   I had thought SAS might be applying a best format to convert the value to a string, but that doesn't look like a best.  It's closer to the E format, but the e is lowercase.  It's weird.

 

Interestingly, if I consider a slightly shorter number, 1234567890123, if the excel cell is type numeric, it is imported to SAS as I would like '1234567890123'.  But if the Excel cell is type general, it's imported as '1.234568e+012'.  

 

Attached is a sample Excel file, read with PROC IMPORT.  But if you don't want to download an Excel file, you can just test by typing 12345678901234 into a new excel file.

 

proc import file="Q:\BigNumber.xls" out=BigNumber dbms=xls replace ;
  getnames=yes ;
  sheet="Sheet1" ;
run;

 

Yes this problem could be avoided by users entering the data into text cells (rather than numeric or general), or saving to CSV, or using database.... But for those times that I'm forced to use Excel against my will, just trying to understand the rules used by the XLS engine.

 

Thx,

-Q.

 

 


 

Quentin
Super User

Thanks @Reeza.

 

The great thing about the XLS and XLSX engines is that you don't need to use the MIXED option, they do it by default.  And they don't even rely on the TypeGuessRows registry key.  They always scan the entire column, and import as character if any values is text.  

 

They're documented at the end of:

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=acpcref&docsetTarget=n0ms...

The XLSX engine only supports three options: GETNAMES RANGE and SHEET.

 

In this case, the XLS engine is doing the right thing because it imports the column as character.  It's just that I don't like the way what it converts the numeric value 12345678901234 to '1.2345679e+013' rather than '12345678901234'.  To be fair, the EXCEL engine also does similar conversion to scientific notation when it does numeric to character conversion of big numbers (when you have the MIXED option).

 

Since I'm using 64 bit SAS reading 32 bit Excel file, I think my only hope for DBSASTYPE would be to switch to using the PCFILES engine, which I don't love.

 

Totally agree, a way to force columns to be read as character would be great.  

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Quentin
Super User
No, the XLS engine does not depend on the Microsoft ACE engine. The XLS and XLSX engines were both written by SAS. I’m using XLS instead of the older EXCEL engine because XLS doesn’t need to match bitness.

See e.g. https://support.sas.com/resources/papers/proceedings17/SAS0387-2017.pdf

So I’m hoping they’ll be able to share more of the internals.
BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Quentin
Super User

Well, day 2 and this feels more bizarre.  I'm going to send it to tech support and see what they say.

 

The problem goes away if I simply expand the Excel column widths.  I would never have thought that column widths would effect how data are imported.

 

My new test Excel file (attached) looks like:

ExcelPic.PNG

The columns NumWide and GeneralWide are copies of Num and General, but then the column widths are expanded.

 

After PROC IMPORT with with XLS engine (9.4M4, 64-bit PC SAS).  The output is:

Obs    Num                 General            Text                  NumWide                          GeneralWide

 1     foo                 foo                foo                   foo                              foo
 2     1.2345679e+013      1.234568e+013      12345678901234        12345678901234                   12345678901234
 3     1234567890123       1.234568e+012      1234567890123         1234567890123                    1234567890123
 4     123456789012        123456789012       123456789012          123456789012                     123456789012
 5     12345678901         12345678901        12345678901           12345678901                      12345678901
 6     1234567890          1234567890         1234567890            1234567890                       1234567890

PROC CONTENTS shows that the length (and informat and format) of NumWide and GenWide are longer than Num and General, just because of the column widths?!?

 

Variable       Type    Len    Format    Informat

Num            Char     16    $16.      $16.
General        Char     15    $15.      $15.
Text           Char     18    $18.      $18.
NumWide        Char     29    $29.      $29.
GeneralWide    Char     27    $27.      $27.

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
Quentin
Super User

I heard back from tech support.  They replicated the problem, and said they would file a bug report for the XLS engine.  I was not able to obtain any more information about the logic used by the XLS engine to decide how it will convert numeric values to character values. 

 

They pointed out that if you convert the .xls file to .xlsx and then use the XLSX engine, you do not have this problem.  The XLSX engine correctly imports all of the columns as $14:

 

ExcelPic2.PNG

 

 

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 5559 views
  • 0 likes
  • 3 in conversation