BookmarkSubscribeRSS Feed
NewUsrStat
Lapis Lazuli | Level 10

Dear all, 

I'm experiencing issues while reading an Excel file in input with PROC IMPORT. 

Sas converts character variables into numeric ones with al lot of missing values. 

 

I tried different ways to overcome this issue but without success, like: to set to UTF-9 the encoding, to specify the variable that is character (although there are many variables to specify). 

 

proc import out= work.myw replace
  dbms=xlsx  datafile= "...\myfile.xlsx" ;
run;

DBDSOPTS= "DBTYPE=(varx='CHAR(3)')";

 

Can anyone help me please? 

 

Thank you in advance 

 

Best

7 REPLIES 7
Tom
Super User Tom
Super User

SAS does not convert columns in an Excel spreadsheet into numeric variables unless there is at least one numeric cell in the column and no character cells.  If the column has any character cells then it will defined as character. This is because you can always make a reasonable representation of a number as a character string, but the reverse is not possible.  For example, what number would you assign for a cell with the value "NY TIMES"?  Note also that it will define an empty column as a character variable with a storage length of 1 (probably because it will take less space than defining it as numeric).

 

Perhaps your issue is with whatever process created the Excel spreadsheet.  That is probably where the the cells were made as numeric instead of character.  For example if you let Excel open a CSV file without checking what it does it will convert strings that only contain digits into numbers.  So an id value like 123 , or worse one with leading zeros, will be converted into a number.

NewUsrStat
Lapis Lazuli | Level 10
Thank you very much for your explanation. I tried converting the Excel file into a CSV, but some variable names became "VAR*". I have to say that the variable names contain special characters.
Kathryn_SAS
SAS Employee

If you have variable names with special characters, try adding the following option before importing the CSV file:

options validvarname=any;

If you have additional questions, please include the code you are currently using.

Kurt_Bremser
Super User

@NewUsrStat wrote:
Thank you very much for your explanation. I tried converting the Excel file into a CSV, but some variable names became "VAR*". I have to say that the variable names contain special characters.

This does not matter. In the DATA step you write, you will skip the header with FIRSTOBS=2 in the INFILE statement, and name the variables in the INPUT statement yourself.

Tom
Super User Tom
Super User

@NewUsrStat wrote:
Thank you very much for your explanation. I tried converting the Excel file into a CSV, but some variable names became "VAR*". I have to say that the variable names contain special characters.

If the columns headers in a CSV file are not valid SAS names then how PROC IMPORT handles them depends on the setting of the VALIDVARNAME= system option.  With valid varname=V7 it will generally replace the strange characters with underscores.  With VALIDVARNAME=ANY it will use them and then in your SAS code you will need to use names literal (values like 'My var'n ) to reference the variables.

 

But the problem of names becoming just VAR is usually caused by having more fields in the data lines than were in the header row. So you have data in column 5 but only headers for columns 1 to 4.  Or it could be that the header row was longer then 32K bytes as that is the limit that PROC IMPORT can support. Or it might be that the column headers are too long and are not unique in the first 32 bytes.

 

If you have that problem try using a different tool to guess how to read the CSV file. Such as this one:

https://github.com/sasutils/macros/blob/master/csv2ds.sas

Differences from PROC IMPORT
- Supports header lines with more than 32,767 characters
- Supports ZIP and GZIP source files
- Generates unique variable names by adding numeric suffix
- Does not overestimate maxlength when longest value is quoted
- Does NOT force character type if all values are quoted
- Generates label when generated variable name is different than header
- Supports NAMEROW option
- Supports numeric fields with special missing values (MISSING statement)
- Does not attach unneeded informats or formats
- Allows overriding calculated metadata
- Allow using random sample of rows to guess metadata
- Generates more compact SAS code
- Generates analysis summary dataset and raw data view
- Saves generated SAS code to a file
- Forces DATE and DATETIME formats to show century
- Difference in generated V7 compatible variable names
  - Replaces adjacent non-valid characters with single underscore

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 577 views
  • 1 like
  • 5 in conversation