BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
JuLePe
Calcite | Level 5

I'm trying to import Excel files where users have used various missing value codes for numerical variables. Sometimes they use blank, sometimes ".", sometimes "N/A", etc. A small example: The excel file data may be like this:

Var1Var2Var3Var4
1401251017
....
143.04126.7213.02932.817
    
N/AN/AN/AN/A
120150.8791110

 

I want to import Var1-Var as numeric into a SAS data step, assigning a missing value to anything that is not a number. I could of course recode the Excel file, but there are too many users that code missing values in a different way; I want the import procedure to be foolproof.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If you have to read from a XLSX file then you do not have any control.  If SAS is running on a Windows machine and has a compatible version of Excel available you might be able to have more control using the DBMS=EXCEL option of PROC IMPORT or the EXCEL libref engine since that might support things like the DBSASTYPE= dataset option.

 

If the EXCEL column has at least one character string in any cell SAS must make a character variable.  In general you can represent any number as a string, but cannot do the reverse. This can be a problem if you want to use the same process to read multiple worksheets that you want to treat as having the same variables.  Since based on the content in the individual files you might sometimes get VAR1 as numeric and other times character. In that case you might tell PROC IMPORT to treat the header row as data.  That should force every column to be character.  That way you could have a standard process to use that imports the file and then runs some more code to convert the strings into numbers without having to worry about which variables need to be converted.

 

Let's run an example.

First let's make an XLSX file so we have something to program with.

Spoiler
data have;
  length x1-x4 $20. ;
  input x1-x4 ;
cards;
Var1 Var2 Var3 Var4
140 125 10 17
. . . .
143.04 126.72 13.029 32.817
missing unknown what help       
N/A N/A N/A N/A
120 150.879 11 10
;

filename xlsx temp;
proc export data=have dbms=xlsx file=xlsx replace;
  putnames=no;
run;

Now let's make a program that reads in the XLSX file treating the header as part of the data.  Uses PROC TRANSPOSE to generate a dataset with the names of the character variables.  Uses those names to generate code to make a character variable.  Then run a data step that reads in the data (skipping the headers) and executes the generated code.

proc import dbms=xlsx file=xlsx out=step1 replace;
  getnames=no;
run;

proc transpose data=step1(obs=1) out=names;
  var _character_;
run;

filename code temp;
data _null_;
  set names ;
  file code;
  put col1 '=input(' _name_ ',??32.);' 
    / 'drop ' _name_ ';'
  ;
run;

data want;
  set step1(firstobs=2);
%include code / source2;
run;

Results:

Tom_0-1731695413610.png

Tom_1-1731695436488.png

 

 

 

 

View solution in original post

4 REPLIES 4
ballardw
Super User

You will either need to use a different way to read the data, such as save the Excel to CSV and write a data step to read with custom informats or use a data step to create a bunch of new variables using programming to set the "missing" for the new variables that can be numeric.

From Excel or other spreadsheet do a File-Save As-CSV (pick one of the CSV such as unicode or plain CSV).

then something like:

data example;
    infile "<path>\mysaved.csv" dsd dlm=',';
    informat var1-var4  32.;
    input  var1 - var4;
run;

Which will have the log show some invalid data messages when the N/A or miscellaneous not actually number values are encountered. Blanks and '.' shouldn't be an issue but "Missing" "N/A" "Unknown" and other such text are invalid.

 

IF you want a clean log you could create a custom informat that would "read" the "N/A" directly to missing without the message. But it sounds like you would be having to modify that frequently and may not be worth the effort.

 

Proc Import expects data in a column to be relatively similar. Placing text values like N/A in a column means the column is text, not numeric.

Tom
Super User Tom
Super User

If you have to read from a XLSX file then you do not have any control.  If SAS is running on a Windows machine and has a compatible version of Excel available you might be able to have more control using the DBMS=EXCEL option of PROC IMPORT or the EXCEL libref engine since that might support things like the DBSASTYPE= dataset option.

 

If the EXCEL column has at least one character string in any cell SAS must make a character variable.  In general you can represent any number as a string, but cannot do the reverse. This can be a problem if you want to use the same process to read multiple worksheets that you want to treat as having the same variables.  Since based on the content in the individual files you might sometimes get VAR1 as numeric and other times character. In that case you might tell PROC IMPORT to treat the header row as data.  That should force every column to be character.  That way you could have a standard process to use that imports the file and then runs some more code to convert the strings into numbers without having to worry about which variables need to be converted.

 

Let's run an example.

First let's make an XLSX file so we have something to program with.

Spoiler
data have;
  length x1-x4 $20. ;
  input x1-x4 ;
cards;
Var1 Var2 Var3 Var4
140 125 10 17
. . . .
143.04 126.72 13.029 32.817
missing unknown what help       
N/A N/A N/A N/A
120 150.879 11 10
;

filename xlsx temp;
proc export data=have dbms=xlsx file=xlsx replace;
  putnames=no;
run;

Now let's make a program that reads in the XLSX file treating the header as part of the data.  Uses PROC TRANSPOSE to generate a dataset with the names of the character variables.  Uses those names to generate code to make a character variable.  Then run a data step that reads in the data (skipping the headers) and executes the generated code.

proc import dbms=xlsx file=xlsx out=step1 replace;
  getnames=no;
run;

proc transpose data=step1(obs=1) out=names;
  var _character_;
run;

filename code temp;
data _null_;
  set names ;
  file code;
  put col1 '=input(' _name_ ',??32.);' 
    / 'drop ' _name_ ';'
  ;
run;

data want;
  set step1(firstobs=2);
%include code / source2;
run;

Results:

Tom_0-1731695413610.png

Tom_1-1731695436488.png

 

 

 

 

JuLePe
Calcite | Level 5

Thanks! Both suggestions worked, but I found the second one easier to implement.

Ksharp
Super User

As Tom said use option DBSASTYPE= to do it ,once for all.

 

proc import datafile='c:\temp\temp.xlsx' out=have dbms=excel replace;
dbdsopts='dbsastype=(var1=numeric var2=numeric var3=numeric var4=numeric)';
run;




Ksharp_0-1731738330646.png

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1040 views
  • 3 likes
  • 4 in conversation