BookmarkSubscribeRSS Feed
ilikesas
Barite | Level 11

Hi,

 

I have an Excel file with many variables and when I import it to SAS they are all of a character format. But among these variables there are many variables whose observartions are all numbers, but still their format is character.

 

SAS gives an example on how to convert all character variables to numeric

http://support.sas.com/kb/40/700.html

proc contents data=test out=vars(keep=name type) noprint; 
 run;

data vars;                                                
set vars;                                                 
if type=2 ;                               
newname=trim(left(name))||"_n";  
run;

proc sql ;                                         
select trim(left(name)), trim(left(newname)),             
       trim(left(newname))||'='||trim(left(name))         
into :c_list separated by ' ', :n_list separated by ' ',  
     :renam_list separated by ' '                         
from vars;                                                
quit; 

 data test2;                                               
set test;                                                
array ch(*) $ &c_list;                                    
array nu(*) &n_list;                                      
do i = 1 to dim(ch);                                      
  nu(i)=input(ch(i),8.);                                  
end;                                                      
drop i &c_list;                                           
rename &renam_list;                                                                                      
run;   

But the above code converts to numeric even variables that should be character.

 

So is it possible for SAS to know automatically which characters should be converted to numeric and which should not?

 

Thank you!

8 REPLIES 8
SASKiwi
PROC Star

The simple answer is no as SAS does not know the significance or purpose of the variable. For example an all numeric variable might contain a credit card number. These are typically 16 digits and if you convert that into a SAS numeric variable the last couple of digit cannot be stored accurately so your credit card numbers get changed. This is quite apart from there being no need to do calculations on such a variable - the main reason you would want to convert.

 

Only you know the purpose of a variable and so can tell SAS which columns should be converted or not.

Reeza
Super User

No. SAS can make good guesses. This is what PROC import does, but as you've seen it's not always accurate. 

 

 

 

Kurt_Bremser
Super User

First of all:

DO NOT IMPORT DATA FROM EXCEL FILES!

Save to a reasonable format (.csv) and read that with a data step. That way you have full control over what is read and how.

With Excel files, you force SAS to make guesses; and since those guesses are dynamic, an unexpected change in the data can cause havoc to your analysis, instead of ending up with a failed data step that immediately points you to the problem.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

So you ran a guessing procedure (proc import) on an unstructured/uncontrolled file format, and now you have garbage.   This should come as no surprise.  The term is GIGO (Garbage in Garbage Out).  Fix your import source, starting with the Data Transfer agreement that you will have built in conjunction with the data vendor, most vendors are able to supply data in a usable format - if not stop paying them and they will soon disappear.  You import agreement will define data structure, list of values, frequencies and such like which will make the programming of the import a simple copy paste and add some syntax.

ballardw
Super User

1) Save the Excel to CSV

2) Run proc import on that with a large value for guessingrows (the data import wizard will let you set that if you don't want to write code).

3) Take a look at the results. If you don't like it you can recall the data step code generated for proc import and modify INFORMAT statements  to force reading variables as numbers or dates and assign suitable formats.

You could also clean up variable names as columns in Excel with headers like: The value is supposed to be used for abc will have long and difficult to use varaible names on import. Assign meaningful labels for those variables.

With a data step you can also do some initial processing such as unexpected value messages, age calculations from dates, decomposition of multivalued variables, creation of new varaibles.

Reeza
Super User

This macro is helpful when saving to CSV, especially if your fields have embedded comma's.

https://support.microsoft.com/en-us/kb/291296

ilikesas
Barite | Level 11

Given that after all SAS can't make the distinction, is it possibe to have a quasi solution by verifying the first say 100 observations of each character variable and if no characters are found then transform this variable into numerical?

 

 

Reeza
Super User

That's what GUESSINGROWS does. 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2253 views
  • 9 likes
  • 6 in conversation