11-01-2016 11:34 PM
I have an Excel file with many variables and when I import it to SAS they are all of a character format. But among these variables there are many variables whose observartions are all numbers, but still their format is character.
SAS gives an example on how to convert all character variables to numeric
proc contents data=test out=vars(keep=name type) noprint; run; data vars; set vars; if type=2 ; newname=trim(left(name))||"_n"; run; proc sql ; select trim(left(name)), trim(left(newname)), trim(left(newname))||'='||trim(left(name)) into :c_list separated by ' ', :n_list separated by ' ', :renam_list separated by ' ' from vars; quit; data test2; set test; array ch(*) $ &c_list; array nu(*) &n_list; do i = 1 to dim(ch); nu(i)=input(ch(i),8.); end; drop i &c_list; rename &renam_list; run;
But the above code converts to numeric even variables that should be character.
So is it possible for SAS to know automatically which characters should be converted to numeric and which should not?
11-01-2016 11:52 PM - edited 11-01-2016 11:55 PM
The simple answer is no as SAS does not know the significance or purpose of the variable. For example an all numeric variable might contain a credit card number. These are typically 16 digits and if you convert that into a SAS numeric variable the last couple of digit cannot be stored accurately so your credit card numbers get changed. This is quite apart from there being no need to do calculations on such a variable - the main reason you would want to convert.
Only you know the purpose of a variable and so can tell SAS which columns should be converted or not.
11-02-2016 03:26 AM
First of all:
DO NOT IMPORT DATA FROM EXCEL FILES!
Save to a reasonable format (.csv) and read that with a data step. That way you have full control over what is read and how.
With Excel files, you force SAS to make guesses; and since those guesses are dynamic, an unexpected change in the data can cause havoc to your analysis, instead of ending up with a failed data step that immediately points you to the problem.
11-02-2016 05:30 AM
So you ran a guessing procedure (proc import) on an unstructured/uncontrolled file format, and now you have garbage. This should come as no surprise. The term is GIGO (Garbage in Garbage Out). Fix your import source, starting with the Data Transfer agreement that you will have built in conjunction with the data vendor, most vendors are able to supply data in a usable format - if not stop paying them and they will soon disappear. You import agreement will define data structure, list of values, frequencies and such like which will make the programming of the import a simple copy paste and add some syntax.
11-02-2016 10:43 AM
1) Save the Excel to CSV
2) Run proc import on that with a large value for guessingrows (the data import wizard will let you set that if you don't want to write code).
3) Take a look at the results. If you don't like it you can recall the data step code generated for proc import and modify INFORMAT statements to force reading variables as numbers or dates and assign suitable formats.
You could also clean up variable names as columns in Excel with headers like: The value is supposed to be used for abc will have long and difficult to use varaible names on import. Assign meaningful labels for those variables.
With a data step you can also do some initial processing such as unexpected value messages, age calculations from dates, decomposition of multivalued variables, creation of new varaibles.
11-02-2016 11:37 AM
This macro is helpful when saving to CSV, especially if your fields have embedded comma's.
11-03-2016 01:16 AM
Given that after all SAS can't make the distinction, is it possibe to have a quasi solution by verifying the first say 100 observations of each character variable and if no characters are found then transform this variable into numerical?