09-15-2016 04:06 AM
I used proc import to read a csv file which read one of the column as best12. , but some of the values in the column are not numeric. So, I used the 'put' statement to convert numeric variable to character variable. But still I see a new column created contains a period when ever it come across a non numeric value(for example: "na" ). Is it possible to get the same non numeric value which is same as in the csv file.
09-15-2016 04:21 AM
if you have non-numeric values than the variables shall be character type. you may set the type of variable when using the wizard from the proc import procedure or alter the code afterwards.
if you have a variable numeric and some values character then, when importing the file all the non numeric values will be set to missing.
09-15-2016 04:42 AM
Which will of course double the time taken to import the file, if indeed it does fix the problem in the first place. Guessingrows reads the file twice - once to get a good guess on th erows specified, second to actually read the data. So double resource, and if you don't specify enough records, it may still not get the correct result. There is no replacement for effort on the programmers part.
09-15-2016 04:31 AM
Simple answer, don't use proc import which is a guessing procedure. Put the effort in and write the datastep import with the formats you know that are in your data. You can get a start to this code by looking at your log when you run your proc import. You will see that it generates a datastep with infile, and a line for each variable stating format, informat, length - to which you can add label also. For the informat, specify the proper informat of your data and the read in data will be read in that way. Avoids the need to post process the file and have multiple steps to process the data.
09-15-2016 04:33 AM
When you have non-numeric data in a column in your csv, you must read that column as character; you can then add logic to convert true numeric values into an additional numeric variable.
Here's a simple example:
data test; input invar $; if notdigit(trim(invar)) = 0 then numvar = input(invar,best.); cards; 123 456 xxx 789 ; run; proc print data=test; run;
Obs invar numvar 1 123 123 2 456 456 3 xxx . 4 789 789
09-15-2016 10:33 AM
Proc Import generated data step code that you can see in the log that was actually used to read the dataset.
Copy that code to the editor, change the informat and informat for that variable (and possibly others) to $25. or what ever would work, save the code and run it.
You might at the same time review other variables' informats, formats, names and consider adding Label statements and even rudimentary data checking. If you will be reading multiple files of this format then your would modify the program to change the infile and data set name statements and be good to go.
Reliance on proc import can lead to similar files with incompatible results such as this numeric/character issue or inconsistent lengths of variables and even variable names.