Hi all,
I'm working with some tricky raw data and need to convert all numeric columns in an imported .csv file to character type.
I've followed this tutorial and got close to where I need to be, but I think something in the final data step is reverting the type of the columns back to numeric and I get the below note in the log:
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
35:3
Can anyone see anything in my code that might explain this or is able to explain the log note's meaning a bit further for me so I can find the issue myself.
My code looks like this:
/* Import April 2019 form */
PROC IMPORT OUT= raw_new_neo_april19
DATAFILE= ***confidental***
DBMS=CSV REPLACE;
GETNAMES=YES;
DATAROW=2;
GUESSINGROWS=2000;
RUN;
/* Create dataset with all column names and type */
proc contents
data = raw_new_neo_april19
noprint
out = vars1 (keep = name type);
run;
/* Use SQL to create macros with column names */
proc sql;
* create macro list of all numeric variables names;
select name
into :numerics
separated by ' '
from vars1
where type = 1;
* create macro list of the numeric variables' names with the suffix "C";
select trim(name) || 'C'
into :characters
separated by ' '
from vars1
where type = 1;
* create macro list of the conversion from the original name to the new name with the suffix "C";
select cats(name, ' = ' , name, 'C')
into :conversions
separated by ' '
from vars1
where type = 1;
quit;
/* Check column names all look okay */
%PUT &numerics;
%PUT &characters;
%PUT &conversions;
/* use the macro list "&conversions" to add the suffix "C" to the numeric variables' names */
proc datasets
library = work
nolist;
modify raw_new_neo_april19;
rename &conversions;
quit;
/* Use arrays and put to convert all numeric columns to character columns */
data raw_new_neo_april19_2;
set raw_new_neo_april19;
array nums[*] &numerics;
array chars[*] &characters;
do i = 1 to dim(nums);
nums[i] = put(chars[i],10.);
end;
drop i &characters;
run;
Here
684 array chars[*] &characters;
you did not specify that the elements of the array should be character.
Then don't use PROC IMPORT. Converting them back to character after reading them into numbers is likely to lose information.
Just write your own data step to read the text file directly.
If you really have no idea what is in the file look at the code that PROC IMPORT generated and use it as a model (although PROC IMPORT writes messy code).
If you just want to read everything as character it is really easy as all you need to do is fill out the LENGTH statement to define how long each variable needs to be.
data want;
infile "myfile" dsd firstobs=2 truncover ;
length first_var $10 next_var $20 .... last_var $30 ;
input first_var -- last_var;
run;
Do NOT use PROC IMPORT for CSV files. Write the data step that reads the file yourself, don't let the computer handle that for you (Maxim 31).
At least take the data step that IMPORT wrote for you from the log and adapt it to your needs.
Thanks for the speedy reply!
The csv has 195 columns, is there a neat and tidy way to import all columns as character variables in a data step without typing out all their names and taking several lines of code?
Do these columns all share the same length?
So you WILL have to do a lot of manual work anyway.
As I said, copy/paste the header line into the INPUT statement. Then, split it up (every variable on its own line), and add the informat to set the length:
input
var1 :$3.
var2 :$35.
and so on.
If the column headers are valid SAS variable names you can easily read the first line and store it into a macro variable to use to generate the step to read the data.
For example this code will read everything in as length 100.
filename csv "myfilename goes here";
data _null_;
infile csv obs=1;
input;
call symputx('varnames',translate(_infile_,' ',','));
run;
data want (compress=yes);
infile csv dsd firstobs=2 truncover ;
input (&varnames) (:$100.);
run;
@RoddyJ wrote:
Hi all,
I'm working with some tricky raw data and need to convert all numeric columns in an imported .csv file to character type.
I've followed this tutorial and got close to where I need to be, but I think something in the final data step is reverting the type of the columns back to numeric and I get the below note in the log:
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column). 35:3
Can anyone see anything in my code that might explain this or is able to explain the log note's meaning a bit further for me so I can find the issue myself.
If you show us the entire log for the data step or procedure that NOTE so we can see which is line 35 we might have a chance.
Caution is in order with creating variable names by code when adding characters to the name. If the variable name already is 32 characters your "suffix" will either get truncated, ignored or cause errors depending on where it is used.
Since you are only changing "some" variables there is also a chance you have a collision with another existing variable name. If you have a character variable named "AgeC" already in your data and a numeric variable "Age" then this approach has an opportunity to get confused or generate an error.
Here
684 array chars[*] &characters;
you did not specify that the elements of the array should be character.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.