BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RoddyJ
Obsidian | Level 7

Hi all,

 

I'm working with some tricky raw data and need to convert all numeric columns in an imported .csv file to character type.

 

I've followed this tutorial and got close to where I need to be, but I think something in the final data step is reverting the type of the columns back to numeric and I get the below note in the log:

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      35:3   

Can anyone see anything in my code that might explain this or is able to explain the log note's meaning a bit further for me so I can find the issue myself.

 

My code looks like this:

/* Import April 2019 form */
PROC IMPORT OUT= raw_new_neo_april19
            DATAFILE= ***confidental*** 
            DBMS=CSV REPLACE;
     GETNAMES=YES;
     DATAROW=2;   
	 GUESSINGROWS=2000;
RUN;

/* Create dataset with all column names and type */ 
proc contents
     data = raw_new_neo_april19
          noprint
     out = vars1 (keep = name type);
run;

/* Use SQL to create macros with column names */
proc sql;
     * create macro list of all numeric variables names;
     select name
     into :numerics
          separated by ' '
     from vars1
     where type = 1;

     * create macro list of the numeric variables' names with the suffix "C";
     select trim(name) || 'C'
     into :characters
          separated by ' '
     from vars1
     where type = 1;

     * create macro list of the conversion from the original name to the new name with the suffix "C";
     select cats(name, ' = ' , name, 'C')
     into :conversions
          separated by ' '
     from vars1
     where type = 1;
quit;

/* Check column names all look okay */
%PUT &numerics;
%PUT &characters;
%PUT &conversions;

/* use the macro list "&conversions" to add the suffix "C" to the numeric variables' names */
proc datasets 
     library = work 
          nolist;
     modify raw_new_neo_april19;
     rename &conversions;
quit;

/* Use arrays and put to convert all numeric columns to character columns */ data raw_new_neo_april19_2; set raw_new_neo_april19; array nums[*] &numerics; array chars[*] &characters; do i = 1 to dim(nums); nums[i] = put(chars[i],10.); end; drop i &characters; run;
1 ACCEPTED SOLUTION
12 REPLIES 12
Tom
Super User Tom
Super User

Then don't use PROC IMPORT.  Converting them back to character after reading them into numbers is likely to lose information.

 

Just write your own data step to read the text file directly.

If you really have no idea what is in the file look at the code that PROC IMPORT generated and use it as a model (although PROC IMPORT writes messy code).

 

If you just want to read everything as character it is really easy as all you need to do is fill out the LENGTH statement to define how long each variable needs to be.

data want;
  infile "myfile" dsd firstobs=2 truncover ;
  length first_var $10 next_var $20 .... last_var $30 ;
  input first_var -- last_var;
run;

 

Kurt_Bremser
Super User

Do NOT use PROC IMPORT for CSV files. Write the data step that reads the file yourself, don't let the computer handle that for you (Maxim 31).

At least take the data step that IMPORT wrote for you from the log and adapt it to your needs.

RoddyJ
Obsidian | Level 7

Thanks for the speedy reply!

The csv has 195 columns, is there a neat and tidy way to import all columns as character variables in a data step without typing out all their names and taking several lines of code?

RoddyJ
Obsidian | Level 7
Unfortunately not, but willing to have leading blanks or suboptimal length allocation to sort later
Kurt_Bremser
Super User

So you WILL have to do a lot of manual work anyway.

As I said, copy/paste the header line into the INPUT statement. Then, split it up (every variable on its own line), and add the informat to set the length:

input
  var1 :$3.
  var2 :$35.

and so on.

Tom
Super User Tom
Super User

If the column headers are valid SAS variable names you can easily read the first line and store it into a macro variable to use to generate the step to read the data.

For example this code will read everything in as length 100.

filename csv "myfilename goes here";
data _null_;
  infile csv obs=1;
  input;
  call symputx('varnames',translate(_infile_,' ',','));
run;
data want (compress=yes);
  infile csv dsd firstobs=2 truncover ;
  input (&varnames) (:$100.);
run;
RoddyJ
Obsidian | Level 7
Thanks Tom, this is a very good solution!
ballardw
Super User

@RoddyJ wrote:

Hi all,

 

I'm working with some tricky raw data and need to convert all numeric columns in an imported .csv file to character type.

 

I've followed this tutorial and got close to where I need to be, but I think something in the final data step is reverting the type of the columns back to numeric and I get the below note in the log:

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
      35:3   

Can anyone see anything in my code that might explain this or is able to explain the log note's meaning a bit further for me so I can find the issue myself.

If you show us the entire log for the data step or procedure that NOTE so we can see which is line 35 we might have a chance.

 

Caution is in order with creating variable names by code when adding characters to the name. If the variable name already is 32 characters your "suffix" will either get truncated, ignored or cause errors depending on where it is used.

 

Since you are only changing "some" variables there is also a chance you have a collision with another existing variable name.  If you have a character variable named "AgeC" already in your data and a numeric variable "Age" then this approach has an opportunity to get confused or generate an error.

RoddyJ
Obsidian | Level 7
Thanks for the response, this is the full log

78
679 data raw_new_neo_april19_2;
680 set raw_new_neo_april19;
681
682
683 array nums[*] &numerics;
684 array chars[*] &characters;
685
686 do i = 1 to dim(nums);
687 nums[i] = put(chars[i],10.);
688 end;
689
690 drop i &characters;
691 run;

NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
687:3
NOTE: There were 35 observations read from the data set WORK.RAW_NEW_NEO_APRIL19.
NOTE: The data set WORK.RAW_NEW_NEO_APRIL19_2 has 35 observations and 195 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
RoddyJ
Obsidian | Level 7
That's done it, thank you so much!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2210 views
  • 2 likes
  • 4 in conversation