Help using Base SAS procedures

Convert columns defined as character to numeric for more than 1600 columns

Reply
Contributor
Posts: 63

Convert columns defined as character to numeric for more than 1600 columns

I am using a CSV file that has more than 1600 columns. Although the columns have numeric fields (numbers), while importing the columns get defined as character in SAS.

I want to convert them to numeric. I tried defining these as numeric in csv but can't do that as any changes to CSV file are not saved .

Is there a way to force all columns to be numeric in one go in SAS ? I can't manipulate 1600 columns individually.

Than

Frequent Contributor
Posts: 130

Re: Convert columns defined as character to numeric for more than 1600 columns

Posted in reply to buckeyefisher

I can think of a few different options to choose from to converted them.  The first would be to use a data step import for your csv file, unfortunately you would need to list all the column names and assign them the numeric format you want.  The second option would be to declare an array for each variable; however, I'm unsure of the maximum number of fields you can declare in an array and 1600 may be over that max.  You may want to look into declaring arrays and determine what that maximum number of elements to declare is.

Otherwise, the last would be to run a contents procedure and output the names to a data set.  Then create a macro and loop through each individual column name to change it into a numeric format.  This would run through 1600 iterations, but may save you programming time.  If you give some example fields/data of what you're working with, I could help provide some code that would do this.

Derek

Contributor
Posts: 63

Re: Convert columns defined as character to numeric for more than 1600 columns

Thanks, I am trying this, but getting tedious.

I hope theres a simpler way to convert all variables in one go like Stata that has *,  function to catch all. 

Super User
Posts: 19,770

Re: Convert columns defined as character to numeric for more than 1600 columns

Posted in reply to buckeyefisher

Modify your data input statement and how you import the CSV. If all fields are numeric read them all in using var1-var1600 and define as numeric and then create a big rename statement if required.

The following is untested but I think something like this to read in the file would work:

data want;

infile 'my_csv.csv' truncover dlm=',' dsd firstobs=2;

informat var1-var1600 best32.;

format var1-var1600 best12.;

input var1-var1600;

run;

Contributor
Posts: 63

Re: Convert columns defined as character to numeric for more than 1600 columns

Thanks, I am trying this, but getting tedious.

I hope theres a simpler way to convert all variables in one go like Stata that has *,  function to catch all. 

Frequent Contributor
Posts: 130

Re: Convert columns defined as character to numeric for more than 1600 columns

Posted in reply to buckeyefisher

code.jpg

Here's a less tedious way so you don't have to take up so much programming time.  Let them import as character, and then use this convert macro that will convert each field, one at a time, from character to numeric and loop through.  The example provided only has 5 observations and 5 variables, but you get the idea.  The final data set created called "have", will give you all the fields converted into numeric format.

Hope this helps!

Contributor
Posts: 63

Re: Convert columns defined as character to numeric for more than 1600 columns

dcruik.

Thanks for this solution.

How do I handle a file that looks like this with your solution

Type               var1     var2     Var3 ....var 1600

cancer               0          1          0     ...    

Cardiac               0          1          1     ..

Super User
Posts: 11,343

Re: Convert columns defined as character to numeric for more than 1600 columns

Posted in reply to buckeyefisher

HOW did you create the SAS data set to begin with? Go back to that step. If you used proc import it generates datastep code that you can modify. Basically set all of the informats generated to Best12 or similar instead of the $ values.

And if they were character you have something buried in your data that made SAS think the values could be character.

If you are ever likely to process data from this source again starting at the beginning is better than a mid-stream fix.

Contributor
Posts: 63

Re: Convert columns defined as character to numeric for more than 1600 columns

ballardw

Even if I try to make changes at the inception, either because of the csv format or the length of the file, the changes are not saved.

Then when I import CSV file using proc import I don't have any control on the field definitions.

Regarding your solution - I will need to use actual field names instead of var1-1600 ?

Super User
Posts: 19,770

Re: Convert columns defined as character to numeric for more than 1600 columns

Posted in reply to buckeyefisher

Don't change your CSV. Don't use proc import.

Use the data step method I suggested and add the variable Type, it should work.

Super User
Posts: 11,343

Re: Convert columns defined as character to numeric for more than 1600 columns

Posted in reply to buckeyefisher

If you run Proc Import and check the log IMMEDIATELY do you see something like this:

<from one of my jobs the line numbers come from my log setting and you may not have that option turned on.>


1     /**********************************************************************
2     *   PRODUCT:   SAS
3     *   VERSION:   9.2
4     *   CREATOR:   External File Interface
5     *   DATE:      15JUN15
6     *   DESC:      Generated SAS Datastep Code
7     *   TEMPLATE SOURCE:  (None Specified.)
8     ***********************************************************************/
9        data WORK.junk                                    ;
10       %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
11       infile 'D:\Data\IPP\Data\Text data\2014\D1_DEC_2014.csv' delimiter = ',' MISSOVER DSD
11 ! lrecl=32767 firstobs=2 ;
12          informat Client_ID best32. ;
13          informat Clinic_Site best32. ;
14          informat Clinician $22. ;
15          informat Sex best32. ;
16          informat Birth_Dt best32. ;
17          informat Zip best32. ;
18          informat Ethnicity best32. ;
19          informat Race best32. ;
20          informat Collection__Date best32. ;
21          informat Program_Area best32. ;
22          informat Insurance best32. ;
23          informat Test_Req_d_Site best32. ;
24          informat Test_Code best32. ;
25          informat Results_CT $1. ;
26          informat Results_GC best32. ;
27          format Client_ID best12. ;
28          format Clinic_Site best12. ;
29          format Clinician $22. ;
30          format Sex best12. ;
31          format Birth_Dt best12. ;
32          format Zip best12. ;
33          format Ethnicity best12. ;
34          format Race best12. ;
35          format Collection__Date best12. ;
36          format Program_Area best12. ;
37          format Insurance best12. ;
38          format Test_Req_d_Site best12. ;
39          format Test_Code best12. ;
40          format Results_CT $1. ;
41          format Results_GC best12. ;
42       input
43                   Client_ID
44                   Clinic_Site
45                   Clinician $
46                   Sex
47                   Birth_Dt
48                   Zip
49                   Ethnicity
50                   Race
51                   Collection__Date
52                   Program_Area
53                   Insurance
54                   Test_Req_d_Site
55                   Test_Code
56                   Results_CT $
57                   Results_GC
58       ;
59       if _ERROR_ then call symputx('_EFIERR_',1);  /* set ERROR detection macro variable */
60       run;

If so you have all of the variables and the base code. The code generated from your proc import can be copied from the log and pasted into the program editor. Since you say you have Character that should be numeric then replace the $xx values with best16.

You can remove the format statements or make them match.

In my case I actually change the code so some of the numeric are character because they are not actually numbers. ID and ZIP codes for example are not going to be used for computations.

Regular Contributor
Posts: 227

Re: Convert columns defined as character to numeric for more than 1600 columns

Posted in reply to buckeyefisher

check the documentation of the import procedure

and the option

GUESSINGROWS=n;

the default is a trivial 20

Default:20

Range:1 to 2147483647

these kinds of headaches are usually a result of the *.csv containing missing values:: blanks

or scientific notation

in previously-known-as numeric fields.

the way I would do this is with sql

write the list of names into a macro variable

PROC sql; create table list_names as

select name from dictionary.columns where libname eq 'WORK' and memname eq 'MYDATA'

where name like '';

select name into :list_names_n from list_names;

select catt('_',name) into :list_names_c from list_names;

select catt(name,'=_',name) into :list_renames from list_names;

quit;

DATA Now_numeric;

array n_(*) &list_names_n;

array c_(*) $ &list_names_c;

drop _:;*temp vars: the

do until(endofile);

   set mydata (&list_renames) end = endofile;

   do _i = 1 to dim(n_);

      n_(_i) = input(c_(_i),best.);

      end;

   output;

   end;

stop;

run;

hth

Regular Contributor
Posts: 227

Re: Convert columns defined as character to numeric for more than 1600 columns

Posted in reply to Ron_Fehd_macro_maven

Here is code for a fix of the imported data set.

%macro loop(dim=3

           ,text=%nrstr(a&i)

           ,testing = 0)

/des= 'macro loop: resolve text within loop';

%do i = 1 %to &dim;

%unquote(&text)

%end;

%mend;

options mprint;

%put %loop();

%put %loop(text=%nrstr(_b&i));

%put %loop(text=%nrstr(c&i = _c&i));

DATA mydata;

b1 = '1';

b2 = '2.2';

b3 = '0.03';

output;

stop;

run;

DATA now_numeric;

     array c_(*) $32 %loop(text=%nrstr(_b&i));

     array n_(*)     %loop(text=%nrstr(b&i));

     drop _:;*temp variables with prefix of underline;

set mydata(rename = (%loop(text=%nrstr(b&i = _b&i))));

do _i = 1 to 3;

   n_(_i) = input(c_(_i),best.);

   end;

PROC sql; describe table &syslast;

          quit;

Super User
Super User
Posts: 7,039

Re: Convert columns defined as character to numeric for more than 1600 columns

Posted in reply to buckeyefisher

If you know the variables are numeric, why are you using PROC IMPORT instead of just reading the data?

data want;

  infile 'myfile.csv' dsd firstobs=2 truncover lrecl=300000 ;

  input var1-var1600;

run;

If you want your program to use the actual names in the first line of the CSV file do something like this.

filename code temp;

data _null_;

  file code ;

  if _n_=1 then put 'input ' @;'

  if eof then put ';' ;

  infile 'myfile.csv' obs=1 dsd truncover lrecl= 300000  end=eof;

  length name $32 ;

input name @@ ;

put name @ ;

run;

data want;

  infile 'myfile.csv' dsd firstobs=2 truncover lrecl=300000 ;

%inc code / source2 ;

run;

Ask a Question
Discussion stats
  • 13 replies
  • 1127 views
  • 0 likes
  • 6 in conversation