SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Recoding a matrix of character variables into numeric

Reply
Regular Learner
Posts: 1

Recoding a matrix of character variables into numeric

Hello,

 

I have a dataset (dat1) that has an identifier (numeric) as the first variable and n character variables. I also have another dataset (dat2)  that maps each possible entry in all the character variables to a numeric value.

 

 

data dat1;
input id cv1 $ cv2 $ cv3 $ cv4 $;
datalines;
111 aaa bbb ccc ccc
112 bbb bbb aaa ddd
113 aaa ccc ccc aaa
114 ddd bbb eee bbb
;

data dat2;
input cv $ nv;
datalines;
aaa 1
bbb 2
ccc 3
ddd 4
eee 5
;

I'd like to turn all the character variables in dat1 to numeric using the mapping in dat2. Therefore, the resulting dataset in my example should be like:

 

 

 

data dat3;
input id nv1 nv2 nv3 nv4;
datalines;
111 1 2 3 3 
112 2 2 1 4 
113 1 3 3 1
114 4 2 5 2 
;


 

Obviously, my true dat1 dataset is huge.

Thanks.

 

Super User
Posts: 10,500

Re: Recoding a matrix of character variables into numeric

Not really possible to change data type from character to numeric directly. One way is to create a format and reread the old values into new variables using the values in dat2 to create an informat.

data dat2;
input cv $ nv;
datalines;
aaa 1
bbb 2
ccc 3
ddd 4
eee 5
;
run;

data cvcntlin;
   set dat2;
   FMTName= 'Cv2Num';
   start=cv;
   label=nv;
   type='I';
run;

proc format
library=work cntlin=cvcntlin;
run; 

data dat1;
input id cv1 $ cv2 $ cv3 $ cv4 $;
datalines;
111 aaa bbb ccc ccc
112 bbb bbb aaa ddd
113 aaa ccc ccc aaa
114 ddd bbb eee bbb
;
run;

data want;
   set dat1 (rename=(cv1=_cv1 cv2=_cv2 cv3=_cv3 cv4=_cv4 ));
   array old _cv: ;
   array new cv1 - cv4;
   do i=1 to dim(old);
      new(i) = input(old[i],CV2num.);
   end;
   drop _cv i;
run;
   
Respected Advisor
Posts: 4,646

Re: Recoding a matrix of character variables into numeric

One efficient way to do this kind of recoding is with an informat:

 


data myInFormat;
set dat2;
fmtname = "myFmt";
type = "I";
start = cv;
label = cats(nv);
run;

proc format cntlin=myInFormat; run;

data dat3;
set dat1;
array cv{4};
array nv{4};
do i = 1 to dim(cv);
    nv{i} = input(cv{i}, myFmt.);
    end;
drop cv:;
run;
PG
Super User
Posts: 5,082

Re: Recoding a matrix of character variables into numeric

Would somebody please post a hashing solution ... I can figure it out if I have to but it should be simple for someone who is used to doing this.

Ask a Question
Discussion stats
  • 3 replies
  • 277 views
  • 1 like
  • 4 in conversation