Hi, I have a data set with a large amount of character variables; I want to convert them to numerical variables.
Can someone show me how to do this? I'm assuming you have to use arrays.
For ease of exposition suppose I have ten character variables.
Thanks!
This is clearly an example of bad data structure. Why do you have "a large amount of character variables" on which you need to do this processing? Why not have your data elements going down the page as observations? It makes almost all programming tasks so much easier:
data have;
x="1"; y="2"; z="3";
run;
proc transpose data=have out=t_have;
var x y z;
run;
/* First example: you need to know all the variables */
data want1 (drop=i);
set have;
array xyz{3} x y z;
array xyz_num{3} 8;
do i=1 to 3;
xyz_num{i}=input(xyz{i},best.);
end;
run;
/* Second example: you only need to know there are two columns (_NAME_ and COL1) */
data want2;
set t_have;
xyz_num=input(col1,best.);
run;
Remember, the data you work and program with does NOT need to have the same structure as the output, so make life simple for yourself.
If all character variables carry numeric values (or are empty), then you can use this as a blueprint:
data _null_;
set have;
array charvar {*} _character_;
call symput('numvars',strip(put(dim(charvar),best.)));
stop;
run;
data want;
set have;
array charvar {*} _character_;
array newvar {&numvars} newvar1-newvar&numvars;
do i = 1 to &numvars;
newvar{i} = input(charvar{i},best.);
end;
drop i;
run;
Hi thanks...I'm new to programming, so struggling a bit in understanding your code.
If you assume that the names of the character variables that I'm trying to convert are X, Y and Z, would you mind substituting those variable names in their respective places in the code you have posted? Just so that I can clearly understand what needs to be done. Thank you so much!
OK, let's dissect my code a little:
/* the first step gets the number of character variables present in the
dataset, so I can later define the array of new variables with the correct
number of members */
data _null_; * do not create an output dataset;
set have;
array charvar {*} _character_; * define an array that contains all character variables,
no need to know their names;;
call symput('numvars',strip(put(dim(charvar),best.))); * put the value into a macro variable;
stop; * end execution in the first iteration of the data step;
run;
data want;
set have;
array charvar {*} _character_; * see above;
array newvar {&numvars} newvar1-newvar&numvars;* here I have to define names for the new (numeric) variables,
for the size I use the macro variable created in the first step;
* also note that the default for a newly defined array is numeric;
do i = 1 to &numvars; * iterate through both arrays;
newvar{i} = input(charvar{i},best.); * convert;
end;
drop i; * get rid of the index variable;
run;
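Since the question names the variables X, Y and Z, here is a minimal sketch of the same idea with the names written out. It assumes the numeric versions should keep the original names, so the character originals are renamed as they are read in. The temporary names _x, _y, _z and the best32. informat are illustrative choices, not part of the code above.

```sas
/* Sketch only: assumes HAVE contains the character variables X, Y and Z */
data want;
  /* rename the character originals so that the names X, Y, Z
     are free for the numeric versions */
  set have (rename=(x=_x y=_y z=_z));
  array charvar {3} _x _y _z;   /* the renamed character variables */
  array numvar {3} x y z;       /* new variables, numeric by default */
  do i = 1 to 3;
    numvar{i} = input(charvar{i}, best32.);   /* convert */
  end;
  drop i _x _y _z;              /* drop the index and the character originals */
run;
```

The explicit list works for a handful of variables; for a large number of them, the _character_ array shown above avoids typing every name.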
Thanks a lot! Truly appreciate it.
@aaou wrote:
Hi thanks...I'm new to programming, so struggling a bit in understanding your code.
If you assume that the names of the character variables that I'm trying to convert are X, Y and Z, would you mind substituting those variable names in their respective places in the code you have posted? Just so that I can clearly understand what needs to be done. Thank you so much!
If you do not have a naming pattern for your variables (that enables simple iteration with an index), it makes things more complicated. To fully automate the task, you will need to use the dataset metadata to create a list of variables to be converted.
Assume you have stored the library and dataset name in macro variables (in uppercase, because that is how SASHELP.VCOLUMN stores LIBNAME and MEMNAME):
data _null_;
set sashelp.vcolumn (
where=(libname = "&mylibname" and memname = "&mydataset" and type = 'char')
) end=done;
/* get data from the view in SASHELP that describes columns;
take only those from your dataset with type character */
/* also define a variable that signals the end */
if _n_ = 1 then call execute("data &mylibname..&mydataset._num; set &mylibname..&mydataset;");
/* call execute pushes code into the execution queue to be executed
immediately after the current data step ends */
/* this one starts a data step */
call execute(trim(name)!!'_num = input('!!trim(name)!!',best.);');
/* do the conversions */
if done then call execute('run;');
/* finish the data step */
run;
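To see the whole thing run end to end, here is a hedged sketch with made-up test data. WORK.HAVE, its three variables and the final PROC CONTENTS check are assumptions for illustration; the middle step is the code above, with UPCASE added so that lowercase macro values would also match.

```sas
/* Sketch only: WORK.HAVE and its contents are invented for this example */
data work.have;
  x = "1"; y = "2"; z = "3";
run;

%let mylibname = WORK;
%let mydataset = HAVE;

/* generate and run a data step with one conversion per character variable */
data _null_;
  set sashelp.vcolumn (
    where=(libname = upcase("&mylibname") and memname = upcase("&mydataset") and type = 'char')
  ) end=done;
  if _n_ = 1 then call execute("data &mylibname..&mydataset._num; set &mylibname..&mydataset;");
  call execute(trim(name)!!'_num = input('!!trim(name)!!',best.);');
  if done then call execute('run;');
run;

/* list the variables of the generated dataset in creation order */
proc contents data=&mylibname..&mydataset._num varnum;
run;
```

With this test data the generated dataset WORK.HAVE_NUM should keep the character variables X, Y, Z and add the numeric variables x_num, y_num, z_num.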
Assuming you have 10 character variables (char1-char10), you could try the untested code below, which creates 10 numeric variables (num1-num10):
Data want;
set have;
array ch(10) char1-char10;
array nu(10) num1-num10;
do i = 1 to 10;
nu(i)=input(ch(i),best.);
end;
run;
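If some values might not be numbers, INPUT writes an invalid-data note to the log and sets _ERROR_ for each of them. A hedged variant, assuming you simply want those values to end up missing, uses the ?? modifier of INPUT to suppress the messages. The test data, the best32. informat and the DROP statement are my additions for illustration.

```sas
/* Sketch only: test data with one non-numeric value in char10 */
data have;
  array ch(10) $8 char1-char10;
  do i = 1 to 10;
    ch(i) = cats(i);        /* "1", "2", ..., "10" */
  end;
  char10 = 'n/a';           /* deliberately not a number */
  drop i;
run;

data want;
  set have;
  array ch(10) char1-char10;
  array nu(10) num1-num10;
  do i = 1 to 10;
    /* ?? suppresses the invalid-data note and the _ERROR_ flag;
       values that cannot be read become missing */
    nu(i) = input(ch(i), ?? best32.);
  end;
  drop i char1-char10;      /* keep only the numeric versions */
run;
```

Here num1-num9 should hold 1 to 9 and num10 should be missing, with no invalid-data notes in the log.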
Converting character data to numeric
HAVE
Variables in Creation Order
# Variable Type Len
1 X Char 1
2 Y Char 1
3 Z Char 1
WANT
Variables in Creation Order
# Variable Type Len
1 X Num 8
2 Y Num 8
3 Z Num 8
* create some data;
data have;
x="1"; y="2"; z="3";
run;
* create the select clause to convert char to num;
proc sql;
select
catx(' ','input(',name,',best.) as',name) into :namlst separated by ','
from
sashelp.vcolumn
where
libname='WORK'
and memname='HAVE'
and upcase(type) eqt 'C'
;quit;
%put &=namlst;
/*
input( X ,best.) as X
,input( Y ,best.) as Y
,input( Z ,best.) as Z
*/
* do the conversion;
proc sql;
create
table want as
select
&namlst
from
have;
;quit;
/*
X Y Z
----------------------------
1 2 3
*/
Thank you. I followed your advice and changed things at the start itself.