I am trying to create 4 variables for each column in the dataset below. For example, a_1 should be used to create a_1_1, a_1_2, a_1_3 and a_1_4 with each variable containing the 4 individual digits in parent variable a_1 : a_1_1 = 1; a_1_2=0; a_1_3=0; a_1_4=0; So, a total of 16 new variables created.
Is there a way to accomplish this using arrays and loops in a data step, making it easier to generalize this task?
data check;
input a_1 $ a_2 $ a_3 $ a_4 $;
datalines;
1000 1111 1010 1110
0101 1100 1001 1111
1100 0100 0001 1001
;
run;
Seems pretty simple. Convert it to a TALL structure and then use PROC TRANPOSE.
You will need a variable to identify the original observations. If you don't have one then just add one when making the TALL dataset.
data have;
input a_1 $ a_2 $ a_3 $ a_4 $;
datalines;
1000 1111 1010 1110
0101 1100 1001 1111
1100 0100 0001 1001
;
data tall ;
row+1;
set have;
array x a_1 - a_4 ;
length _name_ $32 col 8 char $1 ;
do index=1 to dim(x);
_name_= vname(x[index]);
do col=1 to length(x[index]);
char=char(x[index],col);
output;
end;
end;
run;
proc transpose data=tall out=want(drop=_name_) delim=_;
by row ;
id _name_ col;
var char ;
run;
proc print;
run;
Below should work as long as your source strings don't contain more than 4 characters.
A more generic solution would require that you first analyse your source strings and then create as many target variables as you need for the longest source string.
data have;
input a_1 $ a_2 $ a_3 $ a_4 $;
datalines;
1000 1111 1010 1110
0101 1100 1001 1111
1100 0100 0001 1001
;
run;
data want;
set have;
array src_vars{4} a_1 a_2 a_3 a_4;
array trg_vars{4,4} $1 a_11-a_14 a_21-a_24 a_31-a_34 a_41-a_44;
do i=1 to dim(src_vars);
do k=1 to length(src_vars[i]);
trg_vars[i,k]=substr(src_vars[i],k,1);
end;
end;
run;
proc print data=want;
run;
Thank you.
Is there a way to automatically create variable names in the target array below instead of having to spell it out because, I need to generalize the variable creation as the dataset can have any number of source variables. This data step will be part of a macro.
array trg_vars{4,4} $1 a_11-a_14 a_21-a_24 a_31-a_34 a_41-a_44;
A more generalize way is using double PROC TRANSPOSE just like Tom's code.
data have;
input a_1 $ a_2 $ a_3 $ a_4 $;
datalines;
1000 1111 1010 1110
0101 1100 1001 1111
1100 0100 0001 1001
;
run;
data temp;
set have;
id+1;
run;
proc transpose data=temp out=temp2;
by id;
var a:;
run;
data temp3;
set temp2;
do i=1 to lengthn(col1);
v=char(col1,i);output;
end;
run;
proc transpose data=temp3 out=want(drop=_name_) delimiter=_;
by id ;
id _name_ i;
var v;
run;
Seems pretty simple. Convert it to a TALL structure and then use PROC TRANPOSE.
You will need a variable to identify the original observations. If you don't have one then just add one when making the TALL dataset.
data have;
input a_1 $ a_2 $ a_3 $ a_4 $;
datalines;
1000 1111 1010 1110
0101 1100 1001 1111
1100 0100 0001 1001
;
data tall ;
row+1;
set have;
array x a_1 - a_4 ;
length _name_ $32 col 8 char $1 ;
do index=1 to dim(x);
_name_= vname(x[index]);
do col=1 to length(x[index]);
char=char(x[index],col);
output;
end;
end;
run;
proc transpose data=tall out=want(drop=_name_) delim=_;
by row ;
id _name_ col;
var char ;
run;
proc print;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.