Hi all,
My column names of my data looks like following:
TR_ABC_S1_ABC_MNO_S3 | TR_ABC_S2_ABC_MNO_S3 | TR_ABC_S10_ABC_S2 | TR_ABC_S2_ABC_S10 | TR_ABC_S2_ABC_PQR_S3 | TR_XYZ_PQR_S1_XYZ_MNO_S3 |
1 | 1 | 0 | 2 | 3 | 1 |
What I intend to do is split the column name in two, splitting it after the first number. I also intend to remove the 'TR_' that it starts with. The output I'm looking for is:
TR_ABC_S1_ABC_MNO_S3 | TR_ABC_S2_ABC_MNO_S3 | TR_ABC_S10_ABC_S2 | TR_ABC_S2_ABC_S10 | TR_ABC_S2_ABC_PQR_S3 | TR_XYZ_PQR_S1_XYZ_MNO_S3 |
ABC_S1 | ABC_S2 | ABC_S10 | ABC_S2 | ABC_S2 | XYZ_PQR_S1 |
MNO_S3 | MNO_S3 | ABC_S2 | ABC_S10 | ABC_PQR_S3 | XYZ_MNO_S3 |
I tried doing it using substr and scan for the first number in the string within it, but can't get it right. Could you please help? Thanks!
data have;
input TR_ABC_S1_ABC_MNO_S3 TR_ABC_S2_ABC_MNO_S3 TR_ABC_S10_ABC_S2 TR_ABC_S2_ABC_S10 TR_ABC_S2_ABC_PQR_S3 TR_XYZ_PQR_S1_XYZ_MNO_S3 ;
cards;
1 1 0 2 3 1
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
data want;
set temp;
_name_=prxchange('s/^tr_//i',1,_name_);
pid=prxparse('/[_a-z]+\d+/i');
s=1;e=length(_name_);
call prxnext(pid,s,e,_name_,p,l);
do while(p>0);
temp=prxchange('s/^_//',1,substr(_name_,p,l));
output;
call prxnext(pid,s,e,_name_,p,l);
end;
drop s e p l pid;
run;
Something like:
data want (orig res j); set have; length orig res $200; array tr{6) tr_:; do i=1 to 6; orig=vname(tr{i}); do j=1 to countw(orig,"_"); res=scan(orig,i,"_"); output; end; end; run;
This will create a nice long list of original variable names, and then each part of the name split by _. You can then cat() the ones you want back together, i.e. 2 and 3 for first part, then 5 and 6 for second.
Note, if you post test data in the form of a datastep then working code can be provided.
Second note, transposed is always harder to program with than normalised (going down). Also don't put data in variable/table names, it causes issues like this. Use a parameter/result approach in normalised structure - will make your life much easier.
Thanks @RW9.
I have managed to get it to columns, so I'm looking to get Name1 and Name2 from Name like the following:
Name | Name1 | Name2 |
TR_ABC_S1_ABC_MNO_S3 | ABC_S1 | MNO_S3 |
TR_ABC_S2_ABC_MNO_S3 | ABC_S2 | MNO_S3 |
TR_ABC_S10_ABC_S2 | ABC_S10 | ABC_S2 |
TR_ABC_S2_ABC_S10 | ABC_S2 | ABC_S10 |
TR_ABC_S2_ABC_PQR_S3 | ABC_S2 | ABC_PQR_S3 |
TR_XYZ_PQR_S1_XYZ_MNO_S3 | XYZ_PQR_S1 | XYZ_MNO_S3 |
Then:
data want; set have; length name1 name2 $200; name1=catx("_",scan(name,2,"_"),scan(name,3,"_")); name2=catx("_",scan(name,5,"_"),scan(name,6,"_")); run;
I don't think you can put fixed positions like '2', '3' etc since they are shaped differently.
Then loop over them:
data want; set have;
array name{2} $200;
in=1;
do i=1 to countw(name,"_");
if lengthn(compress(scan(name,i,"-")," ","d")) > 0 then do;
name{in}=catx('_',scan(name,i-1,"_"),scan(name,i,"_"));
in=in+1;
end;
end; run;
Note assumes maximum of 2 items with a number in, I.e. this checks each part, if it contains a digit then cat previous and current.
data have;
input TR_ABC_S1_ABC_MNO_S3 TR_ABC_S2_ABC_MNO_S3 TR_ABC_S10_ABC_S2 TR_ABC_S2_ABC_S10 TR_ABC_S2_ABC_PQR_S3 TR_XYZ_PQR_S1_XYZ_MNO_S3 ;
cards;
1 1 0 2 3 1
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
data want;
set temp;
_name_=prxchange('s/^tr_//i',1,_name_);
pid=prxparse('/[_a-z]+\d+/i');
s=1;e=length(_name_);
call prxnext(pid,s,e,_name_,p,l);
do while(p>0);
temp=prxchange('s/^_//',1,substr(_name_,p,l));
output;
call prxnext(pid,s,e,_name_,p,l);
end;
drop s e p l pid;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.