Hi all,
My column names of my data looks like following:
TR_ABC_S1_ABC_MNO_S3 | TR_ABC_S2_ABC_MNO_S3 | TR_ABC_S10_ABC_S2 | TR_ABC_S2_ABC_S10 | TR_ABC_S2_ABC_PQR_S3 | TR_XYZ_PQR_S1_XYZ_MNO_S3 |
1 | 1 | 0 | 2 | 3 | 1 |
What I intend to do is split the column name in two, splitting it after the first number. I also intend to remove the 'TR_' that it starts with. The output I'm looking for is:
TR_ABC_S1_ABC_MNO_S3 | TR_ABC_S2_ABC_MNO_S3 | TR_ABC_S10_ABC_S2 | TR_ABC_S2_ABC_S10 | TR_ABC_S2_ABC_PQR_S3 | TR_XYZ_PQR_S1_XYZ_MNO_S3 |
ABC_S1 | ABC_S2 | ABC_S10 | ABC_S2 | ABC_S2 | XYZ_PQR_S1 |
MNO_S3 | MNO_S3 | ABC_S2 | ABC_S10 | ABC_PQR_S3 | XYZ_MNO_S3 |
I tried doing it using substr and scan for the first number in the string within it, but can't get it right. Could you please help? Thanks!
data have;
input TR_ABC_S1_ABC_MNO_S3 TR_ABC_S2_ABC_MNO_S3 TR_ABC_S10_ABC_S2 TR_ABC_S2_ABC_S10 TR_ABC_S2_ABC_PQR_S3 TR_XYZ_PQR_S1_XYZ_MNO_S3 ;
cards;
1 1 0 2 3 1
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
data want;
set temp;
_name_=prxchange('s/^tr_//i',1,_name_);
pid=prxparse('/[_a-z]+\d+/i');
s=1;e=length(_name_);
call prxnext(pid,s,e,_name_,p,l);
do while(p>0);
temp=prxchange('s/^_//',1,substr(_name_,p,l));
output;
call prxnext(pid,s,e,_name_,p,l);
end;
drop s e p l pid;
run;
Something like:
data want (orig res j); set have; length orig res $200; array tr{6) tr_:; do i=1 to 6; orig=vname(tr{i}); do j=1 to countw(orig,"_"); res=scan(orig,i,"_"); output; end; end; run;
This will create a nice long list of original variable names, and then each part of the name split by _. You can then cat() the ones you want back together, i.e. 2 and 3 for first part, then 5 and 6 for second.
Note, if you post test data in the form of a datastep then working code can be provided.
Second note, transposed is always harder to program with than normalised (going down). Also don't put data in variable/table names, it causes issues like this. Use a parameter/result approach in normalised structure - will make your life much easier.
Thanks @RW9.
I have managed to get it to columns, so I'm looking to get Name1 and Name2 from Name like the following:
Name | Name1 | Name2 |
TR_ABC_S1_ABC_MNO_S3 | ABC_S1 | MNO_S3 |
TR_ABC_S2_ABC_MNO_S3 | ABC_S2 | MNO_S3 |
TR_ABC_S10_ABC_S2 | ABC_S10 | ABC_S2 |
TR_ABC_S2_ABC_S10 | ABC_S2 | ABC_S10 |
TR_ABC_S2_ABC_PQR_S3 | ABC_S2 | ABC_PQR_S3 |
TR_XYZ_PQR_S1_XYZ_MNO_S3 | XYZ_PQR_S1 | XYZ_MNO_S3 |
Then:
data want; set have; length name1 name2 $200; name1=catx("_",scan(name,2,"_"),scan(name,3,"_")); name2=catx("_",scan(name,5,"_"),scan(name,6,"_")); run;
I don't think you can put fixed positions like '2', '3' etc since they are shaped differently.
Then loop over them:
data want; set have;
array name{2} $200;
in=1;
do i=1 to countw(name,"_");
if lengthn(compress(scan(name,i,"-")," ","d")) > 0 then do;
name{in}=catx('_',scan(name,i-1,"_"),scan(name,i,"_"));
in=in+1;
end;
end; run;
Note assumes maximum of 2 items with a number in, I.e. this checks each part, if it contains a digit then cat previous and current.
data have;
input TR_ABC_S1_ABC_MNO_S3 TR_ABC_S2_ABC_MNO_S3 TR_ABC_S10_ABC_S2 TR_ABC_S2_ABC_S10 TR_ABC_S2_ABC_PQR_S3 TR_XYZ_PQR_S1_XYZ_MNO_S3 ;
cards;
1 1 0 2 3 1
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
data want;
set temp;
_name_=prxchange('s/^tr_//i',1,_name_);
pid=prxparse('/[_a-z]+\d+/i');
s=1;e=length(_name_);
call prxnext(pid,s,e,_name_,p,l);
do while(p>0);
temp=prxchange('s/^_//',1,substr(_name_,p,l));
output;
call prxnext(pid,s,e,_name_,p,l);
end;
drop s e p l pid;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.