DATA Step, Macro, Functions and more

Extract a portion of a string variable from column name

Accepted Solution Solved
Reply
Contributor
Posts: 42
Accepted Solution

Extract a portion of a string variable from column name

Hi all,

 

My column names of my data looks like following:

TR_ABC_S1_ABC_MNO_S3TR_ABC_S2_ABC_MNO_S3TR_ABC_S10_ABC_S2TR_ABC_S2_ABC_S10TR_ABC_S2_ABC_PQR_S3TR_XYZ_PQR_S1_XYZ_MNO_S3
110231


What I intend to do is split the column name in two, splitting it after the first number. I also intend to remove the 'TR_' that it starts with. The output I'm looking for is:

 

TR_ABC_S1_ABC_MNO_S3TR_ABC_S2_ABC_MNO_S3TR_ABC_S10_ABC_S2TR_ABC_S2_ABC_S10TR_ABC_S2_ABC_PQR_S3TR_XYZ_PQR_S1_XYZ_MNO_S3
ABC_S1ABC_S2ABC_S10ABC_S2ABC_S2XYZ_PQR_S1
MNO_S3MNO_S3ABC_S2ABC_S10ABC_PQR_S3XYZ_MNO_S3

 

I tried doing it using substr and scan for the first number in the string within it, but can't get it right. Could you please help? Thanks!


Accepted Solutions
Solution
‎12-19-2017 10:47 AM
Super User
Posts: 10,686

Re: Extract a portion of a string variable from column name

data have;
input TR_ABC_S1_ABC_MNO_S3	TR_ABC_S2_ABC_MNO_S3	TR_ABC_S10_ABC_S2	TR_ABC_S2_ABC_S10	TR_ABC_S2_ABC_PQR_S3	TR_XYZ_PQR_S1_XYZ_MNO_S3 ;
cards;
1	1	0	2	3	1
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
data want;
 set temp;
 _name_=prxchange('s/^tr_//i',1,_name_);
pid=prxparse('/[_a-z]+\d+/i');
s=1;e=length(_name_);
call prxnext(pid,s,e,_name_,p,l);
do while(p>0);
  temp=prxchange('s/^_//',1,substr(_name_,p,l));
  output;
  call prxnext(pid,s,e,_name_,p,l);
end;
drop s e p l pid;
run;

View solution in original post


All Replies
Super User
Super User
Posts: 9,402

Re: Extract a portion of a string variable from column name

Something like:

data want (orig res j);
  set have;
  length orig res $200;
  array tr{6) tr_:;
  do i=1 to 6;
    orig=vname(tr{i});
    do j=1 to countw(orig,"_");
      res=scan(orig,i,"_");
      output;
    end;
  end;
run;

This will create a nice long list of original variable names, and then each part of the name split by _.  You can then cat() the ones you want back together, i.e. 2 and 3 for first part, then 5 and 6 for second.

Note, if you post test data in the form of a datastep then working code can be provided.

Second note, transposed is always harder to program with than normalised (going down).  Also don't put data in variable/table names, it causes issues like this.  Use a parameter/result approach in normalised structure - will make your life much easier.

Contributor
Posts: 42

Re: Extract a portion of a string variable from column name

Thanks @RW9.

 

I have managed to get it to columns, so I'm looking to get Name1 and Name2 from Name like the following:

 

NameName1Name2
TR_ABC_S1_ABC_MNO_S3ABC_S1MNO_S3
TR_ABC_S2_ABC_MNO_S3ABC_S2MNO_S3
TR_ABC_S10_ABC_S2ABC_S10ABC_S2
TR_ABC_S2_ABC_S10ABC_S2ABC_S10
TR_ABC_S2_ABC_PQR_S3ABC_S2ABC_PQR_S3
TR_XYZ_PQR_S1_XYZ_MNO_S3XYZ_PQR_S1XYZ_MNO_S3
Super User
Super User
Posts: 9,402

Re: Extract a portion of a string variable from column name

Then:

data want;
  set have;
  length name1 name2 $200;
  name1=catx("_",scan(name,2,"_"),scan(name,3,"_"));
  name2=catx("_",scan(name,5,"_"),scan(name,6,"_"));
run;
Contributor
Posts: 42

Re: Extract a portion of a string variable from column name

I don't think you can put fixed positions like '2', '3' etc since they are shaped differently. 

Super User
Super User
Posts: 9,402

Re: Extract a portion of a string variable from column name

Then loop over them:

data want;
  set have;
array name{2} $200;
in=1;
do i=1 to countw(name,"_");
if lengthn(compress(scan(name,i,"-")," ","d")) > 0 then do;
name{in}=catx('_',scan(name,i-1,"_"),scan(name,i,"_"));
in=in+1;
end;
end; run;  

Note assumes maximum of 2 items with a number in,  I.e. this checks each part, if it contains a digit then cat previous and current.   

Solution
‎12-19-2017 10:47 AM
Super User
Posts: 10,686

Re: Extract a portion of a string variable from column name

data have;
input TR_ABC_S1_ABC_MNO_S3	TR_ABC_S2_ABC_MNO_S3	TR_ABC_S10_ABC_S2	TR_ABC_S2_ABC_S10	TR_ABC_S2_ABC_PQR_S3	TR_XYZ_PQR_S1_XYZ_MNO_S3 ;
cards;
1	1	0	2	3	1
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
data want;
 set temp;
 _name_=prxchange('s/^tr_//i',1,_name_);
pid=prxparse('/[_a-z]+\d+/i');
s=1;e=length(_name_);
call prxnext(pid,s,e,_name_,p,l);
do while(p>0);
  temp=prxchange('s/^_//',1,substr(_name_,p,l));
  output;
  call prxnext(pid,s,e,_name_,p,l);
end;
drop s e p l pid;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 189 views
  • 4 likes
  • 3 in conversation