Solved
Contributor
Posts: 42

# Extract a portion of a string variable from column name

Hi all,

My column names of my data looks like following:

 TR_ABC_S1_ABC_MNO_S3 TR_ABC_S2_ABC_MNO_S3 TR_ABC_S10_ABC_S2 TR_ABC_S2_ABC_S10 TR_ABC_S2_ABC_PQR_S3 TR_XYZ_PQR_S1_XYZ_MNO_S3 1 1 0 2 3 1

What I intend to do is split the column name in two, splitting it after the first number. I also intend to remove the 'TR_' that it starts with. The output I'm looking for is:

 TR_ABC_S1_ABC_MNO_S3 TR_ABC_S2_ABC_MNO_S3 TR_ABC_S10_ABC_S2 TR_ABC_S2_ABC_S10 TR_ABC_S2_ABC_PQR_S3 TR_XYZ_PQR_S1_XYZ_MNO_S3 ABC_S1 ABC_S2 ABC_S10 ABC_S2 ABC_S2 XYZ_PQR_S1 MNO_S3 MNO_S3 ABC_S2 ABC_S10 ABC_PQR_S3 XYZ_MNO_S3

I tried doing it using substr and scan for the first number in the string within it, but can't get it right. Could you please help? Thanks!

Accepted Solutions
Solution
‎12-19-2017 10:47 AM
Super User
Posts: 10,686

## Re: Extract a portion of a string variable from column name

``````data have;
input TR_ABC_S1_ABC_MNO_S3	TR_ABC_S2_ABC_MNO_S3	TR_ABC_S10_ABC_S2	TR_ABC_S2_ABC_S10	TR_ABC_S2_ABC_PQR_S3	TR_XYZ_PQR_S1_XYZ_MNO_S3 ;
cards;
1	1	0	2	3	1
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
data want;
set temp;
_name_=prxchange('s/^tr_//i',1,_name_);
pid=prxparse('/[_a-z]+\d+/i');
s=1;e=length(_name_);
call prxnext(pid,s,e,_name_,p,l);
do while(p>0);
temp=prxchange('s/^_//',1,substr(_name_,p,l));
output;
call prxnext(pid,s,e,_name_,p,l);
end;
drop s e p l pid;
run;``````

All Replies
Super User
Posts: 9,402

## Re: Extract a portion of a string variable from column name

Something like:

```data want (orig res j);
set have;
length orig res \$200;
array tr{6) tr_:;
do i=1 to 6;
orig=vname(tr{i});
do j=1 to countw(orig,"_");
res=scan(orig,i,"_");
output;
end;
end;
run;
```

This will create a nice long list of original variable names, and then each part of the name split by _.  You can then cat() the ones you want back together, i.e. 2 and 3 for first part, then 5 and 6 for second.

Note, if you post test data in the form of a datastep then working code can be provided.

Second note, transposed is always harder to program with than normalised (going down).  Also don't put data in variable/table names, it causes issues like this.  Use a parameter/result approach in normalised structure - will make your life much easier.

Contributor
Posts: 42

## Re: Extract a portion of a string variable from column name

Thanks @RW9.

I have managed to get it to columns, so I'm looking to get Name1 and Name2 from Name like the following:

 Name Name1 Name2 TR_ABC_S1_ABC_MNO_S3 ABC_S1 MNO_S3 TR_ABC_S2_ABC_MNO_S3 ABC_S2 MNO_S3 TR_ABC_S10_ABC_S2 ABC_S10 ABC_S2 TR_ABC_S2_ABC_S10 ABC_S2 ABC_S10 TR_ABC_S2_ABC_PQR_S3 ABC_S2 ABC_PQR_S3 TR_XYZ_PQR_S1_XYZ_MNO_S3 XYZ_PQR_S1 XYZ_MNO_S3
Super User
Posts: 9,402

## Re: Extract a portion of a string variable from column name

Then:

```data want;
set have;
length name1 name2 \$200;
name1=catx("_",scan(name,2,"_"),scan(name,3,"_"));
name2=catx("_",scan(name,5,"_"),scan(name,6,"_"));
run;
```
Contributor
Posts: 42

## Re: Extract a portion of a string variable from column name

I don't think you can put fixed positions like '2', '3' etc since they are shaped differently.

Super User
Posts: 9,402

## Re: Extract a portion of a string variable from column name

Then loop over them:

```data want;
set have;  array name{2} \$200;  in=1;  do i=1 to countw(name,"_");    if lengthn(compress(scan(name,i,"-")," ","d")) > 0 then do;      name{in}=catx('_',scan(name,i-1,"_"),scan(name,i,"_"));      in=in+1;    end;  end;
run;  ```

Note assumes maximum of 2 items with a number in,  I.e. this checks each part, if it contains a digit then cat previous and current.

Solution
‎12-19-2017 10:47 AM
Super User
Posts: 10,686

## Re: Extract a portion of a string variable from column name

``````data have;
input TR_ABC_S1_ABC_MNO_S3	TR_ABC_S2_ABC_MNO_S3	TR_ABC_S10_ABC_S2	TR_ABC_S2_ABC_S10	TR_ABC_S2_ABC_PQR_S3	TR_XYZ_PQR_S1_XYZ_MNO_S3 ;
cards;
1	1	0	2	3	1
;
run;
proc transpose data=have(obs=0) out=temp;
var _all_;
run;
data want;
set temp;
_name_=prxchange('s/^tr_//i',1,_name_);
pid=prxparse('/[_a-z]+\d+/i');
s=1;e=length(_name_);
call prxnext(pid,s,e,_name_,p,l);
do while(p>0);
temp=prxchange('s/^_//',1,substr(_name_,p,l));
output;
call prxnext(pid,s,e,_name_,p,l);
end;
drop s e p l pid;
run;``````
☑ This topic is solved.