
How to remove multiple '_' and number in the string?

Super Contributor
Posts: 319

Hello:

 

I have the following data. I would like to remove all of the '_' characters and move the number in each string to the end. Please help. Thanks.

 

data have;
  input Names :$100.;
  cards;
lab_den_pcr_perf_1
mhh_cyto_tests_1__cfdna
nad_cert_30
nad_imag_find2___abn_cort_gyr
;
run;

 

The result I am looking for is:

labdenpcrperf1
mhhcytotestscfdna1
nadcert30
nadimagfindabncortgyr2




All Replies
PROC Star
Posts: 757

Re: How to remove multiple '_' and number in the string?

You can use the COMPRESS function to remove certain characters from a string. I'm not sure what you mean by "assign what the number in the string at the end". Do you want the number of occurrences of "_" at the end of each string?

Respected Advisor
Posts: 4,173

Re: How to remove multiple '_' and number in the string?

@ybz12003

Providing a "desired result" would help us to better understand what you're after.

The following code populates a numeric variable when the last underscore-separated element of your source string consists only of digits; otherwise the variable is left missing.

data have;
  input Names :$100.;
  cards;
lab_den_pcr_perf_1
mhh_cyto_tests_1__cfdna
nad_cert_30
nad_imag_find2___abn_cort_gyr
;
run;

data want;
  set have;
  no_at_end_of_str=input(scan(names,-1,'_'),?? best32.);
run;
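
To make the behaviour of the step above easier to see, here is a minimal sketch on two of the sample values. The names last_elem and n are made up for this illustration. The ?? modifier on INPUT suppresses the invalid-data messages, so a non-numeric last element simply yields a missing value.

```sas
/* Sketch only: last_elem and n are illustrative names, not from the thread */
data _null_;
  length names $100;
  do names = 'nad_cert_30', 'mhh_cyto_tests_1__cfdna';
    /* -1 asks SCAN for the last element; '_' is the only delimiter */
    last_elem = scan(names, -1, '_');
    /* ?? keeps the log quiet when last_elem is not a number */
    n = input(last_elem, ?? best32.);
    put names= last_elem= n=;
  end;
run;
```

For the first value the last element is 30 and n is numeric; for the second the last element is cfdna, so n stays missing.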
Super User
Posts: 7,970

Re: How to remove multiple '_' and number in the string?

Something like:

data want;
  set have;
  names=cats(compress(names,"_","ka"),compress(names,"_","kd"));
run;
Solution
‎07-01-2017 01:22 PM
Contributor
Posts: 22

Re: How to remove multiple '_' and number in the string?

@RW9

Specifying the k modifier in the COMPRESS function keeps the "_" instead of removing it.

I also think he wants the numbers at the end of the string instead of the beginning.

So, modifying your code, I think this will work:


data want2;
  set have;
  names=cats(compress(names,"_","d"), compress(names,"_","a"));
run;
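
As a rough illustration of the modifiers discussed in this reply, the sketch below applies three COMPRESS calls to one of the sample values. The variable names (s, keep_us, letters, digits) are invented for this example.

```sas
/* Sketch only: s, keep_us, letters and digits are illustrative names */
data _null_;
  length s $30;
  s = 'mhh_cyto_tests_1__cfdna';
  /* k = keep: underscores and letters survive, everything else is dropped */
  keep_us = compress(s, '_', 'ka');
  /* no k: underscores and digits are removed, leaving the letters */
  letters = compress(s, '_', 'd');
  /* no k: underscores and letters are removed, leaving the digits */
  digits  = compress(s, '_', 'a');
  put keep_us= / letters= / digits=;
run;
```

Concatenating letters and digits with CATS is what moves the number to the end of the string.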
Super User
Posts: 7,970

Re: How to remove multiple '_' and number in the string?

Posted in reply to jdwaterman91

Actually my code is fine; just remove the _, which I added in haste:

data want;
  set have;
  names=cats(compress(names,"","ka"),compress(names,"","kd"));
run;
Super User
Posts: 5,513

Re: How to remove multiple '_' and number in the string?

Here's a way:

data want;
  set have;
  prefix = compress(names, '_0123456789');
  suffix = compress(names, , 'kd');
  names = strip(prefix) || suffix;
  drop prefix suffix;
run;

One caution: if there are multiple sets of digits, all of them get combined and put at the end. For example:

abc1_xyz2_def becomes abcxyzdef12
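
The caution can be reproduced with a short sketch, which also shows one possible way to flag affected rows. The names combined and digit_runs are hypothetical, and the use of COUNTW with the 'kd' modifiers to count groups of digits is my assumption about a convenient check, not something proposed in the thread.

```sas
/* Sketch only: combined and digit_runs are illustrative names */
data _null_;
  length names $100;
  names = 'abc1_xyz2_def';
  /* Same logic as the step above: letters first, then every digit found */
  combined = strip(compress(names, '_0123456789')) || compress(names, , 'kd');
  /* With 'kd', every non-digit acts as a delimiter, so this counts digit groups */
  digit_runs = countw(names, , 'kd');
  put combined= digit_runs=;
  if digit_runs > 1 then put 'Check: more than one group of digits in ' names;
run;
```

Rows flagged this way could then be reviewed by hand before the digits are merged.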

Contributor
Posts: 62

Re: How to remove multiple '_' and number in the string?

Posted in reply to Astounding

Like this?

data want;
  set have;
  n_position = anydigit(names, 1);
  number = compress(substr(names, n_position), , "kd");
  names_want = cats(compress(names, '_', "d"), number);
  keep names_want;
run;

PROC Star
Posts: 326

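Re: How to remove multiple '_' and number in the string?

Something like this with a regex:

data want;
  set have;
  /* Strip the underscores first, then move the single run of digits to the end. */
  /* \s*$ absorbs the trailing blanks that pad the $100 variable.                */
  name = prxchange('s/^(\D*?)(\d+)(\D*?)\s*$/$1$3$2/', 1, compress(names, '_'));
run;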
Super Contributor
Posts: 319

Re: How to remove multiple '_' and number in the string?

I'm starting to understand what PRXCHANGE does.

Respected Advisor
Posts: 4,173

Re: How to remove multiple '_' and number in the string?

@ybz12003

The following code returns the desired output as you've posted.

Based on your narrative I'm not sure if this really is what you're after. If not then please post some additional sample data where it's not working for you.

data have;
  input Names :$100.;
  cards;
lab_den_pcr_perf_1
mhh_cyto_tests_1__cfdna
nad_cert_30
nad_imag_find2___abn_cort_gyr
;
run;

data want;
  set have;
  length names_want $100;
  names_want=cats(compress(Names,'_','d'),compress(Names,,'kd'));
run;
☑ This topic is solved.
