DATA Step, Macro, Functions and more

How to remove the last '_' and number?

Accepted Solution Solved
Reply
Super Contributor
Posts: 297
Accepted Solution

How to remove the last '_' and number?

Hello:

 

I would like to remove the last '_' and number from the strings below.  Please advice how.  Thanks.

 

data test1;

input Names:$100.;

datalines;

dx_1

dx_10

state_id_5

state_id_9

if_mg_hk_4

;

run;

 

I am looking for the results is like below.

list

dx

dx

state_id

state_id

if_mg_hk


Accepted Solutions
Solution
‎07-05-2017 07:52 AM
Trusted Advisor
Posts: 1,131

Re: How to remove the last '_' and number?

Alternatively with perl regular expression

 

data test1;
input Names:$100.;
names=prxchange('s/\_\d+//',-1,compress(names));
put names=;
datalines;
dx_1
dx_10
state_id_5
state_id_9
if_mg_hk_4
;
run;
Thanks,
Jag

View solution in original post


All Replies
Super User
Posts: 17,961

Re: How to remove the last '_' and number?

Bit of a long way around, but find out the length of the last component which will be 2 or 3 it seems and then use SUBSTR to extract from 1 to the length of the string - 2/3.

 

data test1;
    input Names:$100.;

    last=scan(names, -1, "_");
    word=substr(names, 1, length(names)-length(last)-1);

    datalines;
dx_1
dx_10
state_id_5
state_id_9
if_mg_hk_4
;
run;

proc print ;
run;
Solution
‎07-05-2017 07:52 AM
Trusted Advisor
Posts: 1,131

Re: How to remove the last '_' and number?

Alternatively with perl regular expression

 

data test1;
input Names:$100.;
names=prxchange('s/\_\d+//',-1,compress(names));
put names=;
datalines;
dx_1
dx_10
state_id_5
state_id_9
if_mg_hk_4
;
run;
Thanks,
Jag
Super Contributor
Posts: 297

Re: How to remove the last '_' and number?

What is the '\_\d+/' refering to?

 

Regular Contributor
Posts: 234

Re: How to remove the last '_' and number?

Regular expression is kind of seperate  language in SAS having its own syntax. This paper has nice introduction of  it.

http://www2.sas.com/proceedings/sugi29/265-29.pdf

Trusted Advisor
Posts: 1,131

Re: How to remove the last '_' and number?

[ Edited ]

\_ and \d are metacharacters

 

\_ : recognizes the underscore( _ )

\d : recognizes the digit (0-9)

 + : recognize the charcater/digit one or more times

 

\_\d+ : recognize the underscore followed by digit (one of more time)

 

// : replace the metacharacters identified and make them blank.

Thanks,
Jag
Respected Advisor
Posts: 3,908

Re: How to remove the last '_' and number?

[ Edited ]

@ybz12003

Use the syntax which returns the result you want and which is easiest for you to understand and maintain.

Below 3 coding version (word1 to word3) for you to pick and choose.

The sample code and output hopefully also illustrates that it's really important to provide "nasty" sample data which covers as many cases as you can think of.

data test1;
  infile datalines truncover;
  input Names:$100.;
  length word1 word2 word3 last $100;

  last=scan(names, -1, "_");
  word1=substr(names, 1, length(names)-length(last)-1);
  word2=substr(names, 1, findc(names,'_',-length(names))-1);
  word3=prxchange('s/_\d+\s*$//oi',1,names);
  datalines;
dx_1
dx_10
state_id_5
state_id_9
if_mg_hk_4
if_mg_hk_
if_mg_hk
if
if_mg_5_hk_4
if_mg_5_hk_
if_mg_5_hk
 
_5_tt
_5tt
 
9_5_tt
_9_5tt
;
run;

 

 Capture.JPG

Super Contributor
Posts: 297

Re: How to remove the last '_' and number?

Thanks for all of your kind help, and great explaination.

Respected Advisor
Posts: 3,908

Re: How to remove the last '_' and number?

[ Edited ]

@ybz12003

Even though the solution you've accepted returns the desired result for you sample data, I believe the RegEx needs a small amendment to not cause issues for your real data.

I suggest you add a $ to the RegEx to ensure that it only removes _<digits> at the end of a string.

The $ stands for "end of string"

 

You can also replace the -1 with a 1  

-1 means that the pattern gets searched and replaces as many time as it's found. You should have only a single matching pattern at maximum so using a 1 without the minus appears appropriate.

 

names=prxchange('s/\_\d+$//',1,trim(names));

 

Without the $ ALL _<digits> patterns would get removed and a source string like ABC_5_XX_10 would become ABC_XX

 

And instead of compress() use trim()

Compress removes all blanks from a string - also the ones in the middle of the string - which potentially changes your source string more than what you have been asking for. Trim() only removes trailing blanks which ensures that after the last _<digits> pattern there won't be any blanks before the <end of string> $ 

 

Alternatively the RegEx could also look like:

names=prxchange('s/\_\d+\s*$//',1,names);

 

\s* represents 0 to many blanks and though you don't have to remove the trailing blanks anymore as now the RegEx caters for such a case. 

Super Contributor
Posts: 297

Re: How to remove the last '_' and number?

Thanks for your detail explaination, Patrick. 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 172 views
  • 5 likes
  • 5 in conversation