DATA Step, Macro, Functions and more

how do I trim the piece to the right of the last underscore?

Reply
Frequent Contributor
Posts: 133

how do I trim the piece to the right of the last underscore?

I just want: "IR_MX_GOVT_BFV_MXN" , or trim off the piece to the right of the last underscore, including the underscore.

How do I do that?

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

run;

Super User
Super User
Posts: 6,500

Re: how do I trim the piece to the right of the last underscore?

I usually use SAS functions like SCAN and SUBSTR, but you can probably get someone to show you a regular expression syntax.

data t;

  crv='IR_MX_GOVT_BFV_MXN_7Y';

  if index(crv,'_') then want = substr(crv,1,length(crv)-length(scan(crv,-1,'_'))-1);

  put (crv want) (=/);

run;

Frequent Contributor
Posts: 133

Re: how do I trim the piece to the right of the last underscore?

Thanks so much, Tom.

Do I need to use trim(left(var)) to trim off the blanks?

Also, why do you put "if index(crv,'_') then" as the condition logic?

Super User
Super User
Posts: 6,500

Re: how do I trim the piece to the right of the last underscore?

TRIM() is meaningless in this situation as LENGTH() and SAS assignment to character variables will ignore and append spaces, respectively.

I added the IF because I was worried about what it would do when there were no underscores in the value.  Try it and see what happens.

Respected Advisor
Posts: 3,124

Re: how do I trim the piece to the right of the last underscore?

Another approach is to "call scan":

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

call scan(crv,-1,_p,_l,'_');

want=substr(crv,1,_p-2);

put (crv want) (=/);

drop _:;

run;

Haikuo

Super Contributor
Posts: 578

Re: how do I trim the piece to the right of the last underscore?

I'm not at my computer w/ sas to work out the syntax, but assuming the number of "-"s is variable, then you can use the reverse function, find the first "_", and take everything after it...then reverse it again.

Respected Advisor
Posts: 3,777

Re: how do I trim the piece to the right of the last underscore?

DBailey wrote:

I'm not at my computer w/ sas to work out the syntax, but assuming the number of "-"s is variable, then you can use the reverse function, find the first "_", and take everything after it...then reverse it again.

Instead of fiddling with reverse just search from the right.  My question is what should be assigned when there are no underscores.

data _null_;
  
do crv='IR_MX_GOVT_BFV_MXN_7Y','IRMXGOVTBFVXN7Y';
      want2= substrN(crv,1,find(crv,'_',-length(crv))-1);
      put (crv wantSmiley Happy (=/);
      call missing(of wantSmiley Happy;
      end;
  
run;
Respected Advisor
Posts: 4,649

Re: how do I trim the piece to the right of the last underscore?

You can also kill a fly with a bazooka :

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

want = prxchange("s/(.*)_[^_]*$/\1/", 1, crv);

put (crv want) (=/);

run;

It also works as required when there is no underscore. - PG

PG
Respected Advisor
Posts: 3,777

Re: how do I trim the piece to the right of the last underscore?

It also works as required when there is no underscore. - PG

How do you know?

Respected Advisor
Posts: 4,649

Re: how do I trim the piece to the right of the last underscore?

From first principles. I also tested...

14   data t;
15   crv='IR_MX_GOVT_BFV_MXN_7Y';
16   want = prxchange("s/(.*)_[^_]*$/\1/",1,crv);
17   put (crv want) (=/);
18   crv='IRMXGOVTBFVMXN7Y';
19   want = prxchange("s/(.*)_[^_]*$/\1/",1,crv);
20   put (crv want) (=/);
21   run;


crv=IR_MX_GOVT_BFV_MXN_7Y
want=IR_MX_GOVT_BFV_MXN

crv=IRMXGOVTBFVMXN7Y
want=IRMXGOVTBFVMXN7Y

... looks ok to me.

PG

PG
Respected Advisor
Posts: 3,777

Re: how do I trim the piece to the right of the last underscore?

There was never a mention of what to assign when there is no underscore in the string.  That's your assumption.

Super User
Super User
Posts: 6,500

Re: how do I trim the piece to the right of the last underscore?

Because it uses a change (substitute) command when the pattern is not matched nothing is changed.

Respected Advisor
Posts: 4,649

Re: how do I trim the piece to the right of the last underscore?

Indeed, my assumption was : no underscore -> nothing to trim. - PG

PG
Respected Advisor
Posts: 3,892

Re: how do I trim the piece to the right of the last underscore?

If you just want to replace a substring with a NULL string then why not formulate the RegEx accordingly?

data test;
  infile datalines truncover;
  input in_string:$30.;
  length out_string $30.;
  out_string=prxchange('s/_+[^_]*$//o',1,in_string);
  datalines;
IR_MX_GOVT_BFV_MXN_7Y
IR_MX_GOVT_BFV_MXN____7Y
IRMXGOVTBFVMXN7Y
IR_MX_GOVTBFVMXN_7Y
IR_MX_GOVTBFVMXN_

run;

Super Contributor
Posts: 376

Re: how do I trim the piece to the right of the last underscore?

ZRick wrote:

I just want: "IR_MX_GOVT_BFV_MXN" , or trim off the piece to the right of the last underscore, including the underscore.

How do I do that?

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

run;

Another regular expression solution:

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

* crv='IRMXGOVTBFVMXN7Y';

new1a=prxchange("s/^(.*)_(.*)$/\1/o",-1,crv);

new1b=prxchange("s/^(.*)_(.*)$/\2/o",-1,crv);

new2a=prxchange("s/^(.*?)_(.*)$/\1/o",-1,crv);

new2b=prxchange("s/^(.*?)_(.*)$/\2/o",-1,crv);

run;

I'll explain the regular expression:

s = "substitute"

/ / / = delimiter for "from", "to"

^ = beginning of text

( ) = capture buffer, numbered in order.  So, there are two capture buffers.

. = any character

* = zero or more occurrences.

$ = end of text.  Note: the padding of SAS variables with spaces are significant.  With a different regular expression, you might need "trim(crv)"

\1 = the text captured in the first capture buffer

o = compile the regular expression once, and retain the regex across data step iterations.  Irrelevant in this example, but often used to increase performance.

-1 = apply the substitution to all occurrences of the regular expression in the source text (doesn't apply to this regex, since it encompasses the entire string).

Ok, so in words, the first regular expression says "from the beginning of the text, find all characters ***, an underscore, then all characters until the end of the text, and substitute the text captured in the first capture buffer".

*** A key concept here is "greedy" vs. "non-greedy" matching (Google "regular expressions greedy matching").  By default, the matching is "greedy".  So, (.*) matches as many occurrences as it can, while still satisfying the rest of the regular expression _(.*)$  (underscore, any characters, end of line).

In the second regular expression, the ? metacharacter means "non-greedy" matching.  So, (.*?) means match the minimum occurrences, while still satisfying the rest of the regular expression (which is a "greedy" match).

I've saved the results of capture buffer #2 so you can see the results of the greedy vs. non-greedy matching.

Uncomment the second crv setting so show that crv will be returned unchanged if it doesn't match the regular expression.

Hope this helps,

Scott

Ask a Question
Discussion stats
  • 15 replies
  • 660 views
  • 1 like
  • 8 in conversation