Re: how do I trim the piece to the right of the last underscore?

ZRick · Posted 03-29-2013 08:54 PM

I just want: "IR_MX_GOVT_BFV_MXN" , or trim off the piece to the right of the last underscore, including the underscore.

How do I do that?

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

run;

Tom · Posted 03-29-2013 08:58 PM

I usually use SAS functions like SCAN and SUBSTR, but you can probably get someone to show you a regular expression syntax.

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

if index(crv,'_') then want = substr(crv,1,length(crv)-length(scan(crv,-1,'_'))-1);

put (crv want) (=/);

run;

ZRick · Posted 03-29-2013 09:05 PM

Thanks so much, Tom.

Do I need to use trim(left(var)) to trim off the blanks?

Also, why do you put "if index(crv,'_') then" as the condition logic?

Tom · Posted 03-29-2013 09:20 PM

TRIM() is meaningless in this situation as LENGTH() and SAS assignment to character variables will ignore and append spaces, respectively.

I added the IF because I was worried about what it would do when there were no underscores in the value. Try it and see what happens.

Haikuo · Posted 03-30-2013 10:34 AM

Another approach is to "call scan":

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

call scan(crv,-1,_p,_l,'_');

want=substr(crv,1,_p-2);

put (crv want) (=/);

drop _:;

run;

Haikuo

DBailey · Posted 03-30-2013 11:40 AM

I'm not at my computer w/ sas to work out the syntax, but assuming the number of "-"s is variable, then you can use the reverse function, find the first "_", and take everything after it...then reverse it again.

data_null__ · Posted 03-30-2013 11:48 AM

DBailey wrote:

I'm not at my computer w/ sas to work out the syntax, but assuming the number of "-"s is variable, then you can use the reverse function, find the first "_", and take everything after it...then reverse it again.

Instead of fiddling with reverse just search from the right. My question is what should be assigned when there are no underscores.

data _null_; 
   do crv='IR_MX_GOVT_BFV_MXN_7Y','IRMXGOVTBFVXN7Y'; 
      want2= substrN(crv,1,find(crv,'_',-length(crv))-1);
      put (crv want:) (=/);
      call missing(of want:);
      end; 
   run;

PGStats · Posted 03-30-2013 12:03 PM

You can also kill a fly with a bazooka :

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

want = prxchange("s/(.*)_[^_]*$/\1/", 1, crv);

put (crv want) (=/);

run;

It also works as required when there is no underscore. - PG

PG

data_null__ · Posted 03-30-2013 12:46 PM

It also works as required when there is no underscore. - PG

How do you know?

PGStats · Posted 03-30-2013 03:17 PM

From first principles. I also tested...

14   data t;
15   crv='IR_MX_GOVT_BFV_MXN_7Y';
16   want = prxchange("s/(.*)_[^_]*$/\1/",1,crv);
17   put (crv want) (=/);
18   crv='IRMXGOVTBFVMXN7Y';
19   want = prxchange("s/(.*)_[^_]*$/\1/",1,crv);
20   put (crv want) (=/);
21   run;

crv=IR_MX_GOVT_BFV_MXN_7Y
want=IR_MX_GOVT_BFV_MXN

crv=IRMXGOVTBFVMXN7Y
want=IRMXGOVTBFVMXN7Y

... looks ok to me.

PG

data_null__ · Posted 03-31-2013 09:57 AM

There was never a mention of what to assign when there is no underscore in the string. That's your assumption.

Tom · Posted 03-31-2013 10:20 AM

Because it uses a change (substitute) command when the pattern is not matched nothing is changed.

PGStats · Posted 03-31-2013 10:27 AM

Indeed, my assumption was : no underscore -> nothing to trim. - PG

PG

Patrick · Posted 03-31-2013 01:20 AM

If you just want to replace a substring with a NULL string then why not formulate the RegEx accordingly?

data test;
infile datalines truncover;
input in_string:$30.;
length out_string $30.;
out_string=prxchange('s/_+[^_]*$//o',1,in_string);
datalines;
IR_MX_GOVT_BFV_MXN_7Y
IR_MX_GOVT_BFV_MXN____7Y
IRMXGOVTBFVMXN7Y
IR_MX_GOVTBFVMXN_7Y
IR_MX_GOVTBFVMXN_
;
run;

ScottBass · Posted 03-30-2013 07:45 PM

ZRick wrote:

I just want: "IR_MX_GOVT_BFV_MXN" , or trim off the piece to the right of the last underscore, including the underscore.

How do I do that?

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

run;

Another regular expression solution:

data t;
crv='IR_MX_GOVT_BFV_MXN_7Y';
* crv='IRMXGOVTBFVMXN7Y';
new1a=prxchange("s/^(.*)_(.*)$/\1/o",-1,crv);
new1b=prxchange("s/^(.*)_(.*)$/\2/o",-1,crv);
new2a=prxchange("s/^(.*?)_(.*)$/\1/o",-1,crv);
new2b=prxchange("s/^(.*?)_(.*)$/\2/o",-1,crv);
run;

I'll explain the regular expression:

s = "substitute"

/ / / = delimiter for "from", "to"

^ = beginning of text

( ) = capture buffer, numbered in order. So, there are two capture buffers.

. = any character

* = zero or more occurrences.

$ = end of text. Note: the padding of SAS variables with spaces are significant. With a different regular expression, you might need "trim(crv)"

\1 = the text captured in the first capture buffer

o = compile the regular expression once, and retain the regex across data step iterations. Irrelevant in this example, but often used to increase performance.

-1 = apply the substitution to all occurrences of the regular expression in the source text (doesn't apply to this regex, since it encompasses the entire string).

Ok, so in words, the first regular expression says "from the beginning of the text, find all characters ***, an underscore, then all characters until the end of the text, and substitute the text captured in the first capture buffer".

*** A key concept here is "greedy" vs. "non-greedy" matching (Google "regular expressions greedy matching"). By default, the matching is "greedy". So, (.*) matches as many occurrences as it can, while still satisfying the rest of the regular expression _(.*)$ (underscore, any characters, end of line).

In the second regular expression, the ? metacharacter means "non-greedy" matching. So, (.*?) means match the minimum occurrences, while still satisfying the rest of the regular expression (which is a "greedy" match).

I've saved the results of capture buffer #2 so you can see the results of the greedy vs. non-greedy matching.

Uncomment the second crv setting so show that crv will be returned unchanged if it doesn't match the regular expression.

Hope this helps,

Scott

Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away