I just want: "IR_MX_GOVT_BFV_MXN" , or trim off the piece to the right of the last underscore, including the underscore.
How do I do that?
data t;
crv='IR_MX_GOVT_BFV_MXN_7Y';
run;
I usually use SAS functions like SCAN and SUBSTR, but you can probably get someone to show you a regular expression syntax.
data t;
crv='IR_MX_GOVT_BFV_MXN_7Y';
if index(crv,'_') then want = substr(crv,1,length(crv)-length(scan(crv,-1,'_'))-1);
put (crv want) (=/);
run;
Thanks so much, Tom.
Do I need to use trim(left(var)) to trim off the blanks?
Also, why do you put "if index(crv,'_') then" as the condition logic?
TRIM() is meaningless in this situation as LENGTH() and SAS assignment to character variables will ignore and append spaces, respectively.
I added the IF because I was worried about what it would do when there were no underscores in the value. Try it and see what happens.
Another approach is to "call scan":
data t;
crv='IR_MX_GOVT_BFV_MXN_7Y';
call scan(crv,-1,_p,_l,'_');
want=substr(crv,1,_p-2);
put (crv want) (=/);
drop _:;
run;
Haikuo
I'm not at my computer w/ sas to work out the syntax, but assuming the number of "-"s is variable, then you can use the reverse function, find the first "_", and take everything after it...then reverse it again.
DBailey wrote:
I'm not at my computer w/ sas to work out the syntax, but assuming the number of "-"s is variable, then you can use the reverse function, find the first "_", and take everything after it...then reverse it again.
Instead of fiddling with reverse just search from the right. My question is what should be assigned when there are no underscores.
You can also kill a fly with a bazooka :
data t;
crv='IR_MX_GOVT_BFV_MXN_7Y';
want = prxchange("s/(.*)_[^_]*$/\1/", 1, crv);
put (crv want) (=/);
run;
It also works as required when there is no underscore. - PG
It also works as required when there is no underscore. - PG
How do you know?
From first principles. I also tested...
14 data t;
15 crv='IR_MX_GOVT_BFV_MXN_7Y';
16 want = prxchange("s/(.*)_[^_]*$/\1/",1,crv);
17 put (crv want) (=/);
18 crv='IRMXGOVTBFVMXN7Y';
19 want = prxchange("s/(.*)_[^_]*$/\1/",1,crv);
20 put (crv want) (=/);
21 run;
crv=IR_MX_GOVT_BFV_MXN_7Y
want=IR_MX_GOVT_BFV_MXN
crv=IRMXGOVTBFVMXN7Y
want=IRMXGOVTBFVMXN7Y
... looks ok to me.
PG
There was never a mention of what to assign when there is no underscore in the string. That's your assumption.
Because it uses a change (substitute) command when the pattern is not matched nothing is changed.
Indeed, my assumption was : no underscore -> nothing to trim. - PG
If you just want to replace a substring with a NULL string then why not formulate the RegEx accordingly?
data test;
infile datalines truncover;
input in_string:$30.;
length out_string $30.;
out_string=prxchange('s/_+[^_]*$//o',1,in_string);
datalines;
IR_MX_GOVT_BFV_MXN_7Y
IR_MX_GOVT_BFV_MXN____7Y
IRMXGOVTBFVMXN7Y
IR_MX_GOVTBFVMXN_7Y
IR_MX_GOVTBFVMXN_
;
run;
ZRick wrote:
I just want: "IR_MX_GOVT_BFV_MXN" , or trim off the piece to the right of the last underscore, including the underscore.
How do I do that?
data t;
crv='IR_MX_GOVT_BFV_MXN_7Y';
run;
Another regular expression solution:
data t;
crv='IR_MX_GOVT_BFV_MXN_7Y';
* crv='IRMXGOVTBFVMXN7Y';
new1a=prxchange("s/^(.*)_(.*)$/\1/o",-1,crv);
new1b=prxchange("s/^(.*)_(.*)$/\2/o",-1,crv);
new2a=prxchange("s/^(.*?)_(.*)$/\1/o",-1,crv);
new2b=prxchange("s/^(.*?)_(.*)$/\2/o",-1,crv);
run;
I'll explain the regular expression:
s = "substitute"
/ / / = delimiter for "from", "to"
^ = beginning of text
( ) = capture buffer, numbered in order. So, there are two capture buffers.
. = any character
* = zero or more occurrences.
$ = end of text. Note: the padding of SAS variables with spaces are significant. With a different regular expression, you might need "trim(crv)"
\1 = the text captured in the first capture buffer
o = compile the regular expression once, and retain the regex across data step iterations. Irrelevant in this example, but often used to increase performance.
-1 = apply the substitution to all occurrences of the regular expression in the source text (doesn't apply to this regex, since it encompasses the entire string).
Ok, so in words, the first regular expression says "from the beginning of the text, find all characters ***, an underscore, then all characters until the end of the text, and substitute the text captured in the first capture buffer".
*** A key concept here is "greedy" vs. "non-greedy" matching (Google "regular expressions greedy matching"). By default, the matching is "greedy". So, (.*) matches as many occurrences as it can, while still satisfying the rest of the regular expression _(.*)$ (underscore, any characters, end of line).
In the second regular expression, the ? metacharacter means "non-greedy" matching. So, (.*?) means match the minimum occurrences, while still satisfying the rest of the regular expression (which is a "greedy" match).
I've saved the results of capture buffer #2 so you can see the results of the greedy vs. non-greedy matching.
Uncomment the second crv setting so show that crv will be returned unchanged if it doesn't match the regular expression.
Hope this helps,
Scott
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.