BookmarkSubscribeRSS Feed
ZRick
Obsidian | Level 7

I just want: "IR_MX_GOVT_BFV_MXN" , or trim off the piece to the right of the last underscore, including the underscore.

How do I do that?

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

run;

15 REPLIES 15
Tom
Super User Tom
Super User

I usually use SAS functions like SCAN and SUBSTR, but you can probably get someone to show you a regular expression syntax.

data t;

  crv='IR_MX_GOVT_BFV_MXN_7Y';

  if index(crv,'_') then want = substr(crv,1,length(crv)-length(scan(crv,-1,'_'))-1);

  put (crv want) (=/);

run;

ZRick
Obsidian | Level 7

Thanks so much, Tom.

Do I need to use trim(left(var)) to trim off the blanks?

Also, why do you put "if index(crv,'_') then" as the condition logic?

Tom
Super User Tom
Super User

TRIM() is meaningless in this situation as LENGTH() and SAS assignment to character variables will ignore and append spaces, respectively.

I added the IF because I was worried about what it would do when there were no underscores in the value.  Try it and see what happens.

Haikuo
Onyx | Level 15

Another approach is to "call scan":

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

call scan(crv,-1,_p,_l,'_');

want=substr(crv,1,_p-2);

put (crv want) (=/);

drop _:;

run;

Haikuo

DBailey
Lapis Lazuli | Level 10

I'm not at my computer w/ sas to work out the syntax, but assuming the number of "-"s is variable, then you can use the reverse function, find the first "_", and take everything after it...then reverse it again.

data_null__
Jade | Level 19

DBailey wrote:

I'm not at my computer w/ sas to work out the syntax, but assuming the number of "-"s is variable, then you can use the reverse function, find the first "_", and take everything after it...then reverse it again.

Instead of fiddling with reverse just search from the right.  My question is what should be assigned when there are no underscores.

data _null_;
  
do crv='IR_MX_GOVT_BFV_MXN_7Y','IRMXGOVTBFVXN7Y';
      want2= substrN(crv,1,find(crv,'_',-length(crv))-1);
      put (crv want:) (=/);
      call missing(of want:);
      end;
  
run;
PGStats
Opal | Level 21

You can also kill a fly with a bazooka :

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

want = prxchange("s/(.*)_[^_]*$/\1/", 1, crv);

put (crv want) (=/);

run;

It also works as required when there is no underscore. - PG

PG
data_null__
Jade | Level 19

It also works as required when there is no underscore. - PG

How do you know?

PGStats
Opal | Level 21

From first principles. I also tested...

14   data t;
15   crv='IR_MX_GOVT_BFV_MXN_7Y';
16   want = prxchange("s/(.*)_[^_]*$/\1/",1,crv);
17   put (crv want) (=/);
18   crv='IRMXGOVTBFVMXN7Y';
19   want = prxchange("s/(.*)_[^_]*$/\1/",1,crv);
20   put (crv want) (=/);
21   run;


crv=IR_MX_GOVT_BFV_MXN_7Y
want=IR_MX_GOVT_BFV_MXN

crv=IRMXGOVTBFVMXN7Y
want=IRMXGOVTBFVMXN7Y

... looks ok to me.

PG

PG
data_null__
Jade | Level 19

There was never a mention of what to assign when there is no underscore in the string.  That's your assumption.

Tom
Super User Tom
Super User

Because it uses a change (substitute) command when the pattern is not matched nothing is changed.

PGStats
Opal | Level 21

Indeed, my assumption was : no underscore -> nothing to trim. - PG

PG
Patrick
Opal | Level 21

If you just want to replace a substring with a NULL string then why not formulate the RegEx accordingly?

data test;
  infile datalines truncover;
  input in_string:$30.;
  length out_string $30.;
  out_string=prxchange('s/_+[^_]*$//o',1,in_string);
  datalines;
IR_MX_GOVT_BFV_MXN_7Y
IR_MX_GOVT_BFV_MXN____7Y
IRMXGOVTBFVMXN7Y
IR_MX_GOVTBFVMXN_7Y
IR_MX_GOVTBFVMXN_

run;

ScottBass
Rhodochrosite | Level 12

ZRick wrote:

I just want: "IR_MX_GOVT_BFV_MXN" , or trim off the piece to the right of the last underscore, including the underscore.

How do I do that?

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

run;

Another regular expression solution:

data t;

crv='IR_MX_GOVT_BFV_MXN_7Y';

* crv='IRMXGOVTBFVMXN7Y';

new1a=prxchange("s/^(.*)_(.*)$/\1/o",-1,crv);

new1b=prxchange("s/^(.*)_(.*)$/\2/o",-1,crv);

new2a=prxchange("s/^(.*?)_(.*)$/\1/o",-1,crv);

new2b=prxchange("s/^(.*?)_(.*)$/\2/o",-1,crv);

run;

I'll explain the regular expression:

s = "substitute"

/ / / = delimiter for "from", "to"

^ = beginning of text

( ) = capture buffer, numbered in order.  So, there are two capture buffers.

. = any character

* = zero or more occurrences.

$ = end of text.  Note: the padding of SAS variables with spaces are significant.  With a different regular expression, you might need "trim(crv)"

\1 = the text captured in the first capture buffer

o = compile the regular expression once, and retain the regex across data step iterations.  Irrelevant in this example, but often used to increase performance.

-1 = apply the substitution to all occurrences of the regular expression in the source text (doesn't apply to this regex, since it encompasses the entire string).

Ok, so in words, the first regular expression says "from the beginning of the text, find all characters ***, an underscore, then all characters until the end of the text, and substitute the text captured in the first capture buffer".

*** A key concept here is "greedy" vs. "non-greedy" matching (Google "regular expressions greedy matching").  By default, the matching is "greedy".  So, (.*) matches as many occurrences as it can, while still satisfying the rest of the regular expression _(.*)$  (underscore, any characters, end of line).

In the second regular expression, the ? metacharacter means "non-greedy" matching.  So, (.*?) means match the minimum occurrences, while still satisfying the rest of the regular expression (which is a "greedy" match).

I've saved the results of capture buffer #2 so you can see the results of the greedy vs. non-greedy matching.

Uncomment the second crv setting so show that crv will be returned unchanged if it doesn't match the regular expression.

Hope this helps,

Scott


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 15 replies
  • 4018 views
  • 1 like
  • 8 in conversation