BookmarkSubscribeRSS Feed
starosto
Fluorite | Level 6

Hello,

 

I would like to receive two variables. One with TEXT1 and the other one with TEXT2.

Below code does not work.

Reason = strip(prxchange("s/(.*?)[:,](.*)/$1/", -1 , strip(Value)));
ReasonSp = strip(prxchange("s/(.*?)[:,](.*)/$2/", -1, strip(Value)));

 

Could you please help?

This is my Value variable:

0. TEXT1.

I. TEXT1, i.e. TEXT2.
II.a. TEXT1: TEXT2.
II.b. TEXT1: TEXT2.
III.a. TEXT1: TEXT2.
III.b. TEXT1: TEXT2.
III.c. TEXT1: TEXT2.
IV. TEXT1.
V.a. TEXT1: TEXT2.
V.b. TEXT1: TEXT2.

 

7 REPLIES 7
Oligolas
Barite | Level 11

do you expect this?

data have;
length Value $200;
Value="0. TEXT1.";output;
Value="I. TEXT1, i.e. TEXT2.";output;
Value="II.a. TEXT1: TEXT2.";output;
Value="II.b. TEXT1: TEXT2.";output;
Value="III.a. TEXT1: TEXT2.";output;
Value="III.b. TEXT1: TEXT2.";output;
Value="III.c. TEXT1: TEXT2.";output;
Value="IV. TEXT1.";output;
Value="V.a. TEXT1: TEXT2.";output;
Value="V.b. TEXT1: TEXT2.";output;
run;
data want;
   set have;
   length Reason ReasonSp $200;
   Reason   = strip(prxchange("s/([^:,]*?)[:,](.*)/$1/", -1 , strip(Value)));
   if prxmatch('/.*[:,].*/',Value) then ReasonSp = strip(prxchange("s/.*[:,](.+)/$1/", -1, strip(Value)));
run;
________________________

- Cheers -

Oligolas
Barite | Level 11

or maybe this:

data want;
   set have;
   length Reason ReasonSp $200;
   Reason   = strip(prxchange("s/^[0MDCLXVI]+[\.\w]+\s+([^:,]*).*/$1/", -1 , strip(Value)));
   if prxmatch('/.*[:,].*/',Value) then ReasonSp = strip(prxchange("s/.*[:,](.+)/$1/", -1, strip(Value)));
run;

You'll find more restrictive matching of roman numerals here

________________________

- Cheers -

starosto
Fluorite | Level 6

starosto_0-1688574801531.png

Almost ok, I do not want to have i.e. and also I do not wan to have dots at the end. Thanks! 🙂

s_lassen
Meteorite | Level 14

I would not use PRX for that, just SCAN:

data want;
   set have;
   reason=scan(scan(value,2,' '),1,':,.');
   if countw(value,' ')>2 then 
     ReasonSp=scan(scan(value,-1,' '),1,':,.');
run;

The reason the ReasonSp calculation is a bit more complicated is to account for the "i.e." in the second observation, otherwise it could be calculated by taking the third word - but instead we have go get the last word, but only if there are more than 2 words.

starosto
Fluorite | Level 6

Thank you! This is not universal - it depends on number of words. Under TEXT1 or TEXT2 I could have the whole sentence.

Oligolas
Barite | Level 11

It think the best way would be to perform the final cleaning in a second step.

 

But depending on

  1. how much data cleaning you need to finally get TEXT1 and TEXT2
  2. wether you want to check the conformity of the syntax or not
  3. your facility to re-understand, debug or refine regex you wrote a time ago

you may want to come up with a solution like this:


data want;
   set have;
   length Reason ReasonSp $200;
   *Roughly extract TEXT1 and TEXT2;
   Reason   = strip(substrn(scan(value,1,':,'),find(Value,' ')));
   ReasonSp = scan(value,2,':,');

   *Data cleaning;
   array checkNClean Reason ReasonSp;
   do over checkNClean;
      *Remove trailing dot;
      if substrn(reverse(strip(checkNClean)),1,1) eq '.' then checkNClean=reverse(substrn(reverse(strip(checkNClean)),2));
      *Remove leading i.e.;
      if substrn(strip(checkNClean),1,4) eq 'i.e.' then checkNClean=strip(substrn(strip(checkNClean),5));
   end;
run;
proc print;run;

 

________________________

- Cheers -

ballardw
Super User

@starosto wrote:

Thank you! This is not universal - it depends on number of words. Under TEXT1 or TEXT2 I could have the whole sentence.


GIGO

 

Incomplete problem description yields incomplete solutions.

 

Since C, D, L  and M are also "Roman Numerals" you may also need much much more description of your data and the rules involved.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1160 views
  • 0 likes
  • 4 in conversation