BookmarkSubscribeRSS Feed
starosto
Fluorite | Level 6

Hello,

 

I would like to receive two variables. One with TEXT1 and the other one with TEXT2.

Below code does not work.

Reason = strip(prxchange("s/(.*?)[:,](.*)/$1/", -1 , strip(Value)));
ReasonSp = strip(prxchange("s/(.*?)[:,](.*)/$2/", -1, strip(Value)));

 

Could you please help?

This is my Value variable:

0. TEXT1.

I. TEXT1, i.e. TEXT2.
II.a. TEXT1: TEXT2.
II.b. TEXT1: TEXT2.
III.a. TEXT1: TEXT2.
III.b. TEXT1: TEXT2.
III.c. TEXT1: TEXT2.
IV. TEXT1.
V.a. TEXT1: TEXT2.
V.b. TEXT1: TEXT2.

 

7 REPLIES 7
Oligolas
Barite | Level 11

do you expect this?

data have;
length Value $200;
Value="0. TEXT1.";output;
Value="I. TEXT1, i.e. TEXT2.";output;
Value="II.a. TEXT1: TEXT2.";output;
Value="II.b. TEXT1: TEXT2.";output;
Value="III.a. TEXT1: TEXT2.";output;
Value="III.b. TEXT1: TEXT2.";output;
Value="III.c. TEXT1: TEXT2.";output;
Value="IV. TEXT1.";output;
Value="V.a. TEXT1: TEXT2.";output;
Value="V.b. TEXT1: TEXT2.";output;
run;
data want;
   set have;
   length Reason ReasonSp $200;
   Reason   = strip(prxchange("s/([^:,]*?)[:,](.*)/$1/", -1 , strip(Value)));
   if prxmatch('/.*[:,].*/',Value) then ReasonSp = strip(prxchange("s/.*[:,](.+)/$1/", -1, strip(Value)));
run;
________________________

- Cheers -

Oligolas
Barite | Level 11

or maybe this:

data want;
   set have;
   length Reason ReasonSp $200;
   Reason   = strip(prxchange("s/^[0MDCLXVI]+[\.\w]+\s+([^:,]*).*/$1/", -1 , strip(Value)));
   if prxmatch('/.*[:,].*/',Value) then ReasonSp = strip(prxchange("s/.*[:,](.+)/$1/", -1, strip(Value)));
run;

You'll find more restrictive matching of roman numerals here

________________________

- Cheers -

starosto
Fluorite | Level 6

starosto_0-1688574801531.png

Almost ok, I do not want to have i.e. and also I do not wan to have dots at the end. Thanks! 🙂

s_lassen
Meteorite | Level 14

I would not use PRX for that, just SCAN:

data want;
   set have;
   reason=scan(scan(value,2,' '),1,':,.');
   if countw(value,' ')>2 then 
     ReasonSp=scan(scan(value,-1,' '),1,':,.');
run;

The reason the ReasonSp calculation is a bit more complicated is to account for the "i.e." in the second observation, otherwise it could be calculated by taking the third word - but instead we have go get the last word, but only if there are more than 2 words.

starosto
Fluorite | Level 6

Thank you! This is not universal - it depends on number of words. Under TEXT1 or TEXT2 I could have the whole sentence.

Oligolas
Barite | Level 11

It think the best way would be to perform the final cleaning in a second step.

 

But depending on

  1. how much data cleaning you need to finally get TEXT1 and TEXT2
  2. wether you want to check the conformity of the syntax or not
  3. your facility to re-understand, debug or refine regex you wrote a time ago

you may want to come up with a solution like this:


data want;
   set have;
   length Reason ReasonSp $200;
   *Roughly extract TEXT1 and TEXT2;
   Reason   = strip(substrn(scan(value,1,':,'),find(Value,' ')));
   ReasonSp = scan(value,2,':,');

   *Data cleaning;
   array checkNClean Reason ReasonSp;
   do over checkNClean;
      *Remove trailing dot;
      if substrn(reverse(strip(checkNClean)),1,1) eq '.' then checkNClean=reverse(substrn(reverse(strip(checkNClean)),2));
      *Remove leading i.e.;
      if substrn(strip(checkNClean),1,4) eq 'i.e.' then checkNClean=strip(substrn(strip(checkNClean),5));
   end;
run;
proc print;run;

 

________________________

- Cheers -

ballardw
Super User

@starosto wrote:

Thank you! This is not universal - it depends on number of words. Under TEXT1 or TEXT2 I could have the whole sentence.


GIGO

 

Incomplete problem description yields incomplete solutions.

 

Since C, D, L  and M are also "Roman Numerals" you may also need much much more description of your data and the rules involved.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 799 views
  • 0 likes
  • 4 in conversation