BookmarkSubscribeRSS Feed
starosto
Fluorite | Level 6

Hello,

 

I would like to receive two variables. One with TEXT1 and the other one with TEXT2.

Below code does not work.

Reason = strip(prxchange("s/(.*?)[:,](.*)/$1/", -1 , strip(Value)));
ReasonSp = strip(prxchange("s/(.*?)[:,](.*)/$2/", -1, strip(Value)));

 

Could you please help?

This is my Value variable:

0. TEXT1.

I. TEXT1, i.e. TEXT2.
II.a. TEXT1: TEXT2.
II.b. TEXT1: TEXT2.
III.a. TEXT1: TEXT2.
III.b. TEXT1: TEXT2.
III.c. TEXT1: TEXT2.
IV. TEXT1.
V.a. TEXT1: TEXT2.
V.b. TEXT1: TEXT2.

 

7 REPLIES 7
Oligolas
Barite | Level 11

do you expect this?

data have;
length Value $200;
Value="0. TEXT1.";output;
Value="I. TEXT1, i.e. TEXT2.";output;
Value="II.a. TEXT1: TEXT2.";output;
Value="II.b. TEXT1: TEXT2.";output;
Value="III.a. TEXT1: TEXT2.";output;
Value="III.b. TEXT1: TEXT2.";output;
Value="III.c. TEXT1: TEXT2.";output;
Value="IV. TEXT1.";output;
Value="V.a. TEXT1: TEXT2.";output;
Value="V.b. TEXT1: TEXT2.";output;
run;
data want;
   set have;
   length Reason ReasonSp $200;
   Reason   = strip(prxchange("s/([^:,]*?)[:,](.*)/$1/", -1 , strip(Value)));
   if prxmatch('/.*[:,].*/',Value) then ReasonSp = strip(prxchange("s/.*[:,](.+)/$1/", -1, strip(Value)));
run;
________________________

- Cheers -

Oligolas
Barite | Level 11

or maybe this:

data want;
   set have;
   length Reason ReasonSp $200;
   Reason   = strip(prxchange("s/^[0MDCLXVI]+[\.\w]+\s+([^:,]*).*/$1/", -1 , strip(Value)));
   if prxmatch('/.*[:,].*/',Value) then ReasonSp = strip(prxchange("s/.*[:,](.+)/$1/", -1, strip(Value)));
run;

You'll find more restrictive matching of roman numerals here

________________________

- Cheers -

starosto
Fluorite | Level 6

starosto_0-1688574801531.png

Almost ok, I do not want to have i.e. and also I do not wan to have dots at the end. Thanks! 🙂

s_lassen
Meteorite | Level 14

I would not use PRX for that, just SCAN:

data want;
   set have;
   reason=scan(scan(value,2,' '),1,':,.');
   if countw(value,' ')>2 then 
     ReasonSp=scan(scan(value,-1,' '),1,':,.');
run;

The reason the ReasonSp calculation is a bit more complicated is to account for the "i.e." in the second observation, otherwise it could be calculated by taking the third word - but instead we have go get the last word, but only if there are more than 2 words.

starosto
Fluorite | Level 6

Thank you! This is not universal - it depends on number of words. Under TEXT1 or TEXT2 I could have the whole sentence.

Oligolas
Barite | Level 11

It think the best way would be to perform the final cleaning in a second step.

 

But depending on

  1. how much data cleaning you need to finally get TEXT1 and TEXT2
  2. wether you want to check the conformity of the syntax or not
  3. your facility to re-understand, debug or refine regex you wrote a time ago

you may want to come up with a solution like this:


data want;
   set have;
   length Reason ReasonSp $200;
   *Roughly extract TEXT1 and TEXT2;
   Reason   = strip(substrn(scan(value,1,':,'),find(Value,' ')));
   ReasonSp = scan(value,2,':,');

   *Data cleaning;
   array checkNClean Reason ReasonSp;
   do over checkNClean;
      *Remove trailing dot;
      if substrn(reverse(strip(checkNClean)),1,1) eq '.' then checkNClean=reverse(substrn(reverse(strip(checkNClean)),2));
      *Remove leading i.e.;
      if substrn(strip(checkNClean),1,4) eq 'i.e.' then checkNClean=strip(substrn(strip(checkNClean),5));
   end;
run;
proc print;run;

 

________________________

- Cheers -

ballardw
Super User

@starosto wrote:

Thank you! This is not universal - it depends on number of words. Under TEXT1 or TEXT2 I could have the whole sentence.


GIGO

 

Incomplete problem description yields incomplete solutions.

 

Since C, D, L  and M are also "Roman Numerals" you may also need much much more description of your data and the rules involved.

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 353 views
  • 0 likes
  • 4 in conversation