Hello,
I would like to receive two variables. One with TEXT1 and the other one with TEXT2.
Below code does not work.
Reason = strip(prxchange("s/(.*?)[:,](.*)/$1/", -1 , strip(Value)));
ReasonSp = strip(prxchange("s/(.*?)[:,](.*)/$2/", -1, strip(Value)));
Could you please help?
This is my Value variable:
0. TEXT1.
I. TEXT1, i.e. TEXT2.
II.a. TEXT1: TEXT2.
II.b. TEXT1: TEXT2.
III.a. TEXT1: TEXT2.
III.b. TEXT1: TEXT2.
III.c. TEXT1: TEXT2.
IV. TEXT1.
V.a. TEXT1: TEXT2.
V.b. TEXT1: TEXT2.
do you expect this?
data have;
length Value $200;
Value="0. TEXT1.";output;
Value="I. TEXT1, i.e. TEXT2.";output;
Value="II.a. TEXT1: TEXT2.";output;
Value="II.b. TEXT1: TEXT2.";output;
Value="III.a. TEXT1: TEXT2.";output;
Value="III.b. TEXT1: TEXT2.";output;
Value="III.c. TEXT1: TEXT2.";output;
Value="IV. TEXT1.";output;
Value="V.a. TEXT1: TEXT2.";output;
Value="V.b. TEXT1: TEXT2.";output;
run;
data want;
set have;
length Reason ReasonSp $200;
Reason = strip(prxchange("s/([^:,]*?)[:,](.*)/$1/", -1 , strip(Value)));
if prxmatch('/.*[:,].*/',Value) then ReasonSp = strip(prxchange("s/.*[:,](.+)/$1/", -1, strip(Value)));
run;
- Cheers -
or maybe this:
data want;
set have;
length Reason ReasonSp $200;
Reason = strip(prxchange("s/^[0MDCLXVI]+[\.\w]+\s+([^:,]*).*/$1/", -1 , strip(Value)));
if prxmatch('/.*[:,].*/',Value) then ReasonSp = strip(prxchange("s/.*[:,](.+)/$1/", -1, strip(Value)));
run;
You'll find more restrictive matching of roman numerals here
- Cheers -
Almost ok, I do not want to have i.e. and also I do not wan to have dots at the end. Thanks! 🙂
I would not use PRX for that, just SCAN:
data want;
set have;
reason=scan(scan(value,2,' '),1,':,.');
if countw(value,' ')>2 then
ReasonSp=scan(scan(value,-1,' '),1,':,.');
run;
The reason the ReasonSp calculation is a bit more complicated is to account for the "i.e." in the second observation, otherwise it could be calculated by taking the third word - but instead we have go get the last word, but only if there are more than 2 words.
Thank you! This is not universal - it depends on number of words. Under TEXT1 or TEXT2 I could have the whole sentence.
It think the best way would be to perform the final cleaning in a second step.
But depending on
you may want to come up with a solution like this:
data want;
set have;
length Reason ReasonSp $200;
*Roughly extract TEXT1 and TEXT2;
Reason = strip(substrn(scan(value,1,':,'),find(Value,' ')));
ReasonSp = scan(value,2,':,');
*Data cleaning;
array checkNClean Reason ReasonSp;
do over checkNClean;
*Remove trailing dot;
if substrn(reverse(strip(checkNClean)),1,1) eq '.' then checkNClean=reverse(substrn(reverse(strip(checkNClean)),2));
*Remove leading i.e.;
if substrn(strip(checkNClean),1,4) eq 'i.e.' then checkNClean=strip(substrn(strip(checkNClean),5));
end;
run;
proc print;run;
- Cheers -
@starosto wrote:
Thank you! This is not universal - it depends on number of words. Under TEXT1 or TEXT2 I could have the whole sentence.
GIGO
Incomplete problem description yields incomplete solutions.
Since C, D, L and M are also "Roman Numerals" you may also need much much more description of your data and the rules involved.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.