Hello,
I would like to receive two variables. One with TEXT1 and the other one with TEXT2.
Below code does not work.
Reason = strip(prxchange("s/(.*?)[:,](.*)/$1/", -1 , strip(Value)));
ReasonSp = strip(prxchange("s/(.*?)[:,](.*)/$2/", -1, strip(Value)));
Could you please help?
This is my Value variable:
0. TEXT1.
I. TEXT1, i.e. TEXT2.
II.a. TEXT1: TEXT2.
II.b. TEXT1: TEXT2.
III.a. TEXT1: TEXT2.
III.b. TEXT1: TEXT2.
III.c. TEXT1: TEXT2.
IV. TEXT1.
V.a. TEXT1: TEXT2.
V.b. TEXT1: TEXT2.
do you expect this?
data have;
length Value $200;
Value="0. TEXT1.";output;
Value="I. TEXT1, i.e. TEXT2.";output;
Value="II.a. TEXT1: TEXT2.";output;
Value="II.b. TEXT1: TEXT2.";output;
Value="III.a. TEXT1: TEXT2.";output;
Value="III.b. TEXT1: TEXT2.";output;
Value="III.c. TEXT1: TEXT2.";output;
Value="IV. TEXT1.";output;
Value="V.a. TEXT1: TEXT2.";output;
Value="V.b. TEXT1: TEXT2.";output;
run;
data want;
set have;
length Reason ReasonSp $200;
Reason = strip(prxchange("s/([^:,]*?)[:,](.*)/$1/", -1 , strip(Value)));
if prxmatch('/.*[:,].*/',Value) then ReasonSp = strip(prxchange("s/.*[:,](.+)/$1/", -1, strip(Value)));
run;
- Cheers -
or maybe this:
data want;
set have;
length Reason ReasonSp $200;
Reason = strip(prxchange("s/^[0MDCLXVI]+[\.\w]+\s+([^:,]*).*/$1/", -1 , strip(Value)));
if prxmatch('/.*[:,].*/',Value) then ReasonSp = strip(prxchange("s/.*[:,](.+)/$1/", -1, strip(Value)));
run;
You'll find more restrictive matching of roman numerals here
- Cheers -
Almost ok, I do not want to have i.e. and also I do not wan to have dots at the end. Thanks! 🙂
I would not use PRX for that, just SCAN:
data want;
set have;
reason=scan(scan(value,2,' '),1,':,.');
if countw(value,' ')>2 then
ReasonSp=scan(scan(value,-1,' '),1,':,.');
run;
The reason the ReasonSp calculation is a bit more complicated is to account for the "i.e." in the second observation, otherwise it could be calculated by taking the third word - but instead we have go get the last word, but only if there are more than 2 words.
Thank you! This is not universal - it depends on number of words. Under TEXT1 or TEXT2 I could have the whole sentence.
It think the best way would be to perform the final cleaning in a second step.
But depending on
you may want to come up with a solution like this:
data want;
set have;
length Reason ReasonSp $200;
*Roughly extract TEXT1 and TEXT2;
Reason = strip(substrn(scan(value,1,':,'),find(Value,' ')));
ReasonSp = scan(value,2,':,');
*Data cleaning;
array checkNClean Reason ReasonSp;
do over checkNClean;
*Remove trailing dot;
if substrn(reverse(strip(checkNClean)),1,1) eq '.' then checkNClean=reverse(substrn(reverse(strip(checkNClean)),2));
*Remove leading i.e.;
if substrn(strip(checkNClean),1,4) eq 'i.e.' then checkNClean=strip(substrn(strip(checkNClean),5));
end;
run;
proc print;run;
- Cheers -
@starosto wrote:
Thank you! This is not universal - it depends on number of words. Under TEXT1 or TEXT2 I could have the whole sentence.
GIGO
Incomplete problem description yields incomplete solutions.
Since C, D, L and M are also "Roman Numerals" you may also need much much more description of your data and the rules involved.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.