Hi,
Is there a way to use proc format with regex so that it sequentially evaluates conditions in the order given?
Example:
proc format;
invalue $Fr
"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i" (regexpe) = _same_
"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i" (regexpe) = _same_
"s/(.*)(GRA)(.*)/LLL/i" (regexpe) = _same_
"s/(.*)(EMPRE)(.*)/EMPRESSS/i" (regexpe) = _same_
other = ''
;
quit;
data fruit;
length fruit $50;
/* dlm='>' never occurs in the data, so each line is read whole */
infile cards dsd dlm='>' truncover;
input fruit;
cards;
DEATHS
PELICA
PRIZEDE UNDEREDS/EMPRESSS
GRASSHOP
GRAMS
;
run;
data fruit2;
set fruit;
fruitset=input(fruit,$Fr.);
run;
The output is
But I want the output to be:
That is, I want the conditions in proc format to be evaluated in the order given. The 3rd row contains the word 'UNDEREDS', which matches the 1st regex condition, so the output should be PELICA, not EMPRESSS. I only want the output to be EMPRESSS if none of the earlier conditions are met.
Any idea how I can achieve this? Or is there another way I can format strings based on patterns in a specific order of preference? Worst case I'll resort to if-then conditions, but I have a lot of columns to parse with a lot of conditions, and a huge dataset, so I'm trying to do it in a more modular/elegant way.
Thank you!
See if this helps. The (notsorted) option stores the ranges in the order in which you define them:
proc format;
invalue $Fr (notsorted)
"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i" (regexpe) = _same_
"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i" (regexpe) = _same_
"s/(.*)(GRA)(.*)/LLL/i" (regexpe) = _same_
"s/(.*)(EMPRE)(.*)/EMPRESSS/i" (regexpe) = _same_
other = ''
;
quit;
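To check the result, the lookup step from the question can be rerun once the informat has been recreated with (notsorted). This is a minimal sketch, assuming the fruit dataset from the question is still in the WORK library; the proc print step is added here only for inspection.

```sas
/* Re-apply the informat after $Fr has been recompiled with (notsorted). */
data fruit2;
    set fruit;
    fruitset = input(fruit, $Fr.);
run;

/* Print both columns side by side to compare each input with its result. */
proc print data=fruit2;
    var fruit fruitset;
run;
```

The proc format step has to run before this data step, since it replaces the earlier definition of $Fr.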
Hi,
Absolutely. You could also define your informats as a chain, using the keyword 'other' to hand unmatched values to the next informat, so you control in detail what happens:
PROC FORMAT;
invalue $Fr1_ "s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i" (regexpe) = _same_ other=[$Fr2_.];
invalue $Fr2_ "s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i" (regexpe) = _same_ other=[$Fr3_.];
invalue $Fr3_ "s/(.*)(GRA)(.*)/LLL/i" (regexpe) = _same_ other=[$Fr4_.];
invalue $Fr4_ "s/(.*)(EMPRE)(.*)/EMPRESSS/i" (regexpe) = _same_ other=' ';
QUIT;
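The entry point of the chain is the first informat, so the lookup calls $Fr1_ and the remaining links are reached through 'other'. A minimal sketch, assuming the fruit dataset from the question; the explicit length of 50 for fruitset is an assumption added here to mirror the input column.

```sas
data fruit2;
    set fruit;
    /* Assumption: give the result the same length as the input column. */
    length fruitset $50;
    /* Only the first link is named; unmatched values fall through to $Fr2_, $Fr3_, $Fr4_. */
    fruitset = input(fruit, $Fr1_.);
run;
```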
- Cheers -