Hi,
Is there a way to use PROC FORMAT with regular expressions so that the conditions are evaluated sequentially, in the order given?
Example:
proc format;
invalue $Fr
"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i" (regexpe) = _same_
"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i" (regexpe) = _same_
"s/(.*)(GRA)(.*)/LLL/i" (regexpe) = _same_
"s/(.*)(EMPRE)(.*)/EMPRESSS/i" (regexpe) = _same_
other = ''
;
quit;
data fruit;
length fruit $50;
infile cards dsd dlm='>' truncover;
input fruit ;
cards;
DEATHS
PELICA
PRIZEDE UNDEREDS/EMPRESSS
GRASSHOP
GRAMS
;
run;
data fruit2;
set fruit;
fruitset=input(fruit,$Fr.);
run;
With this code, the 3rd row comes out as EMPRESSS, but I want it to be PELICA.
In other words, I want the conditions in proc format to be evaluated in the order given. The 3rd row matches the 1st regex condition (the word 'UNDEREDS'), so the output should be PELICA, not EMPRESSS. I only want the output to be EMPRESSS if none of the earlier conditions are met.
Any idea how I can achieve this? Or is there another way I can format strings based on patterns in a specific order of preference? Worst case I'll resort to if-then conditions, but I have a lot of columns to parse with a lot of conditions, and a huge dataset, so I'm trying to do it in a more modular/elegant way.
Thank you!
See if this helps. The NOTSORTED option stores the ranges in the order you define them instead of sorting them:
proc format;
invalue $Fr (notsorted)
"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i" (regexpe) = _same_
"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i" (regexpe) = _same_
"s/(.*)(GRA)(.*)/LLL/i" (regexpe) = _same_
"s/(.*)(EMPRE)(.*)/EMPRESSS/i" (regexpe) = _same_
other = ''
;
quit;
Hi,
Yes. You could also define your informats as a chain, using the OTHER= keyword so that each informat passes any value it does not match on to the next one. This gives you detailed control over what happens:
PROC FORMAT;
invalue $Fr1_
"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i" (regexpe) = _same_
other = [$Fr2_.];
invalue $Fr2_
"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i" (regexpe) = _same_
other = [$Fr3_.];
invalue $Fr3_
"s/(.*)(GRA)(.*)/LLL/i" (regexpe) = _same_
other = [$Fr4_.];
invalue $Fr4_
"s/(.*)(EMPRE)(.*)/EMPRESSS/i" (regexpe) = _same_
other = ' ';
QUIT;
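To use the chain, read the values with the first informat, $Fr1_.; anything it does not match falls through to $Fr2_., $Fr3_. and $Fr4_. in turn. This is a minimal sketch, assuming the FRUIT data set from the question exists and the four chained informats above have been compiled. If the chain behaves as described, the five rows should come out as DEATHS, PELICA, PELICA, PELICA and LLL.

```sas
/* Sketch: assumes FRUIT (from the question) exists and the chained
   informats $Fr1_ to $Fr4_ above have already been compiled. */
data fruit2;
set fruit;
/* $Fr1_. is the entry point; unmatched values fall through the chain */
fruitset = input(fruit, $Fr1_.);
run;

proc print data=fruit2;
run;
```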
- Cheers -
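If you would rather keep the rules in a DATA step instead of a format, here is a sketch of the same first-match-wins idea using the PRX functions (a swapped-in technique, not PROC FORMAT). The patterns sit in a temporary array in priority order, each one is compiled once with PRXPARSE, and the loop stops at the first PRXMATCH hit before applying PRXCHANGE. The array names rx and id are illustrative, and the inline data stands in for the FRUIT data set; for real data you would replace the INFILE/INPUT lines with SET FRUIT.

```sas
data fruit3;
infile datalines truncover;
input fruit $50.;              /* for real data: set fruit; */
length fruitset $50;
/* Rules in priority order (same patterns as the question) */
array rx[4] $60 _temporary_ (
"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i"
"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i"
"s/(.*)(GRA)(.*)/LLL/i"
"s/(.*)(EMPRE)(.*)/EMPRESSS/i"
);
/* Compiled pattern IDs; _TEMPORARY_ arrays keep their values across rows */
array id[4] _temporary_;
if _n_ = 1 then do i = 1 to dim(rx);
  id[i] = prxparse(strip(rx[i]));
end;
fruitset = ' ';                /* default when nothing matches, like OTHER = '' */
do i = 1 to dim(id);
  if prxmatch(id[i], fruit) then do;
    fruitset = prxchange(id[i], 1, fruit);
    leave;                     /* stop at the first matching rule */
  end;
end;
drop i;
datalines;
DEATHS
PELICA
PRIZEDE UNDEREDS/EMPRESSS
GRASSHOP
GRAMS
;
run;
```

Because the loop exits at the first hit, the 3rd row becomes PELICA and GRASSHOP never reaches the GRA rule. Adding a rule means adding one string to the array (and increasing its dimension), which keeps the order of preference explicit in one place.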