Hi,
Is there a way to use proc format with regex so that it sequentially evaluates conditions in the order given?
Example:
proc format;
invalue $Fr
"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i" (regexpe) = _same_
"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i" (regexpe) = _same_
"s/(.*)(GRA)(.*)/LLL/i" (regexpe) = _same_
"s/(.*)(EMPRE)(.*)/EMPRESSS/i" (regexpe) = _same_
other = ''
;
quit;
data fruit;
length fruit $50. ;
infile cards dsd dlm='>' truncover;
input fruit ;
cards;
DEATHS
PELICA
PRIZEDE UNDEREDS/EMPRESSS
GRASSHOP
GRAMS
run;
data fruit2;
set fruit;
fruitset=input(fruit,$Fr.);
run;
The output is
But I want the output to be:
i.e. I want the conditions in proc format to be evaluated in the order given. The 3rd row matches the 1st regex condition (the word 'UNDEREDS')
, so the output should be PELICA, not EMPRESSS. I only want the output to be EMPRESSS if the conditions before are not met.
Any idea how I can achieve this? Or is there another way I can format strings based on patterns in a specific order of preference? Worst case I'll resort to if-then conditions, but I have a lot of columns to parse with a lot of conditions, and a huge dataset, so I'm trying to do it in a more modular/elegant way.
Thank you!
See if this helps:
proc format; invalue $Fr (notsorted) "s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i" (regexpe) = _same_ "s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i" (regexpe) = _same_ "s/(.*)(GRA)(.*)/LLL/i" (regexpe) = _same_ "s/(.*)(EMPRE)(.*)/EMPRESSS/i" (regexpe) = _same_ other = '' ; quit;
See if this helps:
proc format; invalue $Fr (notsorted) "s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i" (regexpe) = _same_ "s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i" (regexpe) = _same_ "s/(.*)(GRA)(.*)/LLL/i" (regexpe) = _same_ "s/(.*)(EMPRE)(.*)/EMPRESSS/i" (regexpe) = _same_ other = '' ; quit;
Hi,
absolutely. You could also define your formats as a chain with the keyword 'other' to control in detail what happens:
PROC FORMAT;
invalue $Fr1_ "s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i" (regexpe) = _same_ other=[$Fr2_.];
invalue $Fr2_ "s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i" (regexpe) = _same_ other=[$Fr3_.];
invalue $Fr3_ "s/(.*)(GRA)(.*)/LLL/i" (regexpe) = _same_ other=[$Fr4_.];
invalue $Fr4_ "s/(.*)(EMPRE)(.*)/EMPRESSS/i" (regexpe) = _same_ other=' ';
;
QUIT;
- Cheers -
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.