Hi,
Is there a way to use proc format with regex so that it sequentially evaluates conditions in the order given?
Example:
proc format;
invalue $Fr
"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i" (regexpe) = _same_
"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i" (regexpe) = _same_
"s/(.*)(GRA)(.*)/LLL/i" (regexpe) = _same_
"s/(.*)(EMPRE)(.*)/EMPRESSS/i" (regexpe) = _same_
other = ''
;
quit;
data fruit;
length fruit $50. ;
infile cards dsd dlm='>' truncover;
input fruit ;
cards;
DEATHS
PELICA
PRIZEDE UNDEREDS/EMPRESSS
GRASSHOP
GRAMS
run;
data fruit2;
set fruit;
fruitset=input(fruit,$Fr.);
run;
The output is
But I want the output to be:
i.e. I want the conditions in proc format to be evaluated in the order given. The 3rd row matches the 1st regex condition (the word 'UNDEREDS'), so the output should be PELICA, not EMPRESSS. I only want the output to be EMPRESSS if the conditions before it are not met.
Any idea how I can achieve this? Or is there another way I can format strings based on patterns in a specific order of preference? Worst case I'll resort to if-then conditions, but I have a lot of columns to parse with a lot of conditions, and a huge dataset, so I'm trying to do it in a more modular/elegant way.
Thank you!
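See if this helps. The only change is the (notsorted) option, which makes SAS store the ranges, and search them, in the order you define them instead of sorting them first:
proc format;
invalue $Fr (notsorted)
"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i" (regexpe) = _same_
"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i" (regexpe) = _same_
"s/(.*)(GRA)(.*)/LLL/i" (regexpe) = _same_
"s/(.*)(EMPRE)(.*)/EMPRESSS/i" (regexpe) = _same_
other = ''
;
quit;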
Hi,
Absolutely. You could also define your informats as a chain, using the keyword 'other' to control in detail what happens when a pattern does not match:
PROC FORMAT;
invalue $Fr1_ "s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i" (regexpe) = _same_ other=[$Fr2_.];
invalue $Fr2_ "s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i" (regexpe) = _same_ other=[$Fr3_.];
invalue $Fr3_ "s/(.*)(GRA)(.*)/LLL/i" (regexpe) = _same_ other=[$Fr4_.];
invalue $Fr4_ "s/(.*)(EMPRE)(.*)/EMPRESSS/i" (regexpe) = _same_ other=' ';
QUIT;
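To use the chain, the data step should only need to call the first link; each other=[...] hands an unmatched value on to the next informat. A minimal sketch, assuming the fruit data set from the question already exists and the four informats above have been created (the explicit length for fruitset is my addition, not part of the original code):

```sas
data fruit2;
  set fruit;
  length fruitset $50;
  /* $Fr1_. is the entry point of the chain; values it does not match
     fall through to $Fr2_., then $Fr3_., then $Fr4_. */
  fruitset = input(fruit, $Fr1_.);
run;
```

Because every informat holds exactly one pattern, the order of evaluation is fixed by the chain itself rather than by how PROC FORMAT stores the ranges.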
- Cheers -
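If the informat route does not work out, the same "first match wins" rule can also be written directly in a data step with the PRX functions. This is only a rough sketch under assumptions: it is a different technique from the informats above, it has not been run against the real data, and the array names re and lab are hypothetical names chosen for illustration. The patterns and labels are copied from the question, in the same priority order.

```sas
data fruit2;
  set fruit;
  length fruitset $50;
  /* Compiled pattern ids and their labels, in priority order.
     _temporary_ arrays keep their values across observations. */
  array re[4]  _temporary_;
  array lab[4] $20 _temporary_ ('PELICA' 'DEATHS' 'LLL' 'EMPRESSS');
  if _n_ = 1 then do;
    /* Compile each pattern once, on the first observation */
    re[1] = prxparse('/PELICA|UNDEREDS|GRASSHOP/i');
    re[2] = prxparse('/DEATHS|DT|CID/i');
    re[3] = prxparse('/GRA/i');
    re[4] = prxparse('/EMPRE/i');
  end;
  fruitset = ' ';                 /* plays the role of other = '' */
  do i = 1 to dim(re);
    if prxmatch(re[i], fruit) > 0 then do;
      fruitset = lab[i];
      leave;                      /* stop at the first pattern that matches */
    end;
  end;
  drop i;
run;
```

Adding or reordering a rule means editing one pattern and one label, so for many columns the loop could be wrapped in a macro with one pair of arrays per column.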