BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sc5
Calcite | Level 5 sc5
Calcite | Level 5

Hi,

 

Is there a way to use proc format with regex so that it sequentially evaluates conditions in the order given? 

Example: 

 

proc format;
	invalue $Fr
	"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i"	(regexpe) = _same_
	"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i"	(regexpe) = _same_
	"s/(.*)(GRA)(.*)/LLL/i"	(regexpe) = _same_
	"s/(.*)(EMPRE)(.*)/EMPRESSS/i"	(regexpe) = _same_
	other = ''
	;
quit;
data fruit;
	length fruit $50. ;
	infile cards dsd dlm='>' truncover;
	input fruit ;
	cards;
	DEATHS
	PELICA
	PRIZEDE UNDEREDS/EMPRESSS
	GRASSHOP
	GRAMS
run;
data fruit2;
	set fruit;
	fruitset=input(fruit,$Fr.);
run;

The output is 

sc5_0-1683097816287.png

But I want the output to be: 

sc5_1-1683097905691.png

i.e. I want the conditions in proc format to be evaluated in the order given. The 3rd row matches the 1st regex condition (the word 'UNDEREDS'), so the output should be PELICA, not EMPRESSS. I only want the output to be EMPRESSS if the conditions before are not met.

 

Any idea how I can achieve this? Or is there another way I can format strings based on patterns in a specific order of preference? Worst case I'll resort to if-then conditions, but I have a lot of columns to parse with a lot of conditions, and a huge dataset, so I'm trying to do it in a more modular/elegant way. 

 

Thank you!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

See if this helps:

proc format;
	invalue $Fr  (notsorted)
	"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i"	(regexpe) = _same_
	"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i"	(regexpe) = _same_
	"s/(.*)(GRA)(.*)/LLL/i"	(regexpe) = _same_
	"s/(.*)(EMPRE)(.*)/EMPRESSS/i"	(regexpe) = _same_
	other = ''
	;
quit;

View solution in original post

4 REPLIES 4
ballardw
Super User

See if this helps:

proc format;
	invalue $Fr  (notsorted)
	"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i"	(regexpe) = _same_
	"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i"	(regexpe) = _same_
	"s/(.*)(GRA)(.*)/LLL/i"	(regexpe) = _same_
	"s/(.*)(EMPRE)(.*)/EMPRESSS/i"	(regexpe) = _same_
	other = ''
	;
quit;
sc5
Calcite | Level 5 sc5
Calcite | Level 5
This seems to work, thanks a lot!
Oligolas
Barite | Level 11

Hi,

absolutely. You could also define your formats as a chain with the keyword 'other' to control in detail what happens:

PROC FORMAT;
	invalue $Fr1_ "s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i"	(regexpe) = _same_ other=[$Fr2_.];
	invalue $Fr2_ "s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i"	(regexpe) = _same_ other=[$Fr3_.];
	invalue $Fr3_ "s/(.*)(GRA)(.*)/LLL/i"	(regexpe) = _same_ other=[$Fr4_.];
	invalue $Fr4_ "s/(.*)(EMPRE)(.*)/EMPRESSS/i"	(regexpe) = _same_ other=' ';
	;
QUIT;

 

________________________

- Cheers -

sc5
Calcite | Level 5 sc5
Calcite | Level 5
Ooh ok I didn't know we could do that, thank you!

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 197 views
  • 1 like
  • 3 in conversation