BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sc5
Obsidian | Level 7 sc5
Obsidian | Level 7

Hi,

 

Is there a way to use proc format with regex so that it sequentially evaluates conditions in the order given? 

Example: 

 

proc format;
	invalue $Fr
	"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i"	(regexpe) = _same_
	"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i"	(regexpe) = _same_
	"s/(.*)(GRA)(.*)/LLL/i"	(regexpe) = _same_
	"s/(.*)(EMPRE)(.*)/EMPRESSS/i"	(regexpe) = _same_
	other = ''
	;
quit;
data fruit;
	length fruit $50. ;
	infile cards dsd dlm='>' truncover;
	input fruit ;
	cards;
	DEATHS
	PELICA
	PRIZEDE UNDEREDS/EMPRESSS
	GRASSHOP
	GRAMS
run;
data fruit2;
	set fruit;
	fruitset=input(fruit,$Fr.);
run;

The output is 

sc5_0-1683097816287.png

But I want the output to be: 

sc5_1-1683097905691.png

i.e. I want the conditions in proc format to be evaluated in the order given. The 3rd row matches the 1st regex condition (the word 'UNDEREDS'), so the output should be PELICA, not EMPRESSS. I only want the output to be EMPRESSS if the conditions before are not met.

 

Any idea how I can achieve this? Or is there another way I can format strings based on patterns in a specific order of preference? Worst case I'll resort to if-then conditions, but I have a lot of columns to parse with a lot of conditions, and a huge dataset, so I'm trying to do it in a more modular/elegant way. 

 

Thank you!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

See if this helps:

proc format;
	invalue $Fr  (notsorted)
	"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i"	(regexpe) = _same_
	"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i"	(regexpe) = _same_
	"s/(.*)(GRA)(.*)/LLL/i"	(regexpe) = _same_
	"s/(.*)(EMPRE)(.*)/EMPRESSS/i"	(regexpe) = _same_
	other = ''
	;
quit;

View solution in original post

4 REPLIES 4
ballardw
Super User

See if this helps:

proc format;
	invalue $Fr  (notsorted)
	"s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i"	(regexpe) = _same_
	"s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i"	(regexpe) = _same_
	"s/(.*)(GRA)(.*)/LLL/i"	(regexpe) = _same_
	"s/(.*)(EMPRE)(.*)/EMPRESSS/i"	(regexpe) = _same_
	other = ''
	;
quit;
sc5
Obsidian | Level 7 sc5
Obsidian | Level 7
This seems to work, thanks a lot!
Oligolas
Barite | Level 11

Hi,

absolutely. You could also define your formats as a chain with the keyword 'other' to control in detail what happens:

PROC FORMAT;
	invalue $Fr1_ "s/(.*)(PELICA|UNDEREDS|GRASSHOP)(.*)/PELICA/i"	(regexpe) = _same_ other=[$Fr2_.];
	invalue $Fr2_ "s/(.*)(DEATHS|DT|CID)(.*)/DEATHS/i"	(regexpe) = _same_ other=[$Fr3_.];
	invalue $Fr3_ "s/(.*)(GRA)(.*)/LLL/i"	(regexpe) = _same_ other=[$Fr4_.];
	invalue $Fr4_ "s/(.*)(EMPRE)(.*)/EMPRESSS/i"	(regexpe) = _same_ other=' ';
	;
QUIT;

 

________________________

- Cheers -

sc5
Obsidian | Level 7 sc5
Obsidian | Level 7
Ooh ok I didn't know we could do that, thank you!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 614 views
  • 1 like
  • 3 in conversation