The SAS Output Delivery System and reporting techniques

PROC FORMAT with REGEXP

N/A
Posts: 0

Hi,

Is it possible to use regular expressions in PROC FORMAT, and if so, how?

I have the following data

DATA test;
INPUT variable $;
CARDS;
AAA
ABB
AC
BAA
BBB
BC
C
CCC
;
RUN;

and I would like to use something like this
PROC FORMAT;
VALUE myformat
"A*"="USA"
"BA*"="Canada"
"BC*"="Mexico"
OTHER="rest of world";
RUN;

(If the string starts with "A", then "USA"; if it starts with "BA", then "Canada", ...)

Is there any way to manage this?

Thank you in advance!
Jaroslav
Valued Guide
Posts: 2,174

Re: PROC FORMAT with REGEXP

"big birdie" had a project along the lines of using regex in user-defined informats.
Lack of user interest/demand seems to have pushed it back.
However, what your example demonstrates can be achieved without regex, by using ranges:
PROC FORMAT;
VALUE $myformat
"A" - "AZ" = "USA"
"BA" - "BAZ" = "Canada"
"BC" - "BCZ" = "Mexico"
OTHER = "rest of world";
RUN;
Here I used "Z" as the upper bound, so a value such as "AZA" would fall outside the range; if you are generating a CNTLIN= data set, you could use the high value 'FF'x instead.
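
As a rough sketch of the CNTLIN= idea, assuming a single-byte session encoding and values no longer than 8 bytes, the control data set could be built along these lines. The data set name CTRL and the 8-byte width are assumptions made for illustration; FMTNAME, TYPE, START, END, LABEL and HLO are the standard CNTLIN variables.

```sas
/* Sketch only: CTRL and the 8-byte width are assumed for illustration */
data ctrl;
  length fmtname $32 type $1 start end $8 label $16 hlo $1;
  retain fmtname 'MYFORMAT' type 'C';   /* TYPE='C' requests a character format */
  hlo = ' ';
  /* pad each upper bound to 8 bytes with the high value 'FF'x */
  start = 'A';  end = 'A'  || repeat('FF'x, 6); label = 'USA';    output;
  start = 'BA'; end = 'BA' || repeat('FF'x, 5); label = 'Canada'; output;
  start = 'BC'; end = 'BC' || repeat('FF'x, 5); label = 'Mexico'; output;
  /* HLO='O' marks the OTHER row; START and END are ignored for it */
  hlo = 'O'; start = ' '; end = ' '; label = 'rest of world'; output;
run;

proc format cntlin=ctrl;
run;
```

In a UTF-8 session 'FF'x is not a valid character, so a different upper bound would be needed there.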
PROC Star
Posts: 1,555

Re: PROC FORMAT with REGEXP

Why regexp?

PROC FORMAT;
VALUE $myformat
"A" -< "B" = "USA"
"BA" -< "BB" = "Canada"
"BC" - "BC[" = "Mexico"
OTHER = "rest of world"; * [ comes after Z in the ASCII sequence;
RUN;

Can't test here, but I think this should work.
N/A
Posts: 0

Re: PROC FORMAT with REGEXP

My motivation for using regexp was to be able to define a format like

PROC FORMAT;
VALUE myformat2
".D*"="Daily data"
".M*"="Monthly data"
".A*"="Annual data";
RUN;

(depending on the second position of the string, without using an extra variable or reading the whole dataset for possible values and using them to define the format).

Does anybody know how to solve this second example?

Thank you very much
Valued Guide
Posts: 2,174

Re: PROC FORMAT with REGEXP

Use substr() to start looking at a specific position.
Granted, regex also lets you look for a pattern at a non-specific position.
N/A
Posts: 0

Re: PROC FORMAT with REGEXP

I wanted to apply the format over an existing dataset without creating new variable by substr... but it seems to be impossible...
Super Contributor
Posts: 3,174

Re: PROC FORMAT with REGEXP

Associate your output format name to your existing SAS variable, using the SAS FORMAT statement?

Scott Barry
SBBWorks, Inc.
PROC Star
Posts: 1,555

Re: PROC FORMAT with REGEXP

I reckon it is possible.
Here are 2 ways you could still use the formatted value the way you want without reading the dataset beforehand:

1) bulldozer
=========
proc format;
value $myformat
/* one range per possible first character; the upper bound is the prefix plus 'FF'x, written in hex */
"AD" - '4144FF'x = 'daily'
"BD" - '4244FF'x = 'daily'
"CD" - '4344FF'x = 'daily'
"DD" - '4444FF'x = 'daily'
....
"AM" - '414DFF'x = 'monthly'
"BM" - '424DFF'x = 'monthly'
...
;
run;


2) view
========
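proc format;
value $myformat
"D" - '44FF'x = 'daily'    /* anything starting with D */
"M" - '4DFF'x = 'monthly'  /* anything starting with M */
;
run;

/* the view derives MYVAR2 on the fly, so nothing extra is stored */
data MYDATA_V / view=MYDATA_V;
set MYDATA;
MYVAR2 = substr(MYVAR, 2);
format MYVAR2 $myformat.;
run;

proc print data=MYDATA_V(drop=MYVAR);
run;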
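
A third possibility, sketched here under assumptions rather than tested: from SAS 9.3 on, PROC FORMAT accepts a PROC FCMP function as a label, so the pattern matching can live inside the function and the format can be attached directly to the existing variable. The names period_label, work.funcs.fmt and $period are hypothetical, and the sketch assumes that PRXMATCH is usable inside PROC FCMP in your release.

```sas
/* Hypothetical names: period_label, work.funcs.fmt, $period */
proc fcmp outlib=work.funcs.fmt;
  function period_label(code $) $ 16;
    length label $ 16;
    /* the pattern /^.D/ means: any first character, then D in position 2 */
    if      prxmatch('/^.D/', code) then label = 'Daily data';
    else if prxmatch('/^.M/', code) then label = 'Monthly data';
    else if prxmatch('/^.A/', code) then label = 'Annual data';
    else                                 label = 'Unknown';
    return (label);
  endsub;
run;

/* tell SAS where to find the compiled function */
options cmplib=work.funcs;

proc format;
  /* every value is passed to the function, which returns the label */
  value $period (default=16) other = [period_label()];
run;

proc print data=MYDATA;
  format MYVAR $period.;
run;
```

The function is called for every formatted value, so on large data this is likely slower than a range-based format.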