deleted_user
Not applicable
Hi,

Is it possible (and if so, how) to use regular expressions in PROC FORMAT?

I have the following data

DATA test;
INPUT variable $;
CARDS;
AAA
ABB
AC
BAA
BBB
BC
C
CCC
;
RUN;

and I would like to use something like this:
PROC FORMAT;
VALUE myformat
"A*"="USA"
"BA*"="Canada"
"BC*"="Mexico"
OTHER="rest of world";
RUN;

(If the string starts with "A", then "USA"; if it starts with "BA", then "Canada", ...)

Is there any way to do this?

Thank you in advance!
Jaroslav
7 REPLIES
Peter_C
Rhodochrosite | Level 12
"big birdie" had a project along the lines of using regex in user-defined informats.
Lack of user interest/demand seems to have pushed it back.
However, what your example demonstrates can be achieved without regex, by using ranges:
PROC FORMAT;
VALUE $myformat /* character format, so the name needs a leading $ */
"A" - "AZ" = "USA"
"BA" - "BAZ" = "Canada"
"BC" - "BCZ" = "Mexico"
OTHER = "rest of world";
RUN;
Here I used "Z", but if you are generating a CNTLIN= data set, you could use the high value 'FF'x instead.
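For illustration, the CNTLIN= idea could look roughly like the sketch below. The data set name work.prefix_fmt and the column widths are made up for this example, and it assumes a single-byte session encoding in which 'FF'x sorts after every ordinary character; it has not been run against the original data.

```sas
/* Sketch: build prefix ranges as a CNTLIN= data set (names are hypothetical). */
data work.prefix_fmt;
  length fmtname $32 type $1 start end $9 label $20 hlo $1;
  retain fmtname 'myformat' type 'C';      /* type C = character format */
  infile datalines dsd truncover;
  input start :$9. label :$20. hlo :$1.;
  if hlo = 'O' then end = start;           /* the OTHER row has no real range */
  else end = cats(start, 'FF'x);           /* prefix followed by the high value */
datalines;
A,USA,
BA,Canada,
BC,Mexico,
**OTHER**,rest of world,O
;
run;

proc format cntlin=work.prefix_fmt;
run;
```

Once the format is built, it can be attached with an ordinary FORMAT statement, e.g. `format variable $myformat.;`.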
ChrisNZ
Tourmaline | Level 20
Why regexp?

PROC FORMAT;
VALUE $myformat /* character format, so the name needs a leading $ */
"A" -< "B" = "USA"
"BA" -< "BC" = "Canada"
"BC" - "BC[" = "Mexico"
OTHER = "rest of world"; * [ comes after Z in the ascii sequence;
RUN;

Can't test here, but I think this should work.
deleted_user
Not applicable
My motivation to use regexp was to be able to define a format like

PROC FORMAT;
VALUE myformat2
".D*"="Daily data"
".M*"="Monthly data"
".A*"="Annual data";
RUN;

(depending on the second position of the string, without using an extra variable and without reading the whole dataset to collect the possible values and define the format from them).

Does anybody know how to solve this 2nd example?

Thank you very much
Peter_C
Rhodochrosite | Level 12
Use substr() to start looking at a specific position.
OK, regex allows you to look for a pattern at a non-specific position.
deleted_user
Not applicable
I wanted to apply the format to an existing dataset without creating a new variable with substr()... but it seems to be impossible...
sbb
Lapis Lazuli | Level 10
Associate your output format name to your existing SAS variable, using the SAS FORMAT statement?

Scott Barry
SBBWorks, Inc.
ChrisNZ
Tourmaline | Level 20
I reckon it is possible.
Here are 2 ways you could still use the formatted value like you want without reading the dataset beforehand:

1) bulldozer
=========
proc format ;
value $myformat
/* one entry per possible prefix; "4144FF"x is "AD" followed by 'FF'x (ASCII assumed) */
"AD" - "4144FF"x = 'daily'
"BD" - "4244FF"x = 'daily'
"CD" - "4344FF"x = 'daily'
"DD" - "4444FF"x = 'daily'
....
"AM" - "414DFF"x = 'monthly'
"BM" - "424DFF"x = 'monthly'
...
;
run;


2) view
========
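proc format ;
value $myformat
/* "44FF"x is "D" followed by 'FF'x, "4DFF"x is "M" followed by 'FF'x (ASCII assumed) */
"D" - "44FF"x = 'daily'
"M" - "4DFF"x = 'monthly'
;
run;

/* the view derives MYVAR2 on the fly, so no new dataset is stored */
data MYDATA_V/view=MYDATA_V;
set MYDATA;
MYVAR2=substr(MYVAR,2);
format MYVAR2 $myformat.;
run;

proc print data=MYDATA_V(drop=MYVAR);
run;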
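
A third route, not raised in the replies above, may be worth trying on SAS 9.3 or later: as far as I know, PROC FORMAT can use a PROC FCMP function as the label, and such a function should be able to call PRXMATCH. The sketch below is untested. The function name period_label, the format name $period and the library work.functions are invented for this example, and MYDATA/MYVAR are the placeholder names used above.

```sas
/* Sketch: regex logic wrapped in a function, then used as a format label. */
proc fcmp outlib=work.functions.demo;
  function period_label(code $) $ 12;
    length label $ 12;
    /* the second character decides the label, as in the ".D*" example */
    if prxmatch('/^.D/', code) then label = 'daily';
    else if prxmatch('/^.M/', code) then label = 'monthly';
    else if prxmatch('/^.A/', code) then label = 'annual';
    else label = 'other';
    return (label);
  endsub;
run;

options cmplib=work.functions;   /* tell SAS where to find the function */

proc format;
  value $period other = [period_label()];
run;

proc print data=MYDATA;
  format MYVAR $period.;         /* no extra variable and no view needed */
run;
```

The trade-off is speed: the function is called for every value formatted, so on large tables the range-based formats above are likely to be faster.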
