PRXmatch to find if last digit was specific string?

Solved
Super Contributor
Posts: 331

I have a ‘procedure code’ column (3 million records). I need to clean the data by classifying each procedure code into one of the three groups below and deleting any record that fits none of them.

• Category I codes are the five-digit numeric codes included in the main body of CPT.
• Category II codes are alphanumeric and consist of four digits followed by the letter 'F'.
• Category III codes are alphanumeric and consist of four digits followed by the letter 'T'.

I think the code below handles Category I, but it does not handle Categories II and III.

Thanks,

Cruise

```
data have;
length cpt_code $ 13; /* long enough for 'pathology_lab' */
input codes $ valid_flag $;
/* Executable statements must come before DATALINES. */
/* Comparing the character variable codes with numbers relies on implicit
   conversion; non-numeric codes produce "Invalid numeric data" notes. */
if lengthn(strip(codes)) gt 5 then CPT5=0; else CPT5=1;
if 99201<=codes<=99499 then cpt_code= 'management';
if 00100<=codes<=01999 then cpt_code= 'anesthesia';
if 10021<=codes<=69990 then cpt_code= 'surgery';
if 80047<=codes<=89398 then cpt_code= 'pathology_lab';
if 90281<=codes<=99607 then cpt_code= 'medicine';
if '0500F'<=codes<='0503F' then cpt_code= 'cat2';
datalines;
0D7Q8ZZ No
XHRPXL2 No
0BDN4ZZ No
123456788 No
23456 CAT1
234 No
0090T CAT3
0987F CAT2
HYDHDJH No
;
```

Accepted Solutions
Solution
2 weeks ago
Posts: 5,479

Re: PRXmatch to find if last digit was specific string?

The regular expressions would be:

```
data have;
input codes $ valid_flag $;
if prxMatch("/^\d{5}\s*$/o",codes) then flag = "CAT1";
else if prxMatch("/^\d{4}F\s*$/o",codes) then flag = "CAT2";
else if prxMatch("/^\d{4}T\s*$/o",codes) then flag = "CAT3";
else flag = "No";
datalines;
0D7Q8ZZ No
XHRPXL2 No
0BDN4ZZ No
123456788 No
23456 CAT1
234 No
0090T CAT3
0987F CAT2
HYDHDJH No
;
```
PG
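
The original question also asked to delete the records that fit none of the three categories. A minimal sketch of that step, assuming the `have` data set from the code above; the output name `want` and the variable `category` are hypothetical names chosen for illustration:

```sas
/* Sketch: keep only records whose code matches one of the three patterns. */
data want;                      /* hypothetical output data set name */
   length category $ 4;         /* hypothetical variable name */
   set have;
   if prxmatch("/^\d{5}\s*$/o", codes) then category = "CAT1";
   else if prxmatch("/^\d{4}F\s*$/o", codes) then category = "CAT2";
   else if prxmatch("/^\d{4}T\s*$/o", codes) then category = "CAT3";
   else delete;                 /* drop records that match none of the patterns */
run;
```

With the sample data in this thread, only 23456, 0090T and 0987F should remain in `want`.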

All Replies
Super Contributor
Posts: 331

Re: PRXmatch to find if last digit was specific string?

I'm trying to add other rules, such as:

else if prxMatch("/^\d{1}V\s*$/o",codes) then flag = "VCODE"; /* if the code starts with the letter V followed by digits, then VCODE */

else if prxMatch("/^\s*$/o",codes) then flag = "ALLCHAR"; /* if the code contains nothing but letters, then ALLCHAR */

Where am I making mistakes in modifying your code above?

Valued Guide
Posts: 516

Re: PRXmatch to find if last digit was specific string?

I recommend reading the regex documentation: http://support.sas.com/documentation/cdl/en/lefunctionsref/67398/HTML/default/viewer.htm#p0s9ilagexm...

Edit:

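The regex for VCODE could be /^V\d+\s*$/

and for ALLCHARS, /^[a-z]+\s*$/i (the i modifier makes the match case-insensitive, which is needed because your codes are uppercase).

Because SAS pads character variables with trailing blanks up to their declared length, you have to add \s* before the $ anchor.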
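
The effect of the trailing blanks can be checked with a small DATA step. This is only a sketch: the variable names and the length of 8 are assumptions made for illustration.

```sas
/* Sketch: why \s* is needed before $ when matching a padded character variable. */
data _null_;
   length codes $ 8;                               /* assumed length; the value is padded with blanks */
   codes = "V1234";
   strict  = prxmatch("/^V\d+$/", codes);          /* 0: the trailing blanks keep $ from matching */
   padded  = prxmatch("/^V\d+\s*$/", codes);       /* 1: \s* absorbs the padding */
   trimmed = prxmatch("/^V\d+$/", trim(codes));    /* 1: TRIM removes the padding before matching */
   put strict= padded= trimmed=;
run;
```

Either adding \s* to the pattern or trimming the value first works; the \s* form keeps the call shorter when many patterns are tested against the same variable.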

Valued Guide
Posts: 516

Re: PRXmatch to find if last digit was specific string?

My solution uses the regexes posted by @PGStats. Informats may be less efficient than inline prxMatch calls, but if you have more expressions to check, they are easier to test and can be reused in other programs without duplicating the patterns.

```
proc format;
invalue $Cat1Test
"/^\d{5}\s*$/" (regexp) = 'CAT1'
other = [$Test2Cat.]
;
invalue $Test2Cat
"/^\d{4}F\s*$/" (regexp) = 'CAT2'
other = [$Test3Cat.]
;
invalue $Test3Cat
"/^\d{4}T\s*$/" (regexp) = 'CAT3'
other = "No"
;
run;

data have;
length codes $ 20 valid_flag check_flag $ 4;
input codes $ valid_flag $;

check_flag = input(codes, $Cat1Test.);

datalines;
0D7Q8ZZ No
XHRPXL2 No
0BDN4ZZ No
123456788 No
23456 CAT1
234 No
0090T CAT3
0987F CAT2
HYDHDJH No
;
run;
```
Super Contributor
Posts: 331

Re: PRXmatch to find if last digit was specific string?

Thanks, the SAS documentation helped. I got it working for V codes, E codes, and all-non-digit values like "TYUIO".

data have;
input codes $;
cards;
12345
123456
1234F
1234T
123
V1234
E1234
TYUIO
;
run;

data have1; set have;
length flag $8;
if prxMatch("/^\d{5}\s*$/o",codes) then flag = "CAT1";
else if prxMatch("/^\d{4}F\s*$/o",codes) then flag = "CAT2";
else if prxMatch("/^\d{4}T\s*$/o",codes) then flag = "CAT3";
else if prxMatch("/^V\d{4}\s*$/o",codes) then flag = "VCODE";
else if prxMatch("/^E\d{4}\s*$/o",codes) then flag = "ECODE";
else if prxMatch("/^\D{5}\s*$/o",codes) then flag = "NON_DIG";
RUN;
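
For the question in the thread title (checking which specific character a code ends with), a capture group is one possible alternative to separate patterns per letter. The following is a sketch only, assuming the `have` data set above; the names `suffix_check`, `suffix` and `re` are hypothetical:

```sas
/* Sketch: capture the trailing F or T of a four-digit code with one pattern. */
data suffix_check;              /* hypothetical output data set name */
   length suffix $ 1;           /* hypothetical variable holding the captured letter */
   set have;
   retain re;                   /* keep the compiled pattern id across observations */
   if _n_ = 1 then re = prxparse("/^\d{4}([FT])\s*$/");
   /* PRXPOSN returns the text of capture buffer 1 after a successful match */
   if prxmatch(re, codes) then suffix = prxposn(re, 1, codes);
   drop re;
run;
```

Codes that do not match the pattern keep a blank `suffix`, so the variable can also serve as a filter.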

☑ This topic is solved.