I'm trying to flag 'codes' into broad categories:
- all characters
- all numeric
- any alphanumeric
I find it harder to categorize codes by these generic conditions than by the more specific patterns shown further below.
Any help please? What am I doing wrong? Thanks in advance.
data have;
input codes $;
datalines;
0D7Q8ZZ
XHRPXL2
0090T99
0090THJ
123456788
23456
234
0090T
0987F
HYDHDJH
;
/* Help getting the code below to work is appreciated */
data x; set have;
if prxMatch("/^D+\s*$/o",codes) then flag = "all character";
if prxMatch("/^[a-z]+\s*$/o",codes) then flag = "all character";
if prxMatch("/^[a-z]*\s*$/o",codes) then flag = "all character";
if prxMatch("/^w*\s*$/o",codes) then flag = "any alphanumeric";
if prxMatch("/^d+\s*$/o",codes) then flag = "all_numeric";
run;
proc freq data=x;
tables flag;
run;
/* These more specific checks worked fine */
if prxMatch("/^\d{5}.{2}\s*$/o",codes) then flag="CPT1";
else if prxMatch("/^\d{4}F.{2}\s*$/o",codes) then flag = "CPT2";
else if prxMatch("/^\d{4}T.{2}\s*$/o",codes) then flag = "CPT3";
else if prxMatch("/^V\d+\s*$/o",codes) then flag="VCODE";
else if prxMatch("/^E\d+\s*$/o",codes) then flag="ECODE";
else if prxMatch("/^\d{1}\s*$/o",codes) or
prxMatch("/^\d{2}\s*$/o",codes) or
prxMatch("/^\d{3}\s*$/o",codes) or
prxMatch("/^\d{4}\s*$/o",codes) then flag = "ICD9";
Something like the code below should work:
data have;
input codes $;
datalines;
0D7Q8ZZ
XHRPXL2
0090T99
0090THJ
123456788
23456
234
0090T
0987F
HYDHDJH
;
data x; set have;
length flag $30.;
if prxMatch("/^[a-z]+$/i",trim(codes)) then flag = "all character";
else if prxMatch("/^[0-9]+$/o",trim(codes)) then flag = "all numeric";
else if prxMatch("/^[a-z0-9]+$/i",trim(codes)) then flag = "any alphanumeric";
run;
The TRIM call is there just to remove trailing blanks (STRIP would remove leading blanks as well).
Thanks, @kiranv_.
Do you or @Rick_SAS know whether I can vectorize this with IML?
It works when nested in a loop, but not otherwise.
PROC IML;
USE HAVE;
READ ALL VAR _ALL_ INTO X [COLNAME=VARNAMES];
CLOSE;
/*doesn't work*/
FLAG= prxMatch("m/^[a-z]+$/imx",trim(X[,"codes"]));
PRINT X FLAG;
/*works*/
FLAG=J(NROW(X),1,.);
DO I=1 TO NROW(X);
FLAG[I] = prxMatch("m/^[a-z]+$/si",trim(X[I,"codes"]));
END;
PRINT X FLAG;
In IML, all character vectors are a fixed length, which is the length of the longest element. Thus when you execute
T = trim(X);
the TRIM function removes blanks from each element of X but the assignment operator essentially packs shorter elements with blanks so that every element of T has the same number of characters.
In other words, for vectors, the TRIM function is not doing what you think it is, but it does for scalars.
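The padding described above can be checked directly. This is a minimal sketch, assuming a small made-up vector rather than the full HAVE data set; it compares the storage length reported by NLENG with the per-element lengths reported by LENGTH, which ignores trailing blanks.

```sas
proc iml;
/* Hypothetical three-element sample; the values are taken from the thread's data */
X = {"0D7Q8ZZ", "234", "HYDHDJH"};

/* TRIM acts on each element, but the result is stored again as a fixed-width vector */
T = trim(X);

storageLen = nleng(T);   /* one number: the storage width shared by every element */
actualLen  = length(T);  /* one number per element, not counting trailing blanks   */

print T actualLen;
print storageLen;
quit;
```

If the shorter element reports a smaller value in actualLen than storageLen, it is still carrying trailing blanks inside T, which is why a pattern anchored with $ immediately after the character class fails for it.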
Anyway, you can tell the regular expression to allow white space at the end of the string, which is probably what I'd suggest:
FLAG= prxMatch("m/^[a-z]+\s*$/imx", X);
PRINT FLAG X;
Another option would be to replace the regular expression with the ANY* functions in SAS (ANYDIGIT, ANYPUNCT, ANYALPHA). The general idea is:
hasDigit = anydigit(X);
hasPunct = anypunct(X);
hasAlpha = anyalpha(X);
allChar = hasAlpha & ^hasDigit;
allNum = hasDigit & ^hasAlpha;
PRINT allChar allNum X;
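The same idea can be carried back to the DATA step from the original question. The following is only a sketch, assuming the HAVE data set and the flag labels used earlier in the thread; the helper variables hasAlpha and hasDigit are names chosen here for illustration. It does not check for punctuation, so a value such as "AB-C" would still be flagged as "all character" unless an ANYPUNCT test is added.

```sas
data x;
  set have;
  length flag $30;

  /* ANYALPHA / ANYDIGIT return the position of the first match, or 0 if there is none */
  hasAlpha = (anyalpha(codes) > 0);
  hasDigit = (anydigit(codes) > 0);

  if hasAlpha and not hasDigit then flag = "all character";
  else if hasDigit and not hasAlpha then flag = "all numeric";
  else if hasAlpha and hasDigit then flag = "any alphanumeric";

  /* Keep only the original variable and the flag */
  drop hasAlpha hasDigit;
run;

proc freq data=x;
  tables flag;
run;
```

Because these functions only search for the first matching character, trailing blanks in codes do not affect the result, so no TRIM call is needed here.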