ANYDIGIT function or NOTDIGIT function or PRX for my verification problem?

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 137
Accepted Solution

ANYDIGIT function or NOTDIGIT function or PRX for my verification problem?

I have a unique identifier column that I am having it as a character variable in my input data set? Say for example,

ID

123456789

987654321

234567891 /*over 5 million records*/

I need to do a verification process and move the acceptable ones to a data set and rejected ones to exception. The rule is that 1. values must not have more than 9 digits 2. values cannot have any character/special character or blank between the digits. On my search I found anydigit, notdigit functions coming close to my needs. Any help please?

Thanks,

Charlotte


Accepted Solutions
Solution
‎06-20-2015 02:57 AM
Respected Advisor
Posts: 3,840

Re: ANYDIGIT function or NOTDIGIT function or PRX for my verification problem?

A regular expression lets you test in one go that there are only digits as well as that the number of digits is between 1 and 9. Below code with a tweak to what provided so that it covers your cases.

data have;

  infile datalines dsd dlm=',';

  input id:$10. valid_flg:$1.;

  if prxmatch('/^\s*\d{1,9}\s*$/',id)>0 then isValid=1;

  else isValid=0;

cards;

123456789,Y

987654321,Y

234567891,Y

23456,Y

2345678910,N

2345 7891,N

2345a7891,N

234567891a,N

;

run;

View solution in original post


All Replies
Super User
Posts: 1,117

Re: ANYDIGIT function or NOTDIGIT function or PRX for my verification problem?

Please try the PRX way

data have;

input id$10. ;

if prxmatch('/^\d{9}/',id)>0;

cards;

123456789

987654321

234567891

;


Thanks,

Jag


Thanks,
Jag
Super User
Posts: 1,117

Re: ANYDIGIT function or NOTDIGIT function or PRX for my verification problem?

Please check the difference between anydigit and prx, prx is more efficient comparatively

data have;

input id $10.;

if prxmatch('/^\d{9}/',id)>0 then flag=1;

if anydigit(id) then flag2=1;

cards;

123456789

98765xx4321

234567891

;

Thanks,

Jag

Thanks,
Jag
Solution
‎06-20-2015 02:57 AM
Respected Advisor
Posts: 3,840

Re: ANYDIGIT function or NOTDIGIT function or PRX for my verification problem?

A regular expression lets you test in one go that there are only digits as well as that the number of digits is between 1 and 9. Below code with a tweak to what provided so that it covers your cases.

data have;

  infile datalines dsd dlm=',';

  input id:$10. valid_flg:$1.;

  if prxmatch('/^\s*\d{1,9}\s*$/',id)>0 then isValid=1;

  else isValid=0;

cards;

123456789,Y

987654321,Y

234567891,Y

23456,Y

2345678910,N

2345 7891,N

2345a7891,N

234567891a,N

;

run;

Grand Advisor
Posts: 9,596

Re: ANYDIGIT function or NOTDIGIT function or PRX for my verification problem?

Charlotte,

You want Perl Regulation Expression or SAS Function ?

Code: Program

data have;
  infile datalines dsd dlm=',';
  input id:$10. valid_flg:$1.;
  if 0 lt lengthn(strip(id)) lt 10 and notdigit(strip(id))=0 then isValid=1;
  else isValid=0;
cards;
123456789,Y
987654321,Y
234567891,Y
23456,Y
2345678910,N
2345 7891,N
2345a7891,N
234567891a,N
;
run;

I suggest take Patrick's code which is more robust .

Xia Keshan

Message was edited by: xia keshan

Frequent Contributor
Posts: 137

Re: ANYDIGIT function or NOTDIGIT function or PRX for my verification problem?

Hi Xia, How are you? Thank you. Well, I am using SAS 9.3 and I was rather confused what would be right choice for the solution like you mentioned SAS function or Perl regular expression. Since the file has over 5 million records and taking 's suggestion "A regular expression lets you test in one go that there are only digits as well as that the number of digits is between 1 and 9"  sounds great too. I would like anything that performs magnificently Smiley Happy.What's your opinion?

I hope you are enjoying your weekend,

Charlotte

Frequent Contributor
Posts: 137

Re: ANYDIGIT function or NOTDIGIT function or PRX for my verification problem?

Ok Oops I didn;t see your message Edit "I suggest take Patrick's code which is more robust ".. Thanks Indeed!

Grand Advisor
Posts: 9,596

Re: ANYDIGIT function or NOTDIGIT function or PRX for my verification problem?

PRX is more robust as Patrick did on account of getting rid of other blank characters like TAB character . But SAS Function would be more fast , it only take care of white blanks , if your data is clean , this way is good to try .

Xia Keshan

Trusted Advisor
Posts: 1,300

Re: ANYDIGIT function or NOTDIGIT function or PRX for my verification problem?

In example from , the strip inside the lengthn function is redundant. Personally I would prefer to write either version (prx or not) without the IF-THEN/ELSE logic

isValid = (0<lengthn(id)<10 & ^notdigit(strip(id)));

isValid = (prxmatch('/^\d{1,9}\s*$/',id)>0);

prxmatch, by virtue of it's implementation should be less efficient (performance-wise).

You can check the performance by running the following and changing the commented line:

data _null_;

  infile datalines dsd dlm=',';

  input id:$10. valid_flg:$1.;

  do i=1 to 1000000;

  isValid = (0<lengthn(id)<10 & ^notdigit(strip(id)));

  *isValid = (prxmatch('/^\d{1,9}\s*$/o',id)>0);

  end;

cards;

123456789,Y

987654321,Y

234567891,Y

23456,Y

2345678910,N

2345 7891,N

2345a7891,N

234567891a,N

;

run;

LENGTHN+NOTDIGIT function version takes 0.2 seconds on my machine

PRX function version takes 2.46 seconds

Frequent Contributor
Posts: 137

Re: ANYDIGIT function or NOTDIGIT function or PRX for my verification problem?

Hi Matt, Good morning, Assuming it is morning there right now in US Smiley Happy  while it's almost noon here in Liverpool, England. Sorry for the late acknowledgement and Thank you!. I like the very interesting performance comparison 0.2 seconds as opposed to 2.46 is quite significant. Hmm something to think about. Have a nice day.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 1047 views
  • 8 likes
  • 5 in conversation