This topic is solved and locked.
devsas
Pyrite | Level 9

Hi guys,

Thanks in advance for your help. I'm trying to find the records in a dataset that don't meet certain conditions for the HIC number variable. I could probably handle each condition with a different function, but I'm hoping for a single piece of code that catches every case that doesn't satisfy the criteria. Here are the conditions:

"If the HIC Number begins with a number, the first 9 positions must be numeric and end with a letter, double letter or letter-integer combination. If the HIC Number begins with a letter prefix (from 1-3 characters), the number itself has either 6 or 9 digits."

For example, the following values are valid: 777777888A, 777777888AB, 777777888A8, A888888888, AAA666666

Thanks so much!

1 ACCEPTED SOLUTION


10 REPLIES
Haikuo
Onyx | Level 15

Not rigorously tested:

data have;
   infile cards dsd;
   input str :$20. @@;
   /* flag=1 when the stripped value matches either HIC rule, 0 otherwise */
   flag=prxmatch('/^\d{9,}\w*([a-z]{1,2}$|[a-z]\d)|^[a-z]{1,3}(\d{6}|\d{9})$/io', strip(str))>0;
cards;
777777888A, 777777888AB, 777777888A8, A888888888, AAA666666, adsklf2903475
;
run;

devsas
Pyrite | Level 9

Thanks! If the dataset name is test and the variable I'm trying to test this condition on is HICN, what would your code look like?

Haikuo
Onyx | Level 15

data want;
   set test;
   /* flag=1 when the stripped HICN matches either HIC rule, 0 otherwise */
   flag=prxmatch('/^\d{9,}\w*([a-z]{1,2}$|[a-z]\d)|^[a-z]{1,3}(\d{6}|\d{9})$/io', strip(HICN))>0;
run;

Let us know how it goes; this code has not been thoroughly tested.
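Because the original goal was to list the records that fail, a subsetting IF can keep only those rows instead of adding a flag to every row. A minimal sketch, assuming the dataset TEST and character variable HICN from the question; the small TEST dataset built here is hypothetical sample data, the output name HICN_ERRORS is illustrative, and the pattern is the version from Haikuo's later reply in this thread.

```sas
/* Hypothetical sample data standing in for the real TEST dataset */
data test;
   input HICN :$20.;
   datalines;
777777888A
A888888888
adsklf2903475
372175864
;
run;

/* Keep only the records whose HICN matches neither HIC rule.
   PRXMATCH returns 0 when there is no match, so "= 0" selects failures. */
data hicn_errors;
   set test;
   if prxmatch('/^\d{9,}\w*([a-z]{1,2}|[a-z]\d)$|^[a-z]{1,3}(\d{6}|\d{9})$/io',
               strip(HICN)) = 0;
run;
```

With this sample data, HICN_ERRORS should hold adsklf2903475 and 372175864.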

devsas
Pyrite | Level 9

Thanks again! I ran it, but it didn't flag '372175864', although that value violates the condition "If the HIC Number begins with a number, the first 9 positions must be numeric and end with a letter": its last character is not a letter.


Haikuo
Onyx | Level 15

I am not sure what happened on your end, but it seems to work for me on the test data:

data have;
   infile cards dsd;
   input str :$20. @@;
   /* flag=1 when the stripped value matches either HIC rule, 0 otherwise */
   flag=prxmatch('/^\d{9,}\w*([a-z]{1,2}|[a-z]\d)$|^[a-z]{1,3}(\d{6}|\d{9})$/io', strip(str))>0;
cards;
777777888A, 777777888AB, 777777888A8, A888888888, AAA666666, adsklf2903475, 372175864
;
run;

The ones not meeting the criteria are flagged 0.
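Splitting the combined pattern into one pattern per rule makes it easier to see which rule a value fails. A minimal sketch under an assumption the thread does not settle: that "the first 9 positions must be numeric" means exactly nine digits followed by the suffix. That is stricter than the `\d{9,}\w*` used above, which also accepts values such as 7777778889999A or 777777888_XA. The names HIC_RULES, RULE1 and RULE2 are illustrative.

```sas
data hic_rules;
   input str :$20.;
   /* Rule 1 (assumed strict reading): exactly 9 digits, then one letter,
      two letters, or a letter followed by a digit */
   rule1 = prxmatch('/^\d{9}([a-z]{1,2}|[a-z]\d)$/i', strip(str)) > 0;
   /* Rule 2: a 1-3 letter prefix, then exactly 6 or 9 digits */
   rule2 = prxmatch('/^[a-z]{1,3}(\d{6}|\d{9})$/i', strip(str)) > 0;
   /* A value is valid if it satisfies either rule */
   flag  = (rule1 or rule2);
datalines;
777777888A
777777888AB
777777888A8
A888888888
AAA666666
adsklf2903475
372175864
7777778889999A
;
run;
```

Under this stricter reading, 7777778889999A fails rule 1, whereas the combined pattern above accepts it. Which reading is right depends on the actual HIC specification.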


devsas
Pyrite | Level 9

Thank you, sir; yes, it works now. My bad, I was spelling the variable name incorrectly. I see that you used the PRXMATCH function here, and I'm not sure I understand it correctly. Can you point me to a source where I can learn this?

Haikuo
Onyx | Level 15

Search for "Perl Regular Expression SAS"; at least, that was how I started. You will find many SAS papers as well as the SAS online help docs. Some SAS users I know started with the Perl Black Book instead.

devsas
Pyrite | Level 9

Thank you again, Hai! I didn't know much about Perl before this discussion, but I'm glad you answered, because now I see what an efficient way it is to solve this kind of problem. I'm trying to learn from your post, but I'm still struggling with some concepts. When you have a little time, could you please break down in words the expression you used to flag those cases, along with the symbols you used and their significance? I also have similar problems and was wondering if you could help with those too; perhaps that will help me eventually get the gist of these concepts. I looked at the SAS/Perl cheat sheet but am still not fully getting it. Here are the other conditions. If you could break them down individually, that would be perfect (I mean flag1, flag2, etc.).

Admit Date – The beginning date of the service must be in the format m/d/yyyy only. It must not be a future date, and it must be prior to or equal to the discharge date.
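A minimal sketch for the Admit Date rule, assuming the admit and discharge dates arrive as character values (the names ADMIT_C, DISCH_C and ADMIT_CHECK are hypothetical) and that m/d/yyyy allows a one- or two-digit month and day. The regular expression checks the layout, the MMDDYY10. informat rejects impossible dates, and TODAY() supplies the current date for the future-date check. Each `/` inside the pattern is escaped as `\/` because `/` also delimits the pattern.

```sas
data admit_check;
   input admit_c :$10. disch_c :$10.;
   /* ?? suppresses the log note for impossible dates such as 13/40/2014;
      they simply become missing */
   admit_dt = input(strip(admit_c), ?? mmddyy10.);
   disch_dt = input(strip(disch_c), ?? mmddyy10.);
   format admit_dt disch_dt mmddyy10.;
   /* flag1 = 1 only when all rules hold: m/d/yyyy layout, a real date,
      not in the future, and not after the discharge date */
   flag1 = prxmatch('/^\d{1,2}\/\d{1,2}\/\d{4}$/', strip(admit_c)) > 0
           and not missing(admit_dt)
           and admit_dt <= today()
           and not missing(disch_dt)
           and admit_dt <= disch_dt;
datalines;
1/5/2014 1/9/2014
1/5/2014 1/4/2014
13/40/2014 1/9/2014
1/5/2099 1/9/2099
2014-01-05 1/9/2014
;
run;
```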

 

Dx01 – Must not be missing/Null and must be 3-5 characters long (all digits, or beginning with one letter).
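For Dx01, one pattern with an alternation can cover both allowed shapes. A minimal sketch, assuming "beginning with one letter" means a single letter followed only by digits (3-5 characters in total) and that codes carry no decimal point; the sample values and the names DX_CHECK and FLAG2 are illustrative.

```sas
data dx_check;
   length dx01 $8;
   do dx01 = '4019', 'V5789', 'E880', '12', ' ', 'AB123', '401.9';
      /* 3-5 digits, or one letter followed by 2-4 digits (i = any case) */
      flag2 = not missing(dx01)
              and prxmatch('/^(\d{3,5}|[a-z]\d{2,4})$/i', strip(dx01)) > 0;
      output;
   end;
run;
```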

NPI – The NPI of the rendering or billing provider. Must not be missing/Null and must be 10 digits.
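The NPI rule is a single pattern anchored at both ends. A minimal sketch, assuming NPI is stored as a character variable (the sample values and the names NPI_CHECK and FLAG3 are hypothetical); because the pattern requires ten digits, a blank value fails it without a separate missing check.

```sas
data npi_check;
   length npi $12;
   do npi = '1234567893', '123456789', '12345678901', '12345A7893', ' ';
      /* ^ and $ around \d{10}: exactly ten digits and nothing else */
      flag3 = prxmatch('/^\d{10}$/', strip(npi)) > 0;
      output;
   end;
run;
```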

Bill Type – Possible values are ‘P’ for professional, ‘I’ for inpatient, or ‘O’ for outpatient. This field must not be missing/Null.
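For Bill Type, a plain IN list is enough, but the same check can be written as a character class for comparison. A minimal sketch with hypothetical sample values (the names BILL_CHECK, FLAG4 and FLAG4_RX are illustrative); both versions are case sensitive, so a lowercase 'p' fails, which is an assumption about how strict the rule should be.

```sas
data bill_check;
   length bill_type $2;
   do bill_type = 'P', 'I', 'O', 'p', 'X', ' ';
      /* IN list: the value must be exactly P, I or O; a blank is not in the list */
      flag4    = bill_type in ('P', 'I', 'O');
      /* The same test as a regular expression: one character from the class [PIO] */
      flag4_rx = prxmatch('/^[PIO]$/', strip(bill_type)) > 0;
      output;
   end;
run;
```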

Risk Assessment Code – May be missing/Null for 2013 dates of service. If a value is present for 2013 dates of service, it should be ‘A’. It is required (must not be missing/Null) for 2014 dates of service onward. The only possible values are A, B, or C.
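The Risk Assessment Code rule depends on the year of the date of service, so an IF/ELSE on YEAR() fits better than a single pattern. A minimal sketch, assuming a SAS date DOS and a one-character variable RAC (both hypothetical names, as are RAC_CHECK and FLAG5); the rule as stated says nothing about dates before 2013, so those rows are left with a missing flag.

```sas
data rac_check;
   infile datalines dsd missover;
   input dos :mmddyy10. rac :$1.;
   format dos mmddyy10.;
   if year(dos) = 2013 then flag5 = missing(rac) or rac = 'A';
   else if year(dos) >= 2014 then flag5 = rac in ('A', 'B', 'C');
   else flag5 = .;   /* the stated rule does not cover dates before 2013 */
datalines;
6/15/2013,
6/15/2013,A
6/15/2013,B
6/15/2014,
6/15/2014,C
6/15/2014,D
;
run;
```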

Thanks so much!

Haikuo
Onyx | Level 15


I will be frank with you. The key to PRX functions (as with any other programming language) is practice, practice, and more practice; there is no genie who will build Rome for you overnight. I will break down part of my code in the hope that it helps your understanding, but ultimately it rests on your shoulders.

('/^\d{9,}\w*([a-z]{1,2}$|[a-z]\d)|^[a-z]{1,3}(\d{6}|\d{9})$/io', strip(str))

1. ' ': The single quotes make the pattern a SAS character constant. If the pattern is already stored in a character variable (or built by an expression), you pass that variable instead and do not need the quotes.

2. /      /io: the forward slashes delimit the pattern you are searching for. The options after the closing slash: i ignores the case of alphabetic letters (A=a), and o tells SAS to compile the pattern only once, to save resources.

3. ^: the beginning of the string. So ^123 will match 123, but not 0123.

4. \d: any one digit (0-9).

5. \d{9,}: at least 9 digits.

6. \w*: zero or more word characters (letters, digits, or underscore).

7. (     |     ): grouping with alternation; either the part before the pipe '|' or the part after it can match.

8. [a-z]: any one alphabetic letter; because of the i option explained in 2 above, case is ignored.

9. [a-z]{1,2}: at least one letter, but no more than 2 letters.

10. $: the end of the string. So ABC$ will match OKABC, but not OKABCD.

11. [a-z]\d: any one alphabetic letter followed by any one digit.
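Items 1, 2 and 10 can be seen together in a small sketch. Here the pattern is compiled once with PRXPARSE and its ID kept in the retained variable RE, so PRXMATCH receives a variable rather than a quoted literal, and the lowercase row shows the i option at work. Each value is tested with and without STRIP: SAS stores character variables padded with trailing blanks, so without STRIP the blanks sit between the last digit and the end of the string and $ cannot match. The names DEMO, RE, WITH_STRIP and WITHOUT_STRIP are illustrative.

```sas
data demo;
   length str $20;
   retain re;
   /* Compile the pattern once; RE holds the pattern ID, not the pattern text */
   if _n_ = 1 then re = prxparse('/^[a-z]{1,3}(\d{6}|\d{9})$/i');
   input str :$20.;
   /* STR is padded with trailing blanks up to its length of 20,
      so "$" only matches right after the digits once STRIP removes them */
   with_strip    = prxmatch(re, strip(str)) > 0;
   without_strip = prxmatch(re, str) > 0;
datalines;
AAA666666
aaa666666
A888888888
;
run;
```

All three rows should get WITH_STRIP=1 and WITHOUT_STRIP=0, which is why the code in this thread wraps the variable in STRIP().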

The rest of the code just repeats the same concepts to follow your business requirements. Writing this breakdown actually helped me identify one glitch in my original code. I have fixed it in the original post, but not in this one. Challenge yourself to find it, and to figure out why.

Good luck,

Haikuo

devsas
Pyrite | Level 9

That's fair, Haikuo. Thanks so much for the explanation. You are right; I should try to figure things out more myself, as it will help.


Discussion stats
  • 10 replies
  • 1208 views
  • 3 likes
  • 2 in conversation