Hi Team,
I need a pattern of 1 character and 11 numeric values, but the challenge is the 1st character should be the first letter of name column postfix with 11 numeric values and sex='M'.
data test4;
input Name $ Sex $ ID $12.;
datalines;
Alice M A10294354554
Barbara M C02943545549
Carol F C02943545543
Henry M H02943545540
James M K02943545545
;
run;
In the above sample I need to validate the ID column with a flag value 'Valid' or 'Invalid'.
If the sex='M' and ID value contains the first character of Name column and 11 numeric values then it's a valid ID else it's an invalid ID.
I tried to take the first character of Name column into a macro variable and tried to write the program.
%macro test;
data want;
set test4;
%if Sex='M' %then %do;
proc sql;
select strip(substr(name,1,1)) into :First_Char from test4;
quit;
%end;
match=prxmatch("/^&First_Char\d{11}$/",ID);
%if match>1 %then %do;
flag='Invalid';
%end;
%else %do;
flag='Valid';
%end;
run;
%mend;
%test;
I am not getting any match, though 'Alice' and 'Henry' ID are valid, both IDs contains first character of name and 11 numbers beside sex='M' for both of them.
Please help me correct the logic.
@rajdeep wrote:
Thanks a lot for the quick reply.
I am just wondering what if the ID value length is not standard. Means, suppose one of the ID values has13 or 15 digit characters, but other values has 12 characters. Like below.
input Name $ Sex $ ID $13.;
datalines;
Alice M A102943545547
Barbara M C02943545549
Carol F C02943545543
Henry M H02943545540
James M K02943545545
;In the above sample you can see Alice has 13 char ID value, but Henry is having 12 char value with a trailing space, so in that case Henry's ID value is valid. will your code work in that case!
Thanks.
With the amended structure of the input dataset (defined length of ID is now 13) the condition length(id)=12 is no longer redundant and trailing blanks must be considered in the third argument of the FINDC function:
data want; set test5; flag=(sex='M' & id=:first(name) & length(id)=12 & ~findc(id,,'kdt',2)); run;
Hi @rajdeep,
There's no need for macro coding. Even regular expressions can be avoided:
data test5;
input Name $ Sex $ ID $12.;
datalines;
Alice M A10294354554
Barbara M C02943545549
Carol F C02943545543
Henry M H02943545540
James M K02943545545
Robert M R029435A5540
Timothy M T0294355540
;
data want;
set test5;
flag=(sex='M' & id=:first(name) & ~findc(id,,'kd',2));
run;
[Edit: Removed the redundant criterion length(id)=12.]
I leave it to you to create character values 'Valid' and 'Invalid' from the 1s and 0s produced by the code above if needed. Personally, I prefer these 0-1 flags because they are easier to handle and character labels can be assigned with a simple format:
proc format;
value valflag
0='Invalid'
1='Valid';
run;
proc print data=want;
format flag valflag.;
run;
Thanks a lot for the quick reply.
I am just wondering what if the ID value length is not standard. Means, suppose one of the ID values has13 or 15 digit characters, but other values has 12 characters. Like below.
input Name $ Sex $ ID $13.;
datalines;
Alice M A102943545547
Barbara M C02943545549
Carol F C02943545543
Henry M H02943545540
James M K02943545545
;
In the above sample you can see Alice has 13 char ID value, but Henry is having 12 char value with a trailing space, so in that case Henry's ID value is valid. will your code work in that case!
Thanks.
@rajdeep wrote:
Thanks a lot for the quick reply.
I am just wondering what if the ID value length is not standard. Means, suppose one of the ID values has13 or 15 digit characters, but other values has 12 characters. Like below.
input Name $ Sex $ ID $13.;
datalines;
Alice M A102943545547
Barbara M C02943545549
Carol F C02943545543
Henry M H02943545540
James M K02943545545
;In the above sample you can see Alice has 13 char ID value, but Henry is having 12 char value with a trailing space, so in that case Henry's ID value is valid. will your code work in that case!
Thanks.
With the amended structure of the input dataset (defined length of ID is now 13) the condition length(id)=12 is no longer redundant and trailing blanks must be considered in the third argument of the FINDC function:
data want; set test5; flag=(sex='M' & id=:first(name) & length(id)=12 & ~findc(id,,'kdt',2)); run;
Wow, this is something great.
Thanks a lot.
data want;
set test4;
if Sex='M' then flag=ifc(ifc(first(Name)eq first(ID) ,1,0) eq 1 and prxmatch("m/[a-zA-Z]\d{11}/oi",ID),'Valid','Invalid');
run;
Obs Name Sex ID flag 1 Alice M A10294354554 Valid 2 Barbara M C02943545549 Invalid 3 Carol F C02943545543 4 Henry M H02943545540 Valid 5 James M K02943545545 Invalid
Hi Team,
I am want to output specific records based on condition, but it seems it's not happening for some reason.
data test;
input CCI_STATE $2. CCI_ID_NUMBER $14. CCI_ID_TYPE $21.;
datalines;
CA CAID C6307718 DRIVERS LIC/STATE ID
CA N31790267 DRIVERS LIC/STATE ID
CA A6118782 DRIVERS LIC/STATE ID
CA B3765104 DRIVERS LIC/STATE ID
CA Y5652061 DRIVERS LIC/STATE ID
;
run;
data valid invalid;
set test;
if CCI_STATE='CA' and CCI_ID_Type='DRIVERS LIC/STATE ID' and prxmatch("/^(?<!\w)\w{1}\d{7}(?!\d)$/",compress(CCI_ID_Number, ,'s'))=0 then output invalid;
else output valid;
run;
so, basically out of 5 records I would like to output the (1 alpha+7numeric) ID numbers to valid dataset and remaining 2 records to invalid dataset.
While running the code it's only outputting the 'N31790267' ID to invalid, not the 'CAID C6307718' ID.
if I am checking the prxmatch result for the particular 'CAID C6307718' ID it's giving 0 only, but it's not outputting the same record to invalid dataset based on the if condition.
I would request please suggest something with regular expression facility.
Please help.
Thanks
Hi @rajdeep,
I would prefer a logic that requires certain criteria for a classification as "valid" and rejects everything else as "invalid."
In your example:
if ... and prxmatch('/^[A-Z]\d{7}$/',trim(CCI_ID_Number)) then output valid;
else output invalid;
To allow also lowercase letters (as in "a1234567") insert the i modifier after the closing slash of the regular expression. Of course, the IF conditions regarding CCI_STATE and CCI_ID_TYPE would need to be adapted if, e.g., data from other states were to be classified.
In the future, please open a new thread for a new question. This improves the chances of getting a quick answer because only very few people will notice a late post in an old thread.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.