Good Day
Please assist with a code to validate the first 6 digit of an SA ID Number(YYMMDD).
Thank you
I don't know what a "SA ID Number" is, so please post an excerpt of the data you have and what you except as result.
Hi @NOMBUH
Do you refer to the South African ID number?
If yes, as the first 6 digits (YYMMDD) are based on the date of birth, which level of validation do you want?
- exact date: do you have a dataset containing the date of birth of someone and its SA ID number?
- "plausible" date: do you want to test if YMMDD could be a date? (eg. MM -> 01 to 12), ...?
Please provide more details.
Hi @NOMBUH
A simple way to check for a valid date is to see if the string can be converted to a valid sas date with an input function.
Note ?? in the following example. It is a format modifier, so the input function does not treat an invalid date as an error, but returns a missing value.
code:
data have;
SA_ID = '3333331234'; output;
SA_ID = '1313131234'; output;
SA_ID = '1212121234'; output;
SA_ID = '1901271234'; output;
run;
data want; set have;
Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
run;
result:
The data step code could be reduced to:
data want;
set have; Valid = (input(SA_ID,??yymmdd6.) ne .); run;
A specified length on the informat only reads that many characters so the substr shouldn't be needed.
Hi @ballardw
Yes, of course. It is always good practice to avoid unnecessary functions, and the substr function is "overdoing things". I often do that, and I checked a couple of programs.
I found this construct: ThisSSN = strip(scan(RelationList,i,',')) - I know that RelationList is always a list of social security numbers, They will never contain embedded blanks or other default delimiters, so in this case scanning by belimiter and stripping is unnecessary. ThisSSN = scan(RelationList,i) would give the same result.
We all have a set of ingrown habits, when it comes to coding, and I wonder how much effort should be put in the final optimization of a program. it certainly makes the code more readable, but does it matter in terms of computing time? I made a test just for fun to see the impact of removing the substring function in the valid-check, and it does make a difference, about 10%:
45 data _null_;
46 set have;
47 *Valid = (input(SA_ID,??yymmdd6.) ne .);
48 Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
49 run;
NOTE: There were 50000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 6.53 seconds
cpu time 6.54 seconds
50 data _null_;
51 set have;
52 Valid = (input(SA_ID,??yymmdd6.) ne .);
53 *Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
54 run;
NOTE: There were 50000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 6.06 seconds
cpu time 5.97 seconds
I wish you a good weekend
Erik
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.