Good Day
Please assist with a code to validate the first 6 digit of an SA ID Number(YYMMDD).
Thank you
I don't know what a "SA ID Number" is, so please post an excerpt of the data you have and what you except as result.
Hi @NOMBUH
Do you refer to the South African ID number?
If yes, as the first 6 digits (YYMMDD) are based on the date of birth, which level of validation do you want?
- exact date: do you have a dataset containing the date of birth of someone and its SA ID number?
- "plausible" date: do you want to test if YMMDD could be a date? (eg. MM -> 01 to 12), ...?
Please provide more details.
Hi @NOMBUH
A simple way to check for a valid date is to see if the string can be converted to a valid sas date with an input function.
Note ?? in the following example. It is a format modifier, so the input function does not treat an invalid date as an error, but returns a missing value.
code:
data have;
SA_ID = '3333331234'; output;
SA_ID = '1313131234'; output;
SA_ID = '1212121234'; output;
SA_ID = '1901271234'; output;
run;
data want; set have;
Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
run;
result:
The data step code could be reduced to:
data want;
set have; Valid = (input(SA_ID,??yymmdd6.) ne .); run;
A specified length on the informat only reads that many characters so the substr shouldn't be needed.
Hi @ballardw
Yes, of course. It is always good practice to avoid unnecessary functions, and the substr function is "overdoing things". I often do that, and I checked a couple of programs.
I found this construct: ThisSSN = strip(scan(RelationList,i,',')) - I know that RelationList is always a list of social security numbers, They will never contain embedded blanks or other default delimiters, so in this case scanning by belimiter and stripping is unnecessary. ThisSSN = scan(RelationList,i) would give the same result.
We all have a set of ingrown habits, when it comes to coding, and I wonder how much effort should be put in the final optimization of a program. it certainly makes the code more readable, but does it matter in terms of computing time? I made a test just for fun to see the impact of removing the substring function in the valid-check, and it does make a difference, about 10%:
45 data _null_;
46 set have;
47 *Valid = (input(SA_ID,??yymmdd6.) ne .);
48 Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
49 run;
NOTE: There were 50000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 6.53 seconds
cpu time 6.54 seconds
50 data _null_;
51 set have;
52 Valid = (input(SA_ID,??yymmdd6.) ne .);
53 *Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
54 run;
NOTE: There were 50000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
real time 6.06 seconds
cpu time 5.97 seconds
I wish you a good weekend
Erik
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.