BookmarkSubscribeRSS Feed
NOMBUH
Calcite | Level 5

Good Day

 

Please assist with a code to validate the first 6 digit of an SA ID Number(YYMMDD).

 

Thank you

 

 

5 REPLIES 5
andreas_lds
Jade | Level 19

I don't know what a "SA ID Number" is, so please post an excerpt of the data you have and what you except as result.

ed_sas_member
Meteorite | Level 14

Hi @NOMBUH 

 

Do you refer to the South African ID number?

 

If yes, as the first 6 digits (YYMMDD) are based on the date of birth, which level of validation do you want?

- exact date: do you have a dataset containing the date of birth of someone and its SA ID number?

- "plausible" date: do you want to test if YMMDD could be a date? (eg. MM -> 01 to 12), ...?

 

Please provide more details.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @NOMBUH 

 

A simple way to check for a valid date is to see if the string can be converted to a valid sas date with an input function.

 

Note ?? in the following example. It is a format modifier, so the input function does not treat an invalid date as an error, but returns a missing value.

 

code:

 

data have;
	SA_ID = '3333331234'; output;
	SA_ID = '1313131234'; output;
	SA_ID = '1212121234'; output;
	SA_ID = '1901271234'; output;	
run;

data want; set have;
	Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
run;

result:

 

datevalidation.gif

ballardw
Super User

The data step code could be reduced to:

data want; 
set have; Valid = (input(SA_ID,??yymmdd6.) ne .); run;

A specified length on the informat only reads that many characters so the substr shouldn't be needed.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @ballardw 

 

Yes, of course. It is always good practice to avoid unnecessary functions, and the substr function is "overdoing things". I often do that, and I checked a couple of programs.

 

I found this construct: ThisSSN = strip(scan(RelationList,i,',')) I know that RelationList is always a list of social security numbers, They will never contain embedded blanks or other default delimiters, so in this case scanning by belimiter and stripping is unnecessary. ThisSSN = scan(RelationList,i) would give the same result.

 

We all have a set of ingrown habits, when it comes to coding, and I wonder how much effort should be put in the final optimization of a program. it certainly makes the code more readable, but does it matter in terms of computing time? I made a test just for fun to see the impact of removing the substring function in the valid-check, and it does make a difference, about 10%:

 

45    data _null_;
46        set have;
47        *Valid = (input(SA_ID,??yymmdd6.) ne .);
48        Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
49    run;

NOTE: There were 50000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           6.53 seconds
      cpu time            6.54 seconds


50    data _null_;
51        set have;
52        Valid = (input(SA_ID,??yymmdd6.) ne .);
53        *Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
54    run;

NOTE: There were 50000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           6.06 seconds
      cpu time            5.97 seconds

I wish you a good weekend

Erik

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2495 views
  • 1 like
  • 5 in conversation