BookmarkSubscribeRSS Feed
NOMBUH
Calcite | Level 5

Good Day

 

Please assist with a code to validate the first 6 digit of an SA ID Number(YYMMDD).

 

Thank you

 

 

5 REPLIES 5
andreas_lds
Jade | Level 19

I don't know what a "SA ID Number" is, so please post an excerpt of the data you have and what you except as result.

ed_sas_member
Meteorite | Level 14

Hi @NOMBUH 

 

Do you refer to the South African ID number?

 

If yes, as the first 6 digits (YYMMDD) are based on the date of birth, which level of validation do you want?

- exact date: do you have a dataset containing the date of birth of someone and its SA ID number?

- "plausible" date: do you want to test if YMMDD could be a date? (eg. MM -> 01 to 12), ...?

 

Please provide more details.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @NOMBUH 

 

A simple way to check for a valid date is to see if the string can be converted to a valid sas date with an input function.

 

Note ?? in the following example. It is a format modifier, so the input function does not treat an invalid date as an error, but returns a missing value.

 

code:

 

data have;
	SA_ID = '3333331234'; output;
	SA_ID = '1313131234'; output;
	SA_ID = '1212121234'; output;
	SA_ID = '1901271234'; output;	
run;

data want; set have;
	Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
run;

result:

 

datevalidation.gif

ballardw
Super User

The data step code could be reduced to:

data want; 
set have; Valid = (input(SA_ID,??yymmdd6.) ne .); run;

A specified length on the informat only reads that many characters so the substr shouldn't be needed.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @ballardw 

 

Yes, of course. It is always good practice to avoid unnecessary functions, and the substr function is "overdoing things". I often do that, and I checked a couple of programs.

 

I found this construct: ThisSSN = strip(scan(RelationList,i,',')) I know that RelationList is always a list of social security numbers, They will never contain embedded blanks or other default delimiters, so in this case scanning by belimiter and stripping is unnecessary. ThisSSN = scan(RelationList,i) would give the same result.

 

We all have a set of ingrown habits, when it comes to coding, and I wonder how much effort should be put in the final optimization of a program. it certainly makes the code more readable, but does it matter in terms of computing time? I made a test just for fun to see the impact of removing the substring function in the valid-check, and it does make a difference, about 10%:

 

45    data _null_;
46        set have;
47        *Valid = (input(SA_ID,??yymmdd6.) ne .);
48        Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
49    run;

NOTE: There were 50000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           6.53 seconds
      cpu time            6.54 seconds


50    data _null_;
51        set have;
52        Valid = (input(SA_ID,??yymmdd6.) ne .);
53        *Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
54    run;

NOTE: There were 50000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           6.06 seconds
      cpu time            5.97 seconds

I wish you a good weekend

Erik

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 2507 views
  • 1 like
  • 5 in conversation