BookmarkSubscribeRSS Feed
NOMBUH
Calcite | Level 5

Good Day

 

Please assist with a code to validate the first 6 digit of an SA ID Number(YYMMDD).

 

Thank you

 

 

5 REPLIES 5
andreas_lds
Jade | Level 19

I don't know what a "SA ID Number" is, so please post an excerpt of the data you have and what you except as result.

ed_sas_member
Meteorite | Level 14

Hi @NOMBUH 

 

Do you refer to the South African ID number?

 

If yes, as the first 6 digits (YYMMDD) are based on the date of birth, which level of validation do you want?

- exact date: do you have a dataset containing the date of birth of someone and its SA ID number?

- "plausible" date: do you want to test if YMMDD could be a date? (eg. MM -> 01 to 12), ...?

 

Please provide more details.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @NOMBUH 

 

A simple way to check for a valid date is to see if the string can be converted to a valid sas date with an input function.

 

Note ?? in the following example. It is a format modifier, so the input function does not treat an invalid date as an error, but returns a missing value.

 

code:

 

data have;
	SA_ID = '3333331234'; output;
	SA_ID = '1313131234'; output;
	SA_ID = '1212121234'; output;
	SA_ID = '1901271234'; output;	
run;

data want; set have;
	Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
run;

result:

 

datevalidation.gif

ballardw
Super User

The data step code could be reduced to:

data want; 
set have; Valid = (input(SA_ID,??yymmdd6.) ne .); run;

A specified length on the informat only reads that many characters so the substr shouldn't be needed.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @ballardw 

 

Yes, of course. It is always good practice to avoid unnecessary functions, and the substr function is "overdoing things". I often do that, and I checked a couple of programs.

 

I found this construct: ThisSSN = strip(scan(RelationList,i,',')) I know that RelationList is always a list of social security numbers, They will never contain embedded blanks or other default delimiters, so in this case scanning by belimiter and stripping is unnecessary. ThisSSN = scan(RelationList,i) would give the same result.

 

We all have a set of ingrown habits, when it comes to coding, and I wonder how much effort should be put in the final optimization of a program. it certainly makes the code more readable, but does it matter in terms of computing time? I made a test just for fun to see the impact of removing the substring function in the valid-check, and it does make a difference, about 10%:

 

45    data _null_;
46        set have;
47        *Valid = (input(SA_ID,??yymmdd6.) ne .);
48        Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
49    run;

NOTE: There were 50000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           6.53 seconds
      cpu time            6.54 seconds


50    data _null_;
51        set have;
52        Valid = (input(SA_ID,??yymmdd6.) ne .);
53        *Valid = (input(substr(SA_ID,1,6),??yymmdd6.) ne .);
54    run;

NOTE: There were 50000000 observations read from the data set WORK.HAVE.
NOTE: DATA statement used (Total process time):
      real time           6.06 seconds
      cpu time            5.97 seconds

I wish you a good weekend

Erik

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1104 views
  • 1 like
  • 5 in conversation