
Question about date quality validation


I have a character column called BIRTH_DT, where all the values are stored as DDMMYY.

Example values:

"240578"

"210388"

"321284"

"010295"

"300594"

I want to extract all the values that aren't stored in a correct date format/syntax. So for example, the value "321284" would be extracted.

I could just take substrings for the first two pairs of digits and check that the day is between 01 and 31 and the month between 01 and 12, but that would be a cheap solution with plenty of room for error (it would accept "310290", for example).

Although this kind of date quality validation must be an incredibly frequent thing to do, it's surprisingly hard to find any good advice about it on Google. Perhaps I'm not a particularly good Googler. Any advice would be appreciated, thanks.


Accepted Solutions
10-28-2013 09:58 AM

Re: Question about date quality validation

Posted in reply to EinarRoed

Generally in SAS we recommend storing dates as SAS date variables instead of character variables.

If you do something like this in a DATA step:

testdate= input(birth_dt, ddmmyy.);

if testdate=. then put "BIRTH_DT of " birth_dt "not valid for " <record identifying variables>;

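Any value that is not valid for the informat will result in a missing value of testdate, and the IF statement will then write that value to the log, along with any identifying variables you add to the PUT statement. Note that a blank BIRTH_DT also converts to missing, so it would be reported as well.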
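
If you would rather extract the bad rows into a data set than write them to the log, the same conversion test can drive the output. This is only a minimal sketch: the data set names HAVE, VALID and INVALID and the ID variable are made up for illustration, and the `??` modifier is added here to keep the log free of invalid-data notes.

```sas
/* Hypothetical sample data; ID is an assumed record identifier */
data have;
   input id birth_dt :$6.;
   datalines;
1 240578
2 210388
3 321284
4 010295
5 300594
;
run;

/* Route each row by whether the DDMMYY conversion succeeds */
data valid invalid;
   set have;
   /* ?? suppresses the invalid-data note and leaves _ERROR_ at 0 */
   testdate = input(birth_dt, ?? ddmmyy6.);
   format testdate date9.;
   if missing(testdate) then output invalid;
   else output valid;
run;
```

With the example values from the question, only "321284" should end up in INVALID. Two-digit years are resolved using the YEARCUTOFF= system option, so it may be worth checking that setting for birth dates.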





