I have a character column called BIRTH_DT, where all the values are stored as DDMMYY.
Example values:
"240578"
"210388"
"321284"
"010295"
"300594"
I want to extract all the values that aren't stored in a correct date format/syntax. So for example, the value "321284" would be extracted.
I could just take substrings of the first two pairs of digits and check that the day is between 01 and 31 and the month between 01 and 12, but that would be a cheap solution with plenty of room for error.
Although this kind of date quality validation must be an incredibly frequent thing to do, it's surprisingly hard to find any good advice about it on Google. Perhaps I'm not a particularly good Googler. Any advice would be appreciated, thanks.
Generally in SAS we recommend using SAS date variables instead of character.
If you do something like this in a DATA step:
testdate = input(birth_dt, ddmmyy6.);
if testdate = . then put "BIRTH_DT of " birth_dt "not valid for " <record identifying variables>;
Any value that is not a valid date under that informat results in a missing value for testdate, and the IF statement then writes the offending value to the log, along with any identifying variables you add to the PUT statement.
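If you would rather collect the bad values in a data set than read them from the log, the same idea can be sketched as below. This is only a sketch under some assumptions: the data set names HAVE and BAD_DATES are made up for illustration, and the sample rows are the example values from the question. The ?? modifier on the informat suppresses the invalid-data notes in the log and stops _ERROR_ from being set.

```sas
/* Hypothetical input data set built from the example values in the question */
data have;
    input birth_dt $6.;
    datalines;
240578
210388
321284
010295
300594
;
run;

/* BAD_DATES is an assumed name; it keeps only rows that fail to convert */
data bad_dates;
    set have;
    /* ?? suppresses invalid-data notes and leaves _ERROR_ unchanged */
    testdate = input(birth_dt, ?? ddmmyy6.);
    /* Subsetting IF: keep the row only when the conversion failed */
    if missing(testdate);
    drop testdate;
run;
```

With the sample values above, only "321284" should end up in BAD_DATES. Two things to keep in mind: a blank BIRTH_DT also converts to missing, so it would be flagged too, and the century assigned to a two-digit year depends on the YEARCUTOFF= system option.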