06-22-2015 01:54 AM
Hello, I have a problem with the Date column in the incoming source file a tab delimited file to be precise. The date value should or expected to be in yyyymmdd format as part of the business requirement, but some values are erroneously existing in in other formats in the source file such as ddmmyy etc. How do i check this? And when identified, how do i convert those values into yyyymmdd format properly to satisfy business.
06-22-2015 02:30 AM
Since you can't know if 01022015 is February 1st or January 2nd, return file to sender with a some very harsh words about correct data. It's your job to import data, and not to make uneducated guesses. If some people like to be idiots, don't join up with them.
The only thing I would do is to write a data step that fails when data does not comply to the agreed-on formats, so that the job scheduling can prevent further processing of bogus data.
06-22-2015 02:51 AM
I agree, but how to flag those erroneous records and reject. I mean when i read them using an informat yyyymmdd or with the one anydtdte suggested by Linus. Luckily, On a quick glance, manually looking at the records I found these erroneous values. But to send them I seek to provide them some level of justification to prove there are such records existing.
06-22-2015 03:05 AM
DON'T use something forgiving like anydte.
Use the specific format that was agreed upon, then the input statement will set the automatic variable _ERROR_ to 1, which you can check for at the end of the data step and perform a suitable abort with an exit code ne 0.
if _ERROR_ ne 0 then abort abend 5;
06-22-2015 03:26 AM
Yes, That makes sense and I'm glad you suggested ways to make my life simple and easy rather than going through with the headache of cleaning records. lol
06-22-2015 11:22 AM
If the data is supposed to always be present but sometimes isn't you may want to check for missing as that doesn't generate an error.
Also this stub will collect all of the problem case records into a data set for generating an error report.
data xxx baddates;
if _ERROR_ ne 0 then output baddates; /* or: if missing(date) instead of If _error_ ne 0 */
else output xxx;
06-22-2015 02:31 AM
You could try the anydtdte. informat.
If you wish higher control/prio between formats, import as char, and input it using the different formats you see possible, and use the coalesce function to chose the successful result (in you prio order).
But, decent source data is the best...