You've hit on the problem. In the data dictionary it's a date, but if there is an error in the data such that SAS can't read it as a date, SAS will set it to a missing value instead. Thus, we will get a missing value and never know there was a bad date behind it, unless of course we look in the SAS log for the invalid-data message. But the problem is that when you have a big study, using the SAS log as an edit check mechanism doesn't work well. It's easier to have all the data in SAS and edit check it there, so you want to be sure you have all the truly raw data in SAS in the first place.
If I make them $ then they'll be length 8 by default. Okay, that's no good, since anything longer would be truncated, but what if I make them all $500 (or some other length that will be big enough for anything)? Having all those unnecessarily long variables will take up a lot of space, but at least we'd be sure we have all the data. And then we would clean those: changing the final length to 20, say, if that's what we wanted for a variable, converting another variable to numeric, converting another to a date if we like, etc.
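Here is a rough sketch of what I have in mind, not a finished program. The dataset names (raw, clean), the variable names (subjid, visit_date, weight, and so on) and the sample rows are all made up for illustration, and I'm assuming comma-delimited input. The idea is to read everything as long character first, then convert in a second step with the ?? modifier so that bad values are flagged in the data rather than only noted in the log.

```sas
/* Step 1: read every field as character so nothing is lost.      */
/* Names and sample rows are hypothetical, for illustration only. */
data raw;
  infile datalines dsd truncover;
  length subjid visit_date weight $500;   /* generous length: no truncation */
  input subjid visit_date weight;         /* read as character (see LENGTH) */
  datalines;
001,2021-03-15,72.5
002,2021-02-30,68
003,15MAR2021,abc
;
run;

/* Step 2: convert in SAS, keeping the raw text beside the result. */
data clean;
  set raw;
  /* ?? suppresses the invalid-data log note and leaves _ERROR_ alone */
  visit_dt  = input(visit_date, ?? yymmdd10.);
  weight_n  = input(weight, ?? best32.);
  format visit_dt date9.;
  /* Edit-check flags: raw text present but conversion failed */
  bad_date   = (missing(visit_dt) and not missing(visit_date));
  bad_weight = (missing(weight_n) and not missing(weight));
run;

/* List the records that need a query */
proc print data=clean;
  where bad_date or bad_weight;
  var subjid visit_date weight bad_date bad_weight;
run;
```

Once the checks are resolved, a final step could keep only the converted variables and give the remaining character ones a sensible length such as $20.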
In summary, it seems that in order to edit check the raw data in SAS, you have to get the raw data into SAS. And if the raw data is character, then we have to get all that raw character data into SAS. Does that sound sensible?