I am learning about leveraging a data dictionary to write SAS code for formats, labels, etc. Interesting stuff. There are two cases.
For one study (that I posted about on here not too long ago) the data are all character when we bring them in, although they won't end up that way since some are numbers or dates in spirit. So what we do is have info in the data dictionary telling what type it is for edit check purposes, as in "Option List" (1=Yes, 2=No), or "Free Text" (can be anything) or "Free Text Number" (can be any number but we can edit check it with the min and max allowable value that is part of the data dictionary), etc. There are more but you get the idea. And we also use the data dictionary to write labels and formats, etc.
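To make that concrete, here is a minimal sketch of the kind of range check I mean for a "Free Text Number" field. The dataset, the variable names, and the limits 30 and 300 are all made up for illustration; in the real study the min and max come from the data dictionary.

```sas
/* Hypothetical raw data: everything is character on arrival */
data raw;
    infile datalines dsd truncover;
    length subject_id $10 weight_kg $20;
    input subject_id $ weight_kg $;
    datalines;
001,72.5
002,7q.5
003,720
;
run;

/* Edit check for one "Free Text Number" field.
   30 and 300 are placeholder limits standing in for the
   min/max stored in the data dictionary. */
data weight_checks;
    set raw;
    length issue $40;
    if missing(weight_kg) then delete;   /* nothing entered, nothing to check */
    /* ?? keeps a failed conversion from writing notes to the log
       or setting _ERROR_; the result is simply missing */
    weight_num = input(weight_kg, ?? best32.);
    if missing(weight_num) then issue = "Not a number";
    else if weight_num < 30 or weight_num > 300 then issue = "Out of range";
    if not missing(issue);               /* keep only the rows with a problem */
run;
```

Because the original text is still sitting in weight_kg, the query back to the site can quote exactly what was entered.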
So that's all good. In that study, the data come to us from a database named Maria (I don't know if that's our name for it or if that's a brand name), and everything is character from the start.
But in a new study we're doing we are bringing in data from .CSV files. As you know, SAS guesses about data attributes when you bring data in from .CSV files. And you can use the data dictionary to help SAS guess correctly. Okay, fine. But the thing is, I don't see how some edit checks will be possible that way.
For instance, suppose a field that is going to end up as a date variable contains an entry error, such as 06/22/217 (which obviously should be 06/22/2017). If we bring it into SAS as a date, it will immediately become a missing value. We will no longer know that the entry was 06/22/217, and thus we won't be able to edit check it or tell the site to correct it. That is just one example, but there are others.
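By contrast, when the value is still character, the check is easy to write. This is only a sketch: the variable names are made up, and the plausible window starting at 01JAN2000 is an assumption I picked for the example, not something from our data dictionary.

```sas
data date_checks;
    infile datalines truncover;
    length visit_date_raw $20 issue $40;
    format visit_date mmddyy10.;
    input visit_date_raw $;

    /* ?? suppresses the invalid-data note and keeps _ERROR_ at 0;
       an unreadable date just becomes a missing numeric value */
    visit_date = input(visit_date_raw, ?? mmddyy10.);

    /* Something was entered but it did not convert to a date */
    if not missing(visit_date_raw) and missing(visit_date) then
        issue = "Not a valid date";
    /* It converted, but to a date that makes no sense for the study */
    else if not missing(visit_date)
        and not ('01JAN2000'd <= visit_date <= today()) then
        issue = "Date outside plausible window";

    datalines;
06/22/2017
06/22/217
13/45/2017
;
run;
```

However the informat handles the short year, the bad entry is caught by one of the two branches, and visit_date_raw still holds what the site typed.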
Of course, one way to get around that is to bring it all in as character and then cope with it, as described for the first study in this post. But that raises the question of how to tell SAS to read a .CSV file while forcing every variable to be character. I don't know how to do that. And more generally, considering how I've described things above, is that even the right thing to do?
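The only route I can picture is writing the DATA step by hand and declaring each variable as character myself, roughly as below. The file contents and variable names are made up, and $50 is an arbitrary length. What I don't see is how to get this for every variable in every file without typing it all out.

```sas
/* Write a tiny hypothetical CSV so the sketch runs on its own */
filename demo temp;
data _null_;
    file demo;
    put "subject_id,visit_date,consent,weight_kg";
    put "001,06/22/217,1,72.5";
    put "002,06/23/2017,3,7q.5";
run;

/* Hand-written read: no guessing, every variable is character */
data raw_char;
    /* dsd handles quoted values and empty fields,
       firstobs=2 skips the header row,
       truncover tolerates short lines */
    infile demo dsd dlm=',' firstobs=2 truncover;
    length subject_id visit_date consent weight_kg $50;
    input subject_id $ visit_date $ consent $ weight_kg $;
run;
```

Presumably the LENGTH and INPUT lists could be generated from the variable names in the data dictionary, which is part of what I'm asking about.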
It seems to me that the optimal thing would be to simply force the people entering the data to enter it in the correct type. So if something is a date, you have to enter it as a date, and if you make a typo so that it's not a date, then the system won't take it. But I get the impression that that's not an option for our operation, and maybe it's not optimal anyway.
I hope this makes sense. Any help is greatly appreciated.