03-20-2017 10:48 AM
I'm working with a wide dataset of 500+ variables and need to make sure that all values in each row fall within their respective domains. Right now I'm using a brute-force approach: for each variable I type an IF-THEN statement that prints the study ID, variable name, and value if the value falls outside the domain:
if TQ301 not in (1:5,88,99) then put STUDYID 'TQ301 ' TQ301;
if TQ302 not in (1:10) then put STUDYID 'TQ302 ' TQ302;
if TQ303 not in (1:1000) then put STUDYID 'TQ303 ' TQ303;
What I'd like is a program that will only require me to enter the domain for each variable, something like:
TQ301 DOMAIN = (1:5, 88, 99, .)
TQ302 DOMAIN = (1:10)
TQ303 DOMAIN = (1:1000)
For TQ301-TQ303 do;
if [value] not in [domain] then print STUDYID 'variable name' [value];
The output would look like this:
STUDYNO TQ301 6
STUDYNO TQ302 11
STUDYNO TQ303 1001
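One way to get close to the "enter only the domain" wish is a small macro that generates the IF-THEN/PUT line for each variable. This is a minimal sketch, not a tested production tool: the macro name `check`, the dataset name `have`, and the sample rows are hypothetical. It assumes the DATA step IN operator accepts integer ranges such as `1:5` (as the IF statements above already rely on), and it separates domain values with blanks instead of commas so each domain can be passed as a single macro argument.

```sas
/* Hypothetical sample data; the real dataset has 500+ variables. */
data have;
  input STUDYID $ TQ301 TQ302 TQ303;
  datalines;
S001 6 11 1001
S002 3 4 500
;
run;

/* Hypothetical helper: generates one IF-THEN/PUT statement that writes */
/* STUDYID, the variable name, and the value to the log when the value  */
/* is outside the listed domain. Separate domain values with blanks.    */
%macro check(var, domain);
  if &var not in (&domain) then put STUDYID "&var " &var;
%mend check;

data _null_;
  set have;
  %check(TQ301, 1:5 88 99 .)   /* the . treats a missing value as valid */
  %check(TQ302, 1:10)
  %check(TQ303, 1:1000)
run;
```

Each additional variable then costs one `%check(...)` line. As far as I know, a range such as `1:1000` in an IN list matches integers only, so a value like 500.5 would be reported as out of domain.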
03-20-2017 11:56 AM - edited 03-20-2017 04:15 PM
Without seeing an example of your data, I really wonder about three "study identification" variables on a single record. Do you mean that your data is such that one record may hold data from three (or possibly more) studies?
I would also submit that negative definitions are a bit weak: 1001 only yields a result for TQ303 because it is overwriting the values your code assigned in the first two conditions.
Can you provide some example data for those variables? I suspect there may be a way to do this with a format, but my initial approach would require only one of your variables TQ301, TQ302, and TQ303 to be defined (not missing) on each record.
Also, the way your PUT statements are structured, it looks like you have a variable named studyid. Or is that a typo, given that your example desired output shows "STUDYNO"?
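The format idea mentioned in the reply could look roughly like the sketch below. Each hypothetical format (`tq301_dom.` and so on) maps the valid domain to 'OK' and everything else to 'OUT'; an array walks the variables, `PUTN` applies the matching format by name, and `VNAME` supplies the variable name for the log line. The format names, the `have` dataset, and the sample rows are assumptions for illustration. Unlike `IN (1:5)`, a format range `1-5` also accepts non-integers such as 2.5, so the two approaches are not identical.

```sas
/* Hypothetical sample data; the real dataset has 500+ variables. */
data have;
  input STUDYID $ TQ301 TQ302 TQ303;
  datalines;
S001 6 11 1001
S002 3 4 500
;
run;

/* One format per variable: values inside the domain map to 'OK', */
/* everything else (including missing, unless listed) to 'OUT'.   */
proc format;
  value tq301_dom 1-5, 88, 99, . = 'OK' other = 'OUT';
  value tq302_dom 1-10           = 'OK' other = 'OUT';
  value tq303_dom 1-1000         = 'OK' other = 'OUT';
run;

data _null_;
  set have;
  /* Variables to check and their format names, in the same order. */
  array vars{*} TQ301 TQ302 TQ303;
  array fmts{3} $32 _temporary_ ('tq301_dom.' 'tq302_dom.' 'tq303_dom.');
  length name $32;
  do i = 1 to dim(vars);
    /* PUTN applies a format whose name is held in a character value. */
    if putn(vars{i}, fmts{i}) = 'OUT' then do;
      name  = vname(vars{i});
      value = vars{i};
      put STUDYID name value;
    end;
  end;
run;
```

The `vars` and `fmts` arrays must list the variables and formats in the same order. For 500+ variables, the VALUE statements could in principle be generated from a table with the PROC FORMAT `CNTLIN=` option, though that is not shown here.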