Task: select a minimum sample of cases that, taken together, provide at least one example of every specified condition, for QC.
For example:
I might want to find at least one case where var1 is nonmissing.
I might need to see at least one case where the variable HHincome > 50,000.
Purpose/context:
I'm working on an application that displays information about a single person on screen, with values populated from background datasets. To verify that everything is working properly, I have to eyeball at least one example showing that a date, for instance, displays in the correct format, and at least one example proving that a second email shows up. If these are rare, I might have to look at two different cases. But I have at least 100 variables or conditions of interest that occur rarely, and for reasons I won't get into, it is impractical to view and document over a hundred examples.
I was thinking that I could set up a series of arrays that track which cases can be used to verify each condition. I could then analyze the arrays to come up with a sample of cases that, taken together, make it possible to do a complete QC.
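A minimal sketch of the array idea, using the two example conditions above. The dataset name have and the ID variable case_id are assumptions made for illustration; only var1 and HHincome come from the question.

```sas
/* Hypothetical input: dataset HAVE with one row per case,      */
/* an ID variable CASE_ID (assumed name), VAR1 and HHINCOME.    */
data flags;
    set have;
    array qc{2} qc1-qc2;
    qc{1} = not missing(var1);    /* condition 1: var1 is nonmissing */
    qc{2} = (hhincome > 50000);   /* condition 2: HHincome > 50,000  */
    n_met = sum(of qc{*});        /* how many conditions this case can demonstrate */
    if n_met > 0;                 /* keep only cases that are useful for QC */
    keep case_id qc1-qc2 n_met;
run;
```

Each kept case carries a 0/1 flag per condition, so extending this to 100 conditions means widening the array and adding one assignment per condition.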
Does that make sense? Do you have other ideas?
Thanks
A different approach entirely ... take some real data and create the conditions you want to check. For example:
data checkthis;
    set have;
    if _n_=1 then do;
        var1=.;
        output;
        var1=5;
        hhincome = 75000;
        output;
    end;
    else if _n_=2 then do;
        * create some additional conditions to check;
        output;
    end;
run;
Thanks Astounding,
Your suggestion led to the following strategy for identifying cases that meet 3 infrequent conditions:
data qc;
    set rcls;
    if purpose_appt ne '' then do; qc1=1; output; end;
    if purpose_gen ne '' then do; qc2=1; output; end;
    if contact_mode = 6 then do; qc3=1; output; end;
run;
proc print data=qc;
    var sid2018 qc1 qc2 qc3 purpose_appt purpose_gen contact_mode;
run;
SAS Output
Obs | sid2018    | qc1 | qc2 | qc3 | purpose_appt | purpose_gen  | contact_mode
1   | 0107730020 | 1   |     |     | Appointment  |              | 1
2   | 0107730030 | 1   |     |     | Appointment  |              | 1
3   | 0109620010 |     | 1   |     |              | General info | 1
4   | 0109620010 | 1   |     |     | Appointment  |              | 6
5   | 0109620010 | 1   |     | 1   | Appointment  |              | 6
This shows me that:
1. 0109620010 (obs 5) would be a good choice for review because it satisfies two of the conditions.
2. 0109620010 (obs 3) is also needed because condition 2 is not met in case 0109620010 (or any other case, for that matter).
3. There is no need to check obs 1, obs 2, or obs 4.
This is a great strategy for finding examples that meet conditions of interest for relatively rare conditions. It is exactly the type of thing I need.
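With many more conditions, picking the review cases by eye gets harder. The following is a rough sketch of one way to automate the choice, assuming rcls has one row per case. It is a simple one-pass heuristic, not a guaranteed minimum: cases are ranked by how many conditions they meet, and a case is kept only if it demonstrates a condition not already covered by a case kept earlier. The dataset names flags, ranked, and qc_sample and the variable n_met are made up for illustration.

```sas
/* One row per case, with a 0/1 flag for each of the three conditions. */
data flags;
    set rcls;
    qc1 = (purpose_appt ne '');
    qc2 = (purpose_gen ne '');
    qc3 = (contact_mode = 6);
    n_met = sum(qc1, qc2, qc3);
    if n_met > 0;
    keep sid2018 qc1-qc3 n_met;
run;

/* Look at the cases that meet the most conditions first. */
proc sort data=flags out=ranked;
    by descending n_met;
run;

/* Keep a case only if it covers a condition not yet covered. */
data qc_sample;
    set ranked;
    array qc{3} qc1-qc3;
    array covered{3} _temporary_ (3*0);  /* retained across rows */
    pick = 0;
    do i = 1 to dim(qc);
        if qc{i} = 1 and covered{i} = 0 then do;
            covered{i} = 1;
            pick = 1;
        end;
    end;
    if pick;
    drop i pick;
run;
```

Every condition that occurs at least once in rcls ends up represented in qc_sample, although a smaller sample may exist in some situations.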
Thanks!