Task: select a minimum sample of cases that, taken together, provide at least one example of every specified condition, for QC.
For example:
I might want to find at least one case where var1 is nonmissing.
I might need to see at least one case where the variable HHincome > 50,000.
Purpose/context:
I'm working on an application that displays information about a single person on screen, with values populated from background datasets. To verify that everything is working properly, I have to eyeball at least one example showing that a date, for instance, is displayed in the correct format, and at least one example showing that a second email appears. If these are rare, I might have to look at two different cases. But I have at least 100 variables or conditions of interest that occur rarely, and for reasons I won't get into, it is impractical to view and document over a hundred examples.
I was thinking that I could set up a series of arrays that keep track of which cases can be used to verify each condition. I could then analyze the info in the arrays to come up with a sample of cases that, taken together, make it possible to do a complete QC.
Does that make sense? Do you have other ideas?
Thanks
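The array idea described above could be sketched roughly as follows. This is only a minimal illustration, not a tested solution: the dataset HAVE, its ID variable, and the three sample rows are made up for the example, and only the two conditions mentioned at the top of the post are flagged.

```sas
/* Hypothetical sample data: one row per case (values are made up) */
data have;
  input id $ var1 hhincome;
  datalines;
A . 40000
B 5 75000
C 2 30000
;
run;

/* One 0/1 flag per QC condition, held in an array */
data flags;
  set have;
  array qc{2} qc1-qc2;
  qc{1} = not missing(var1);    /* condition 1: var1 is nonmissing */
  qc{2} = (hhincome > 50000);   /* condition 2: HHincome over 50,000 */
  n_met = sum(of qc{*});        /* number of conditions this case shows */
  if n_met > 0;                 /* keep only cases that are useful for QC */
run;

/* Cases that demonstrate the most conditions come first */
proc sort data=flags;
  by descending n_met;
run;
```

With 100 or more conditions, the array would simply be declared larger, with one assignment per condition.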
A different approach entirely ... take some real data and create the conditions you want to check. For example:
data checkthis;
  set have;
  if _n_=1 then do;
    var1=.;
    output;
    var1=5;
    hhincome = 75000;
    output;
  end;
  else if _n_=2 then do;
    * create some additional conditions to check;
    output;
  end;
run;
Thanks Astounding,
Your suggestion led to the following strategy for identifying cases that meet 3 infrequent conditions:
data qc;
  set rcls;
  if purpose_appt ne '' then do; qc1=1; output; end;
  if purpose_gen ne '' then do; qc2=1; output; end;
  if contact_mode = 6 then do; qc3=1; output; end;
run;
proc print data=qc;
  var sid2018 qc1 qc2 qc3 purpose_appt purpose_gen contact_mode;
run;
SAS Output
Obs | sid2018    | qc1 | qc2 | qc3 | purpose_appt | purpose_gen  | contact_mode
1   | 0107730020 | 1   |     |     | Appointment  |              | 1
2   | 0107730030 | 1   |     |     | Appointment  |              | 1
3   | 0109620010 |     | 1   |     |              | General info | 1
4   | 0109620010 | 1   |     |     | Appointment  |              | 6
5   | 0109620010 | 1   |     | 1   | Appointment  |              | 6
This shows me that:
1. Obs 5 (0109620010) would be a good choice for review because it satisfies two of the conditions.
2. Obs 3 (0109620010) is also needed because condition 2 is not met in the obs 5 record (or in any other record, for that matter).
3. There is no need to check obs 1, obs 2, or obs 4.
This is a great strategy for finding examples that meet conditions of interest for relatively rare conditions. It is exactly the type of thing I need.
Thanks!
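With many more conditions, the manual reading of the printout could be replaced by a one-pass selection along these lines. This is only a sketch under stated assumptions: it reuses the RCLS dataset and variables from the code above, assumes one row per record, and introduces the names FLAGS, RANKED, QC_SAMPLE and N_MET purely for illustration. It is a simple heuristic (keep a record only if it shows a condition not yet covered), so the result is a small sample but not guaranteed to be the smallest possible one.

```sas
/* One row per record, with a 0/1 flag for each condition */
data flags;
  set rcls;
  qc1 = (purpose_appt ne '');
  qc2 = (purpose_gen ne '');
  qc3 = (contact_mode = 6);
  n_met = sum(qc1, qc2, qc3);   /* number of conditions met */
  if n_met > 0;                 /* drop records that show nothing */
run;

/* Records that meet the most conditions are considered first */
proc sort data=flags out=ranked;
  by descending n_met;
run;

/* Keep a record only if it covers a condition not yet covered */
data qc_sample;
  set ranked;
  array qc{3} qc1-qc3;
  array covered{3} _temporary_ (3*0);   /* temporary arrays keep values across rows */
  adds_new = 0;
  do i = 1 to dim(qc);
    if qc{i} = 1 and covered{i} = 0 then do;
      covered{i} = 1;
      adds_new = 1;
    end;
  end;
  if adds_new;
  drop i adds_new;
run;

proc print data=qc_sample;
  var sid2018 qc1 qc2 qc3 purpose_appt purpose_gen contact_mode;
run;
```

Any condition that never appears in QC_SAMPLE has no example in the data at all, which is worth knowing before starting the review.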