Hello,
I would like to create an error-check table using the logic below. Please use PROC SQL and help me complete the WHERE clause. Thanks.
If a treatment (meds) = 1 (Yes), I'm looking for any subgroups (steroids/vaso/immunemods/monoclonal/antiviral/othertx) that are either missing (.) or No (0).
proc sql;
create table want as
select meds, steroids, vaso, immunemods, monoclonal, antiviral, othertx
from have
where meds=1 and (???)
order by site;
So your variables are Boolean (1=TRUE and 0=FALSE) with some missing values?
If so, your test is that it is NOT true that ALL of them are TRUE. So test whether the SUM() of them is not equal to the number of variables. (Note that you cannot just test whether MIN() is equal to 1, since MIN() ignores missing values.)
where meds and (6 ne sum(steroids, vaso, immunemods, monoclonal, antiviral, othertx))
But perhaps you just didn't describe what you want clearly? It would make more sense to me to look for the observations where MEDS is TRUE but all of the other variables are FALSE, as that seems to indicate an inconsistency. MAX() will be TRUE if ANY of them is TRUE, so in that case the problem records are those with:
where meds and not max(steroids, vaso, immunemods, monoclonal, antiviral, othertx)
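To see how the two conditions differ, here is a minimal sketch on made-up data. The rows in `have`, the `site` values, and the output table names `want_any` and `want_none` are all invented for illustration. `site` is added to the SELECT list only so that the ORDER BY refers to a selected column.

```sas
/* Made-up test data: every value here is an assumption for illustration */
data have;
  input site meds steroids vaso immunemods monoclonal antiviral othertx;
  datalines;
1 1 1 1 1 1 1 1
2 1 1 0 1 1 1 1
3 1 . 1 1 1 1 1
4 1 0 0 0 0 0 0
5 1 . . . . . .
6 0 0 0 0 0 0 0
;
run;

proc sql;
  /* ANY subgroup is 0 or missing: keeps sites 2, 3, 4 and 5 */
  create table want_any as
  select site, meds, steroids, vaso, immunemods, monoclonal, antiviral, othertx
  from have
  where meds = 1
    and sum(steroids, vaso, immunemods, monoclonal, antiviral, othertx) ne 6
  order by site;

  /* NO subgroup is 1 (all are 0 or missing): keeps sites 4 and 5 */
  create table want_none as
  select site, meds, steroids, vaso, immunemods, monoclonal, antiviral, othertx
  from have
  where meds = 1
    and not max(steroids, vaso, immunemods, monoclonal, antiviral, othertx)
  order by site;
quit;
```

Site 5 shows why the missing-value behavior matters: with every subgroup missing, SUM() and MAX() both return missing, and SAS treats a missing value as not equal to 6 and as FALSE, so the row is flagged by both checks.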
I don't think I grasp the problem. When you say
"I'm looking for any subgroups (steroids/vaso/immunemods/monoclonal/antiviral/othertx) that are either missing (.) or No (0)."
what do you mean by subgroup?
Your title says you want ALL subgroup treatments, but your text says you want ANY subgroups. Which is it?
Can you show us (or make up) a small amount of data, along with the desired output?
Just a style note. This looks like an extremely simple query, BUT you may find it easier to use a data step in the long run for things like this, especially if you end up needing to use the same list of variables multiple times.
The data step would let you use either an array or possibly a double-dash variable list (steroids--othertx) if all of your steroids to othertx variables are adjacent in the data set. SQL won't allow either shortcut.
Consider the following, which also answers @Reeza's question about the N, NMISS and SUM of the variables.
data want;
   set have;
   array v(*) steroids vaso immunemods monoclonal antiviral othertx;
   listn = n(of v(*));
   listnmiss = nmiss(of v(*));
   listsum = sum(of v(*));
   /* maybe coding your statement relatively directly */
   if meds=1 and (min(of v(*))=0 or listnmiss>0);
   /* or alternately, since the sum is less than 6 if at least one of the variables is 0 or missing:
   if meds=1 and sum(of v(*)) < 6;
   */
run;
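For comparison, here is a rough sketch of the double-dash variant. It assumes that steroids through othertx are stored next to each other in `have`, in that order, with no unrelated variables in between; that is an assumption about your data, so check the variable positions first.

```sas
/* List variables in position order to confirm the six are adjacent */
proc contents data=have varnum;
run;

/* Assumes steroids through othertx are adjacent, in that order, in HAVE */
data want;
  set have;
  /* keep rows where meds is Yes and at least one subgroup is 0 or missing */
  if meds = 1 and (nmiss(of steroids--othertx) > 0
                   or min(of steroids--othertx) = 0);
run;
```

The name-range list picks up every variable positioned between the two endpoints, so an unrelated variable sitting in that range would silently be included in the check. The explicit array is the safer choice when the layout is not under your control.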
With SQL you have to explicitly type the names of all of the variables into the function parameters.
If the data set is "large", the data step will likely run quicker as well.
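If you do want to stay in PROC SQL, one way to avoid retyping the names is to hold the comma-separated list in a macro variable. This is only a sketch: `txvars` is a made-up macro variable name, and `site` is assumed to exist in `have` because the original query orders by it.

```sas
/* Store the comma-separated list once; TXVARS is a made-up name */
%let txvars = steroids, vaso, immunemods, monoclonal, antiviral, othertx;

proc sql;
  create table want as
  select site, meds, &txvars
  from have
  /* meds is Yes and at least one subgroup is missing or 0 */
  where meds = 1
    and (nmiss(&txvars) > 0 or min(&txvars) = 0)
  order by site;
quit;
```

Because NMISS() and MIN() are called with several arguments here, PROC SQL evaluates them row by row as ordinary SAS functions rather than as summary functions over the whole table.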
Thank you so much for your help. I got it.