I want to filter data according to if particular subject has treatment like "onsite administration" or "dose administration" I want to filter those subjects who has only "on site administation" only. some subjects have both treatment and some subjects has only one treatment. I want to filter only those subjects which have only one treatment. if i filter data like extrt="on site administration" then those subjects will get filtered which has taken both treatment. I want only those subject which has taken only "on site treatment" I have attached screenshot for better understanding.
Please help.
You need to aggregate to the test. MIN() and MAX() work well for BOOLEAN values. If the MAX() is TRUE then at least one of the values is TRUE. If the MIN() is TRUE then all of the values are TRUE. If the MAX() is FALSE then none of the values are TRUE. If the MIN() is FALSE then at least one of the values is FALSE.
So to test if ALL of the values are on site administration the do something like:
select subjid
, min(lowcase(EXCAT)='on site administration') as ON_SITE
from have
group by subjid
;
So ON_SITE will be TRUE (have a value of 1) when every observation for that SUBJID was on site administration and FALSE (have a value of zero) otherwise.
Note: If you want to filter individual observations based on attributes derived from groups of observations then you will need to process the data twice. Once to calculate the value you need to select the group you then need to remerge that value back onto all off the observations in the group. Either by using two steps or perhaps taking advantage of PROC SQL's ability to automatically remerge aggregate statistics back onto all observations in the group.
create table ON_SITE as
select *
from have
group by subjid
having min(lowcase(EXCAT)='on site administration')
;
You need to aggregate to the test. MIN() and MAX() work well for BOOLEAN values. If the MAX() is TRUE then at least one of the values is TRUE. If the MIN() is TRUE then all of the values are TRUE. If the MAX() is FALSE then none of the values are TRUE. If the MIN() is FALSE then at least one of the values is FALSE.
So to test if ALL of the values are on site administration the do something like:
select subjid
, min(lowcase(EXCAT)='on site administration') as ON_SITE
from have
group by subjid
;
So ON_SITE will be TRUE (have a value of 1) when every observation for that SUBJID was on site administration and FALSE (have a value of zero) otherwise.
Note: If you want to filter individual observations based on attributes derived from groups of observations then you will need to process the data twice. Once to calculate the value you need to select the group you then need to remerge that value back onto all off the observations in the group. Either by using two steps or perhaps taking advantage of PROC SQL's ability to automatically remerge aggregate statistics back onto all observations in the group.
create table ON_SITE as
select *
from have
group by subjid
having min(lowcase(EXCAT)='on site administration')
;
Thank you so much . it helped a lot
As a general heuristic using a DATA step, what I would do here is:
1. Create a dataset with subjects that took the first treatment.
2. Create a dataset with subjects that took the second treatment.
3. Merge those datasets together to form a third dataset. When doing that, use IN= dataset options, so you know which subjects came from which dataset.
4. Output from the third dataset only those subjects that appeared in exactly one of the datasets.
These steps could be combined into one DATA step, so something like:
proc sort data=original_data out=one(keep=subject trt) nodupkey;
by subject trt;
run;
data three;
merge one(where=(trt="TRT1") in=in1)
one(where=(trt="TRT2") in=in2);
by subject;
if sum(in1,in2)=1;
keep subject;
run;
Thank you so much
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!