Hi,
I have a dataset I want to flag if a condition is met.
The below is how my data look like.
ID treatment
1 0
1 1
1 1
2 1
2 1
2 1
3 1
3 2
3 2
And I want to flag Ids that have used more than 1 treatment
ID treatment Flag_multiple_tx (yes=1 no=0)
1 0 1
1 1 1
1 1 1
2 1 0
2 1 0
2 1 0
3 1 1
3 2 1
3 2 1
I tried to use
if tx=0 and tx=1 then flag_multiple_tx=0;
if tx=0 and tx=2 then flag_multiple_tx=0;
and so on.
But it didn't work
Thanks
As long as we're accepting late entries, I would suggest a different end result:
proc sql;
create table want as
select *, sum(treatment) as flag
from have
group by id;
quit;
Now your FLAG values won't always be 0 or 1. But they will be more useful, because they hold the number of treatment observations for that person. You can easily select observations with TREATMENT = 0, or with TREATMENT > 0.
Something like (and post test data in the form of a datastep in a code window for tested code):
proc sql; create table WANT as select A.*, B.FLAG from HAVE A left join (select ID,case when sum(TREATMENT) > 1 then 1 else 0 end as FLAG from HAVE group by ID) B on A.ID=B.ID; quit;
I don't think @RW9's code will do what you want. I, personally, prefer using what's known as a DOW loop in a data step to solve such problems:
data want (drop=check); do until (last.id); set have; by id; if first.id then do; check=treatment; flag=0; end; else if check ne treatment then flag=1; end; do until (last.id); set have; by id; output; end; run;
Art, CEO, AnalystFinder.com
Conversely, if you prefer proc sql, I think that the following will do what you want:
proc sql; create table WANT as select A.*, B.FLAG from HAVE A left join (select distinct ID, case when count(distinct TREATMENT) > 1 then 1 else 0 end as FLAG from HAVE group by ID) B on A.ID=B.ID; quit;
Art, CEO, AnalystFinder.com
Here is a way to replicate the double DOW approach demonstrated by @art297, without specifying do loops. Like the double DOW this approach reads each ID twice - in the first pass it generates data needed for the criterion you want, and in the second pass it uses that criterion.
I've begun to prefer this technique for many cases in which I used to implement the double dow technique.
data have;
input ID treatment ;
datalines;
1 0
1 1
1 1
2 1
2 1
2 1
3 1
3 2
3 2
run;
data want (drop=n_treat);
set have (in=firstpass)
have (in=secondpass);
by id;
if firstpass and lag(treatment)^=treatment then n_treat+1;
if first.id then n_treat=1;
if secondpass;
flag_multiple=(n_treat>1);
run;
Notes:
Hi, My apologies for late entry. You could also try the most easiest of all:
NOTES: Take advantage of remerging stats and boolean as shown below
data have;
input ID treatment ;
datalines;
1 0
1 1
1 1
2 1
2 1
2 1
3 1
3 2
3 2
run;
proc sql;
create table want as
select *, min(treatment) ne max(treatment) as flag
from have
group by id;
quit;
As long as we're accepting late entries, I would suggest a different end result:
proc sql;
create table want as
select *, sum(treatment) as flag
from have
group by id;
quit;
Now your FLAG values won't always be 0 or 1. But they will be more useful, because they hold the number of treatment observations for that person. You can easily select observations with TREATMENT = 0, or with TREATMENT > 0.
I don't think SUM(treatment) would be that useful. Both triplets (1,1,1) and (0,1,2) have a sum of 3, but the first has 1 distinct treatment code, and the other has 3 codes.
Also, if I know my data is sorted by the group variable, I find that the DATA step approach is typically faster than the proc sql. After all the latter will have "NOTE: The query requires remerging summary statistics back with the original data". Doing the example we have worked with, multiplied to 30 million (and more) records, the data step was consistently twice as fast as the sql approach. Not meaningful for a one-off application, but worth considering in production mode.
My bad. I was too hasty, thinking that treatment would always be 0 or 1. Don't use my code! But do consider whether you would be better served by creating a count rather than a flag.
Thanks all for your responses!
@leahcho: I could care less whose CORRECT response you indicate as the solution, but why in the world would you select one that the author, himself, told you NOT to use because it was flawed?
When others find your post in a search, they will adopt the code that you said correctly solved your problem. In this case, you'll be causing them a lot of grief.
Art, CEO, AnalystFinder.com
@leahcho Thank you. I think either way with only 340 posts,i'll stay novice forever hahahahha
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.