- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have a dataset I want to flag if a condition is met.
The below is how my data look like.
ID treatment
1 0
1 1
1 1
2 1
2 1
2 1
3 1
3 2
3 2
And I want to flag Ids that have used more than 1 treatment
ID treatment Flag_multiple_tx (yes=1 no=0)
1 0 1
1 1 1
1 1 1
2 1 0
2 1 0
2 1 0
3 1 1
3 2 1
3 2 1
I tried to use
if tx=0 and tx=1 then flag_multiple_tx=0;
if tx=0 and tx=2 then flag_multiple_tx=0;
and so on.
But it didn't work
Thanks
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As long as we're accepting late entries, I would suggest a different end result:
proc sql;
create table want as
select *, sum(treatment) as flag
from have
group by id;
quit;
Now your FLAG values won't always be 0 or 1. But they will be more useful, because they hold the number of treatment observations for that person. You can easily select observations with TREATMENT = 0, or with TREATMENT > 0.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Something like (and post test data in the form of a datastep in a code window for tested code):
proc sql; create table WANT as select A.*, B.FLAG from HAVE A left join (select ID,case when sum(TREATMENT) > 1 then 1 else 0 end as FLAG from HAVE group by ID) B on A.ID=B.ID; quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't think @RW9's code will do what you want. I, personally, prefer using what's known as a DOW loop in a data step to solve such problems:
data want (drop=check); do until (last.id); set have; by id; if first.id then do; check=treatment; flag=0; end; else if check ne treatment then flag=1; end; do until (last.id); set have; by id; output; end; run;
Art, CEO, AnalystFinder.com
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Conversely, if you prefer proc sql, I think that the following will do what you want:
proc sql; create table WANT as select A.*, B.FLAG from HAVE A left join (select distinct ID, case when count(distinct TREATMENT) > 1 then 1 else 0 end as FLAG from HAVE group by ID) B on A.ID=B.ID; quit;
Art, CEO, AnalystFinder.com
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is a way to replicate the double DOW approach demonstrated by @art297, without specifying do loops. Like the double DOW this approach reads each ID twice - in the first pass it generates data needed for the criterion you want, and in the second pass it uses that criterion.
I've begun to prefer this technique for many cases in which I used to implement the double dow technique.
data have;
input ID treatment ;
datalines;
1 0
1 1
1 1
2 1
2 1
2 1
3 1
3 2
3 2
run;
data want (drop=n_treat);
set have (in=firstpass)
have (in=secondpass);
by id;
if firstpass and lag(treatment)^=treatment then n_treat+1;
if first.id then n_treat=1;
if secondpass;
flag_multiple=(n_treat>1);
run;
Notes:
- Using the "set have (in=firstpass) have (in=secondpass); by id;" forces each id to be read two times before proceeding to the next id - i.e. it is interleaving data set HAVE with itself, on an ID by ID basis. Using the temporary variables firstpass and secondpass allows your program to know which phase it is processing.
- The "if firstpass and lag(treatment)^=treatment then n_treat+1" increments N_TREAT each time there is a treatment change. In this case "n_treat+1" instead of "n_treat=n_treat+1" is a "sum statement" which tells sas to avoid resetting N_TREAT to missing with every incoming observation. The subsequent "if first.id then n_treat=1" resets at the start of each ID.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, My apologies for late entry. You could also try the most easiest of all:
NOTES: Take advantage of remerging stats and boolean as shown below
data have;
input ID treatment ;
datalines;
1 0
1 1
1 1
2 1
2 1
2 1
3 1
3 2
3 2
run;
proc sql;
create table want as
select *, min(treatment) ne max(treatment) as flag
from have
group by id;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As long as we're accepting late entries, I would suggest a different end result:
proc sql;
create table want as
select *, sum(treatment) as flag
from have
group by id;
quit;
Now your FLAG values won't always be 0 or 1. But they will be more useful, because they hold the number of treatment observations for that person. You can easily select observations with TREATMENT = 0, or with TREATMENT > 0.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't think SUM(treatment) would be that useful. Both triplets (1,1,1) and (0,1,2) have a sum of 3, but the first has 1 distinct treatment code, and the other has 3 codes.
Also, if I know my data is sorted by the group variable, I find that the DATA step approach is typically faster than the proc sql. After all the latter will have "NOTE: The query requires remerging summary statistics back with the original data". Doing the example we have worked with, multiplied to 30 million (and more) records, the data step was consistently twice as fast as the sql approach. Not meaningful for a one-off application, but worth considering in production mode.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My bad. I was too hasty, thinking that treatment would always be 0 or 1. Don't use my code! But do consider whether you would be better served by creating a count rather than a flag.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks all for your responses!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@leahcho: I could care less whose CORRECT response you indicate as the solution, but why in the world would you select one that the author, himself, told you NOT to use because it was flawed?
When others find your post in a search, they will adopt the code that you said correctly solved your problem. In this case, you'll be causing them a lot of grief.
Art, CEO, AnalystFinder.com
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is the one I used and worked
Proc sql;
Create table want as
Select * min(treatment) ne max (treatment) as flag
From have
Group by Id;
Quit;
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@leahcho Thank you. I think either way with only 340 posts,i'll stay novice forever hahahahha