BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
leahcho
Obsidian | Level 7

Hi,

I have a dataset I want to flag if a condition is met.

 

The below is how my data look like.

ID   treatment    

1        0                   

1        1                   

1        1                   

2        1                   

2        1                    

2        1                   

3        1                   

3        2                   

3        2                    

 

And I want to flag Ids that have used more than 1 treatment

ID   treatment    Flag_multiple_tx (yes=1 no=0)

1        0                    1

1        1                    1

1        1                    1

2        1                    0

2        1                    0

2        1                    0

3        1                    1

3        2                    1

3        2                    1

 

 

I tried to use

if tx=0 and tx=1 then flag_multiple_tx=0;

if tx=0 and tx=2 then flag_multiple_tx=0;

and so on.

But it didn't work

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

As long as we're accepting late entries, I would suggest a different end result:

 

proc sql;

create table want as

select *, sum(treatment) as flag

from have

group by id;

quit;

 

Now your FLAG values won't always be 0 or 1.  But they will be more useful, because they hold the number of treatment observations for that person.  You can easily select observations with TREATMENT = 0, or with TREATMENT > 0.

View solution in original post

12 REPLIES 12
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Something like (and post test data in the form of a datastep in a code window for tested code):

proc sql;
  create table WANT as
  select A.*,
         B.FLAG
  from   HAVE A
  left join (select ID,case when sum(TREATMENT) > 1 then 1 else 0 end as FLAG from HAVE group by ID) B
  on     A.ID=B.ID;
quit;
art297
Opal | Level 21

I don't think @RW9's code will do what you want. I, personally, prefer using what's known as a DOW loop in a data step to solve such problems:

data want (drop=check);
  do until (last.id);
    set have;
    by id;
    if first.id then do;
      check=treatment;
      flag=0;
    end;
    else if check ne treatment then flag=1;
  end;
  do until (last.id);
    set have;
    by id;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

 

art297
Opal | Level 21

Conversely, if you prefer proc sql, I think that the following will do what you want:

proc sql;
  create table WANT as
    select A.*,
           B.FLAG
      from   HAVE A
        left join (select distinct ID,
                case when count(distinct TREATMENT) > 1 then 1
                else 0 end as FLAG from HAVE group by ID) B
           on A.ID=B.ID;
quit;

Art, CEO, AnalystFinder.com

 

mkeintz
PROC Star

Here is a way to replicate the double DOW approach demonstrated by @art297, without specifying do loops.   Like the double DOW this approach reads each ID twice - in the first pass it generates data needed for the criterion you want, and in the second pass it uses that criterion.

 

I've begun to prefer this technique for many cases in which I used to implement the double dow technique.

 

data have;
input ID   treatment    ;
datalines;
1        0                   
1        1                   
1        1                   
2        1                   
2        1                    
2        1                   
3        1                   
3        2                   
3        2             
run;
data want  (drop=n_treat);
  set have (in=firstpass) 
      have (in=secondpass);
  by id;

  if firstpass and lag(treatment)^=treatment then n_treat+1;
  if first.id then n_treat=1;
  if secondpass;
  flag_multiple=(n_treat>1);
run;

 

Notes:

  1. Using the "set have (in=firstpass) have (in=secondpass); by id;" forces each id to be read two times before proceeding to the next id - i.e. it is interleaving data set HAVE with itself, on an ID by ID basis.   Using the temporary variables firstpass and secondpass allows your program to know which phase it is processing.

  2. The "if firstpass and lag(treatment)^=treatment then n_treat+1" increments N_TREAT each time there is a treatment change.  In this case "n_treat+1"  instead of "n_treat=n_treat+1" is a "sum statement" which tells sas to avoid resetting N_TREAT to missing with every incoming observation.  The subsequent "if first.id then n_treat=1" resets at the start of each ID.
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20

Hi, My apologies for late entry. You could also try the most easiest of all:

NOTES: Take advantage of remerging stats and boolean as shown below

 

data have;

input ID   treatment    ;

datalines;

1        0                  

1        1                  

1        1                  

2        1                  

2        1                   

2        1                  

3        1                  

3        2                  

3        2            

run;

 

proc sql;

create table want as

select *, min(treatment) ne max(treatment) as flag

from have

group by id;

quit;

Astounding
PROC Star

As long as we're accepting late entries, I would suggest a different end result:

 

proc sql;

create table want as

select *, sum(treatment) as flag

from have

group by id;

quit;

 

Now your FLAG values won't always be 0 or 1.  But they will be more useful, because they hold the number of treatment observations for that person.  You can easily select observations with TREATMENT = 0, or with TREATMENT > 0.

mkeintz
PROC Star

I don't think SUM(treatment) would be that useful.    Both triplets  (1,1,1)   and (0,1,2) have a sum of 3, but the first has 1 distinct treatment code, and the other has 3 codes.

 

Also, if I know my data is sorted by the group variable, I find that the DATA step approach is typically faster than the proc sql.  After all the latter will have "NOTE: The query requires remerging summary statistics back with the original data".  Doing the example we have worked with, multiplied to 30 million (and more) records, the data step was consistently twice as fast as the sql approach.  Not meaningful for a one-off application, but worth considering in production mode.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Astounding
PROC Star

My bad.  I was too hasty, thinking that treatment would always be 0 or 1.  Don't use my code!  But do consider whether you would be better served by creating a count rather than a flag.

leahcho
Obsidian | Level 7

Thanks all for your responses!

art297
Opal | Level 21

@leahcho: I could care less whose CORRECT response you indicate as the solution, but why in the world would you select one that the author, himself, told you NOT to use because it was flawed?

 

When others find your post in a search, they will adopt the code that you said correctly solved your problem. In this case, you'll be causing them a lot of grief.

 

Art, CEO, AnalystFinder.com

leahcho
Obsidian | Level 7
Thanks for pointing out my mistake. I also noticed that I picked the right answer. When I was scrolling thru all the responses posted, I chose the one that looked similar to the one that actually worked.

Here is the one I used and worked
Proc sql;
Create table want as
Select * min(treatment) ne max (treatment) as flag
From have
Group by Id;
Quit;

Thanks

novinosrin
Tourmaline | Level 20

@leahcho  Thank you. I think either way with only 340 posts,i'll stay novice forever hahahahha

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 6543 views
  • 0 likes
  • 6 in conversation