SAS Programming

leahcho · Posted 12-11-2017 11:19 AM

Hi,

I have a dataset I want to flag if a condition is met.

The below is how my data look like.

ID treatment

1 0

1 1

2 1

3 1

3 2

And I want to flag Ids that have used more than 1 treatment

ID treatment Flag_multiple_tx (yes=1 no=0)

1 0 1

1 1 1

2 1 0

3 1 1

3 2 1

I tried to use

if tx=0 and tx=1 then flag_multiple_tx=0;

if tx=0 and tx=2 then flag_multiple_tx=0;

and so on.

But it didn't work

Thanks

Astounding · Posted 12-11-2017 02:02 PM

As long as we're accepting late entries, I would suggest a different end result:

proc sql;

create table want as

select *, sum(treatment) as flag

from have

group by id;

quit;

Now your FLAG values won't always be 0 or 1. But they will be more useful, because they hold the number of treatment observations for that person. You can easily select observations with TREATMENT = 0, or with TREATMENT > 0.

View solution in original post

RW9 · Posted 12-11-2017 11:31 AM

Something like (and post test data in the form of a datastep in a code window for tested code):

proc sql;
  create table WANT as
  select A.*,
         B.FLAG
  from   HAVE A
  left join (select ID,case when sum(TREATMENT) > 1 then 1 else 0 end as FLAG from HAVE group by ID) B
  on     A.ID=B.ID;
quit;

art297 · Posted 12-11-2017 11:39 AM

I don't think @RW9's code will do what you want. I, personally, prefer using what's known as a DOW loop in a data step to solve such problems:

data want (drop=check);
  do until (last.id);
    set have;
    by id;
    if first.id then do;
      check=treatment;
      flag=0;
    end;
    else if check ne treatment then flag=1;
  end;
  do until (last.id);
    set have;
    by id;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

art297 · Posted 12-11-2017 11:51 AM

Conversely, if you prefer proc sql, I think that the following will do what you want:

proc sql;
  create table WANT as
    select A.*,
           B.FLAG
      from   HAVE A
        left join (select distinct ID,
                case when count(distinct TREATMENT) > 1 then 1
                else 0 end as FLAG from HAVE group by ID) B
           on A.ID=B.ID;
quit;

Art, CEO, AnalystFinder.com

mkeintz · Posted 12-11-2017 12:34 PM

Here is a way to replicate the double DOW approach demonstrated by @art297, without specifying do loops. Like the double DOW this approach reads each ID twice - in the first pass it generates data needed for the criterion you want, and in the second pass it uses that criterion.

I've begun to prefer this technique for many cases in which I used to implement the double dow technique.

data have;
input ID   treatment    ;
datalines;
1        0                   
1        1                   
1        1                   
2        1                   
2        1                    
2        1                   
3        1                   
3        2                   
3        2             
run;
data want  (drop=n_treat);
  set have (in=firstpass) 
      have (in=secondpass);
  by id;

  if firstpass and lag(treatment)^=treatment then n_treat+1;
  if first.id then n_treat=1;
  if secondpass;
  flag_multiple=(n_treat>1);
run;

Notes:

Using the "set have (in=firstpass) have (in=secondpass); by id;" forces each id to be read two times before proceeding to the next id - i.e. it is interleaving data set HAVE with itself, on an ID by ID basis. Using the temporary variables firstpass and secondpass allows your program to know which phase it is processing.
The "if firstpass and lag(treatment)^=treatment then n_treat+1" increments N_TREAT each time there is a treatment change. In this case "n_treat+1" instead of "n_treat=n_treat+1" is a "sum statement" which tells sas to avoid resetting N_TREAT to missing with every incoming observation. The subsequent "if first.id then n_treat=1" resets at the start of each ID.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

novinosrin · Posted 12-11-2017 01:33 PM

Hi, My apologies for late entry. You could also try the most easiest of all:

NOTES: Take advantage of remerging stats and boolean as shown below

data have;

input ID treatment ;

datalines;

1 0

1 1

2 1

3 1

3 2

run;

proc sql;

create table want as

select *, min(treatment) ne max(treatment) as flag

from have

group by id;

quit;

Astounding · Posted 12-11-2017 02:02 PM

As long as we're accepting late entries, I would suggest a different end result:

proc sql;

create table want as

select *, sum(treatment) as flag

from have

group by id;

quit;

Now your FLAG values won't always be 0 or 1. But they will be more useful, because they hold the number of treatment observations for that person. You can easily select observations with TREATMENT = 0, or with TREATMENT > 0.

mkeintz · Posted 12-11-2017 07:25 PM

I don't think SUM(treatment) would be that useful. Both triplets (1,1,1) and (0,1,2) have a sum of 3, but the first has 1 distinct treatment code, and the other has 3 codes.

Also, if I know my data is sorted by the group variable, I find that the DATA step approach is typically faster than the proc sql. After all the latter will have "NOTE: The query requires remerging summary statistics back with the original data". Doing the example we have worked with, multiplied to 30 million (and more) records, the data step was consistently twice as fast as the sql approach. Not meaningful for a one-off application, but worth considering in production mode.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Astounding · Posted 12-12-2017 09:50 AM

My bad. I was too hasty, thinking that treatment would always be 0 or 1. Don't use my code! But do consider whether you would be better served by creating a count rather than a flag.

leahcho · Posted 12-12-2017 12:33 PM

Thanks all for your responses!

art297 · Posted 12-12-2017 02:31 PM

@leahcho: I could care less whose CORRECT response you indicate as the solution, but why in the world would you select one that the author, himself, told you NOT to use because it was flawed?

When others find your post in a search, they will adopt the code that you said correctly solved your problem. In this case, you'll be causing them a lot of grief.

Art, CEO, AnalystFinder.com

leahcho · Posted 12-12-2017 03:28 PM

Thanks for pointing out my mistake. I also noticed that I picked the right answer. When I was scrolling thru all the responses posted, I chose the one that looked similar to the one that actually worked.

Here is the one I used and worked
Proc sql;
Create table want as
Select * min(treatment) ne max (treatment) as flag
From have
Group by Id;
Quit;

Thanks

novinosrin · Posted 12-12-2017 03:37 PM

@leahcho Thank you. I think either way with only 340 posts,i'll stay novice forever hahahahha

SAS Programming

Flagging a variable

Re: Flagging a variable

Re: Flagging a variable

Re: Flagging a variable

Re: Flagging a variable

Re: Flagging a variable

Re: Flagging a variable

Re: Flagging a variable

Re: Flagging a variable

Re: Flagging a variable

Re: Flagging a variable

Re: Flagging a variable

Re: Flagging a variable

Re: Flagging a variable

Flag presence/absence across variables

Flagging all observations where certain variables differ/are same by i...

Deriving analysis flag variable

Flag Variables

Identify and flag the latest day

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...