DATA Step, Macro, Functions and more

Flagging a variable

Accepted Solution Solved
Reply
Contributor
Posts: 29
Accepted Solution

Flagging a variable

Hi,

I have a dataset I want to flag if a condition is met.

 

The below is how my data look like.

ID   treatment    

1        0                   

1        1                   

1        1                   

2        1                   

2        1                    

2        1                   

3        1                   

3        2                   

3        2                    

 

And I want to flag Ids that have used more than 1 treatment

ID   treatment    Flag_multiple_tx (yes=1 no=0)

1        0                    1

1        1                    1

1        1                    1

2        1                    0

2        1                    0

2        1                    0

3        1                    1

3        2                    1

3        2                    1

 

 

I tried to use

if tx=0 and tx=1 then flag_multiple_tx=0;

if tx=0 and tx=2 then flag_multiple_tx=0;

and so on.

But it didn't work

 

Thanks


Accepted Solutions
Solution
‎12-12-2017 12:32 PM
Super User
Posts: 6,785

Re: Flagging a variable

Posted in reply to novinosrin

As long as we're accepting late entries, I would suggest a different end result:

 

proc sql;

create table want as

select *, sum(treatment) as flag

from have

group by id;

quit;

 

Now your FLAG values won't always be 0 or 1.  But they will be more useful, because they hold the number of treatment observations for that person.  You can easily select observations with TREATMENT = 0, or with TREATMENT > 0.

View solution in original post


All Replies
Super User
Super User
Posts: 9,599

Re: Flagging a variable

Something like (and post test data in the form of a datastep in a code window for tested code):

proc sql;
  create table WANT as
  select A.*,
         B.FLAG
  from   HAVE A
  left join (select ID,case when sum(TREATMENT) > 1 then 1 else 0 end as FLAG from HAVE group by ID) B
  on     A.ID=B.ID;
quit;
PROC Star
Posts: 8,167

Re: Flagging a variable

I don't think @RW9's code will do what you want. I, personally, prefer using what's known as a DOW loop in a data step to solve such problems:

data want (drop=check);
  do until (last.id);
    set have;
    by id;
    if first.id then do;
      check=treatment;
      flag=0;
    end;
    else if check ne treatment then flag=1;
  end;
  do until (last.id);
    set have;
    by id;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

 

PROC Star
Posts: 8,167

Re: Flagging a variable

Conversely, if you prefer proc sql, I think that the following will do what you want:

proc sql;
  create table WANT as
    select A.*,
           B.FLAG
      from   HAVE A
        left join (select distinct ID,
                case when count(distinct TREATMENT) > 1 then 1
                else 0 end as FLAG from HAVE group by ID) B
           on A.ID=B.ID;
quit;

Art, CEO, AnalystFinder.com

 

Trusted Advisor
Posts: 1,345

Re: Flagging a variable

Here is a way to replicate the double DOW approach demonstrated by @art297, without specifying do loops.   Like the double DOW this approach reads each ID twice - in the first pass it generates data needed for the criterion you want, and in the second pass it uses that criterion.

 

I've begun to prefer this technique for many cases in which I used to implement the double dow technique.

 

data have;
input ID   treatment    ;
datalines;
1        0                   
1        1                   
1        1                   
2        1                   
2        1                    
2        1                   
3        1                   
3        2                   
3        2             
run;
data want  (drop=n_treat);
  set have (in=firstpass) 
      have (in=secondpass);
  by id;

  if firstpass and lag(treatment)^=treatment then n_treat+1;
  if first.id then n_treat=1;
  if secondpass;
  flag_multiple=(n_treat>1);
run;

 

Notes:

  1. Using the "set have (in=firstpass) have (in=secondpass); by id;" forces each id to be read two times before proceeding to the next id - i.e. it is interleaving data set HAVE with itself, on an ID by ID basis.   Using the temporary variables firstpass and secondpass allows your program to know which phase it is processing.

  2. The "if firstpass and lag(treatment)^=treatment then n_treat+1" increments N_TREAT each time there is a treatment change.  In this case "n_treat+1"  instead of "n_treat=n_treat+1" is a "sum statement" which tells sas to avoid resetting N_TREAT to missing with every incoming observation.  The subsequent "if first.id then n_treat=1" resets at the start of each ID.
PROC Star
Posts: 1,836

Re: Flagging a variable

Hi, My apologies for late entry. You could also try the most easiest of all:

NOTES: Take advantage of remerging stats and boolean as shown below

 

data have;

input ID   treatment    ;

datalines;

1        0                  

1        1                  

1        1                  

2        1                  

2        1                   

2        1                  

3        1                  

3        2                  

3        2            

run;

 

proc sql;

create table want as

select *, min(treatment) ne max(treatment) as flag

from have

group by id;

quit;

Solution
‎12-12-2017 12:32 PM
Super User
Posts: 6,785

Re: Flagging a variable

Posted in reply to novinosrin

As long as we're accepting late entries, I would suggest a different end result:

 

proc sql;

create table want as

select *, sum(treatment) as flag

from have

group by id;

quit;

 

Now your FLAG values won't always be 0 or 1.  But they will be more useful, because they hold the number of treatment observations for that person.  You can easily select observations with TREATMENT = 0, or with TREATMENT > 0.

Trusted Advisor
Posts: 1,345

Re: Flagging a variable

Posted in reply to Astounding

I don't think SUM(treatment) would be that useful.    Both triplets  (1,1,1)   and (0,1,2) have a sum of 3, but the first has 1 distinct treatment code, and the other has 3 codes.

 

Also, if I know my data is sorted by the group variable, I find that the DATA step approach is typically faster than the proc sql.  After all the latter will have "NOTE: The query requires remerging summary statistics back with the original data".  Doing the example we have worked with, multiplied to 30 million (and more) records, the data step was consistently twice as fast as the sql approach.  Not meaningful for a one-off application, but worth considering in production mode.

Super User
Posts: 6,785

Re: Flagging a variable

My bad.  I was too hasty, thinking that treatment would always be 0 or 1.  Don't use my code!  But do consider whether you would be better served by creating a count rather than a flag.

Contributor
Posts: 29

Re: Flagging a variable

Thanks all for your responses!

PROC Star
Posts: 8,167

Re: Flagging a variable

@leahcho: I could care less whose CORRECT response you indicate as the solution, but why in the world would you select one that the author, himself, told you NOT to use because it was flawed?

 

When others find your post in a search, they will adopt the code that you said correctly solved your problem. In this case, you'll be causing them a lot of grief.

 

Art, CEO, AnalystFinder.com

Contributor
Posts: 29

Re: Flagging a variable

Thanks for pointing out my mistake. I also noticed that I picked the right answer. When I was scrolling thru all the responses posted, I chose the one that looked similar to the one that actually worked.

Here is the one I used and worked
Proc sql;
Create table want as
Select * min(treatment) ne max (treatment) as flag
From have
Group by Id;
Quit;

Thanks

PROC Star
Posts: 1,836

Re: Flagging a variable

@leahcho  Thank you. I think either way with only 340 posts,i'll stay novice forever hahahahha

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 355 views
  • 0 likes
  • 6 in conversation