Complicated conditional recodes

Accepted Solution Solved
Reply
Contributor kbk
Contributor
Posts: 29
Accepted Solution

Complicated conditional recodes

I'm stumped and could use some help coding this logic. I essentially need to do a conditional find and replace across the entire dataset. I've tried using hash tables but the code and output inevitably confuse so I am leaving it out for now. I will add the hash code if requested. The setup:

/* Rules:

IF Study = A and Flag = 1 then Can_Link = 1 for all instances of associated MID

IF Study = B and Flag = 1 then Can_Link = 1 where Study = B or E for all instances of associated MID

Else Can_Link = Flag

*/

data have;

input MID $ RID $ Flag Study $ Can_Link ;

datalines;                     

1 6 1 A .

1 5 0 B .

2 3 0 A .

2 4 1 A .

2 1 0 B .

3 8 0 C .

4 7 1 A .

5 9 0 D .

6 10 0 E .

6 11 1 B .

6 12 0 C .

;

run;

*output should match this;

data want;

input MID $ RID $ Flag Study $ can_link ;

datalines;                     

1 6 1 A 1

1 5 0 B 1

2 3 0 A 1

2 4 1 A 1

2 1 0 B 1

3 8 0 C 0

4 7 1 A 1

5 9 0 D 0

6 10 0 E 1

6 11 1 B 1

6 12 0 C 0

;

run;



Edit to fix error pointed out by Tom


Accepted Solutions
Solution
‎09-09-2013 11:24 AM
Super User
Super User
Posts: 7,074

Re: Complicated conditional recodes

Your rules do not account for why CAN_LINK=1 for MID=3.

data want ;

  do until(last.mid);

    set have;

    by mid;

    if study='A' and flag=1 then force1=1;

    if study='B' and flag=1 then force2=1;

  end;

  do until(last.mid);

    set have;

    by mid;

    if force1 then can_link=1;

    else if force2 and study in ('B','E') then can_link=1;

    else can_link=flag ;

    put (mid -- can_link) (Smiley Happy;

    output;

  end;

run;

1 6 1 A 1

1 5 0 B 1

2 3 0 A 1

2 4 1 A 1

2 1 0 B 1

3 8 0 C 0

4 7 1 A 1

5 9 0 D 0

6 10 0 E 1

6 11 1 B 1

6 12 0 C 0

View solution in original post


All Replies
Solution
‎09-09-2013 11:24 AM
Super User
Super User
Posts: 7,074

Re: Complicated conditional recodes

Your rules do not account for why CAN_LINK=1 for MID=3.

data want ;

  do until(last.mid);

    set have;

    by mid;

    if study='A' and flag=1 then force1=1;

    if study='B' and flag=1 then force2=1;

  end;

  do until(last.mid);

    set have;

    by mid;

    if force1 then can_link=1;

    else if force2 and study in ('B','E') then can_link=1;

    else can_link=flag ;

    put (mid -- can_link) (Smiley Happy;

    output;

  end;

run;

1 6 1 A 1

1 5 0 B 1

2 3 0 A 1

2 4 1 A 1

2 1 0 B 1

3 8 0 C 0

4 7 1 A 1

5 9 0 D 0

6 10 0 E 1

6 11 1 B 1

6 12 0 C 0

Contributor kbk
Contributor
Posts: 29

Re: Complicated conditional recodes

@Tom I have a question regarding how SAS processes the by groups in your code.

In the first loop, are rows that meet the conditions the only ones flagged in the force columns?  Or, does SAS check if any MIDs of the group meet the condition (e.g., study='A' and flag=1) and then apply the rule (e.g., force1 = 1) to all rows within that group?

I looked for documentation on how SAS processes conditional with respect to groups but didn't see an answer.

Super User
Super User
Posts: 7,074

Re: Complicated conditional recodes

Because the SET statements are nested inside the DO loops the data step will only restart once per MID value. At the top of the data step the values will be set to missing and during the first DO loop the values might get set to 1 (true).  The values for FORCE1 and FORCE2 set during the first DO loop will be constant during the second DO loop.

Note that this would not work if those variables existed in the input dataset because the values would change each time a record was read from the dataset by one of the SET statements.

Respected Advisor
Posts: 4,930

Re: Complicated conditional recodes

According to your rules, MID=3  with Study="C" should have Can_Link = Flag = 0.

Your rules can be coded as :

proc sql;

create table want as

select

     X.MID, X.RID, X.Flag, X.Study,

     case

          when exists (select * from have where MID=X.MID and Study="A" and flag=1) then 1

          when exists (select * from have where MID=X.MID and Study="B" and flag=1) and Study in ("B","E") then 1

          else Flag

          end as Can_Link

from have as X;

quit;

PG

PG
Contributor kbk
Contributor
Posts: 29

Re: Complicated conditional recodes

Thank you Tom and PGStats! Both work but I went with Tom's solution because the actual dataset has ~130,000 observations and the subqueries are performance killers.

Respected Advisor
Posts: 3,156

Re: Complicated conditional recodes

The more, the merrier:

data have;

input MID $ RID $ Flag Study $ ;

datalines;                     

1 6 1 A

1 5 0 B

2 3 0 A

2 4 1 A

2 1 0 B

3 8 0 C

4 7 1 A

5 9 0 D

6 10 0 E

6 11 1 B

6 12 0 C

;

run;

proc sql;

create table want_sql as

select MID, RID , Flag, Study, case when sum(study='B' and flag=1)>0 then 1 else _link end as can_link from

   (select *, case when sum(study='A' and Flag=1)>0 then 1

                  else flag

             end as _Link,

                   study in ('B','E') as _grp

  from have

  group by mid)

  group by mid,_grp

;

quit;

data want_hash;

  if _n_=1 then do;

    if 0 then set have;

      declare hash h(dataset:'have');

      h.definekey('mid','study','flag');

      h.definedata('mid');

      h.definedone();

   end;

  do _n_=1 by 1 until (last.mid);

   set have;

    by mid;

        if _n_=1 then do;

        _rc1=h.find(key:mid,key:'A', key:1);

        _rc2=h.find(key:mid,key:'B', key:1);

      end;

   if _rc1=0 then can_link=1;

   else if _rc2=0 then do;

     if study in ('B','E') then can_link=1; else can_link=flag;

   end;

   else can_link=flag;

   output;

   end;

   drop _:;

run;

Haikuo

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 324 views
  • 6 likes
  • 4 in conversation