Solved: Re: Complicated conditional recodes

kbk · Posted 09-09-2013 11:03 AM

I'm stumped and could use some help coding this logic. I essentially need to do a conditional find and replace across the entire dataset. I've tried using hash tables but the code and output inevitably confuse so I am leaving it out for now. I will add the hash code if requested. The setup:

/* Rules:

IF Study = A and Flag = 1 then Can_Link = 1 for all instances of associated MID

IF Study = B and Flag = 1 then Can_Link = 1 where Study = B or E for all instances of associated MID

Else Can_Link = Flag

*/

data have;

input MID $ RID $ Flag Study $ Can_Link ;

datalines;

1 6 1 A .

1 5 0 B .

2 3 0 A .

2 4 1 A .

2 1 0 B .

3 8 0 C .

4 7 1 A .

5 9 0 D .

6 10 0 E .

6 11 1 B .

6 12 0 C .

;

run;

*output should match this;

data want;

input MID $ RID $ Flag Study $ can_link ;

datalines;

1 6 1 A 1

1 5 0 B 1

2 3 0 A 1

2 4 1 A 1

2 1 0 B 1

3 8 0 C 0

4 7 1 A 1

5 9 0 D 0

6 10 0 E 1

6 11 1 B 1

6 12 0 C 0

;

run;

Edit to fix error pointed out by Tom

Tom · Posted 09-09-2013 11:24 AM

Your rules do not account for why CAN_LINK=1 for MID=3.

data want ;

do until(last.mid);

set have;

by mid;

if study='A' and flag=1 then force1=1;

if study='B' and flag=1 then force2=1;

end;

do until(last.mid);

set have;

by mid;

if force1 then can_link=1;

else if force2 and study in ('B','E') then can_link=1;

else can_link=flag ;

put (mid -- can_link) (:);

output;

end;

run;

1 6 1 A 1

1 5 0 B 1

2 3 0 A 1

2 4 1 A 1

2 1 0 B 1

3 8 0 C 0

4 7 1 A 1

5 9 0 D 0

6 10 0 E 1

6 11 1 B 1

6 12 0 C 0

View solution in original post

Tom · Posted 09-09-2013 11:24 AM

Your rules do not account for why CAN_LINK=1 for MID=3.

data want ;

do until(last.mid);

set have;

by mid;

if study='A' and flag=1 then force1=1;

if study='B' and flag=1 then force2=1;

end;

do until(last.mid);

set have;

by mid;

if force1 then can_link=1;

else if force2 and study in ('B','E') then can_link=1;

else can_link=flag ;

put (mid -- can_link) (:);

output;

end;

run;

1 6 1 A 1

1 5 0 B 1

2 3 0 A 1

2 4 1 A 1

2 1 0 B 1

3 8 0 C 0

4 7 1 A 1

5 9 0 D 0

6 10 0 E 1

6 11 1 B 1

6 12 0 C 0

kbk · Posted 09-18-2013 03:54 PM

@Tom I have a question regarding how SAS processes the by groups in your code.

In the first loop, are rows that meet the conditions the only ones flagged in the force columns? Or, does SAS check if any MIDs of the group meet the condition (e.g., study='A' and flag=1) and then apply the rule (e.g., force1 = 1) to all rows within that group?

I looked for documentation on how SAS processes conditional with respect to groups but didn't see an answer.

Tom · Posted 09-19-2013 07:06 AM

Because the SET statements are nested inside the DO loops the data step will only restart once per MID value. At the top of the data step the values will be set to missing and during the first DO loop the values might get set to 1 (true). The values for FORCE1 and FORCE2 set during the first DO loop will be constant during the second DO loop.

Note that this would not work if those variables existed in the input dataset because the values would change each time a record was read from the dataset by one of the SET statements.

PGStats · Posted 09-09-2013 11:43 AM

According to your rules, MID=3 with Study="C" should have Can_Link = Flag = 0.

Your rules can be coded as :

proc sql;

create table want as

select

X.MID, X.RID, X.Flag, X.Study,

case

when exists (select * from have where MID=X.MID and Study="A" and flag=1) then 1

when exists (select * from have where MID=X.MID and Study="B" and flag=1) and Study in ("B","E") then 1

else Flag

end as Can_Link

from have as X;

quit;

PG

kbk · Posted 09-09-2013 12:18 PM

Thank you Tom and PGStats! Both work but I went with Tom's solution because the actual dataset has ~130,000 observations and the subqueries are performance killers.

Haikuo · Posted 09-09-2013 12:22 PM

The more, the merrier:

data have;

input MID $ RID $ Flag Study $ ;

datalines;

1 6 1 A

1 5 0 B

2 3 0 A

2 4 1 A

2 1 0 B

3 8 0 C

4 7 1 A

5 9 0 D

6 10 0 E

6 11 1 B

6 12 0 C

;

run;

proc sql;

create table want_sql as

select MID, RID , Flag, Study, case when sum(study='B' and flag=1)>0 then 1 else _link end as can_link from

(select *, case when sum(study='A' and Flag=1)>0 then 1

else flag

end as _Link,

study in ('B','E') as _grp

from have

group by mid)

group by mid,_grp

;

quit;

data want_hash;

if _n_=1 then do;

if 0 then set have;

declare hash h(dataset:'have');

h.definekey('mid','study','flag');

h.definedata('mid');

h.definedone();

end;

do _n_=1 by 1 until (last.mid);

set have;

by mid;

if _n_=1 then do;

_rc1=h.find(key:mid,key:'A', key:1);

_rc2=h.find(key:mid,key:'B', key:1);

end;

if _rc1=0 then can_link=1;

else if _rc2=0 then do;

if study in ('B','E') then can_link=1; else can_link=flag;

end;

else can_link=flag;

output;

end;

drop _:;

run;

Haikuo

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away