I'm stumped and could use some help coding this logic. I essentially need to do a conditional find and replace across the entire dataset. I've tried using hash tables but the code and output inevitably confuse so I am leaving it out for now. I will add the hash code if requested. The setup:
/* Rules:
IF Study = A and Flag = 1 then Can_Link = 1 for all instances of associated MID
IF Study = B and Flag = 1 then Can_Link = 1 where Study = B or E for all instances of associated MID
Else Can_Link = Flag
*/
data have;
input MID $ RID $ Flag Study $ Can_Link ;
datalines;
1 6 1 A .
1 5 0 B .
2 3 0 A .
2 4 1 A .
2 1 0 B .
3 8 0 C .
4 7 1 A .
5 9 0 D .
6 10 0 E .
6 11 1 B .
6 12 0 C .
;
run;
*output should match this;
data want;
input MID $ RID $ Flag Study $ can_link ;
datalines;
1 6 1 A 1
1 5 0 B 1
2 3 0 A 1
2 4 1 A 1
2 1 0 B 1
3 8 0 C 0
4 7 1 A 1
5 9 0 D 0
6 10 0 E 1
6 11 1 B 1
6 12 0 C 0
;
run;
Edit to fix error pointed out by Tom
Your rules do not account for why CAN_LINK=1 for MID=3.
data want ;
do until(last.mid);
set have;
by mid;
if study='A' and flag=1 then force1=1;
if study='B' and flag=1 then force2=1;
end;
do until(last.mid);
set have;
by mid;
if force1 then can_link=1;
else if force2 and study in ('B','E') then can_link=1;
else can_link=flag ;
put (mid -- can_link) (:);
output;
end;
run;
1 6 1 A 1
1 5 0 B 1
2 3 0 A 1
2 4 1 A 1
2 1 0 B 1
3 8 0 C 0
4 7 1 A 1
5 9 0 D 0
6 10 0 E 1
6 11 1 B 1
6 12 0 C 0
Your rules do not account for why CAN_LINK=1 for MID=3.
data want ;
do until(last.mid);
set have;
by mid;
if study='A' and flag=1 then force1=1;
if study='B' and flag=1 then force2=1;
end;
do until(last.mid);
set have;
by mid;
if force1 then can_link=1;
else if force2 and study in ('B','E') then can_link=1;
else can_link=flag ;
put (mid -- can_link) (:);
output;
end;
run;
1 6 1 A 1
1 5 0 B 1
2 3 0 A 1
2 4 1 A 1
2 1 0 B 1
3 8 0 C 0
4 7 1 A 1
5 9 0 D 0
6 10 0 E 1
6 11 1 B 1
6 12 0 C 0
@Tom I have a question regarding how SAS processes the by groups in your code.
In the first loop, are rows that meet the conditions the only ones flagged in the force columns? Or, does SAS check if any MIDs of the group meet the condition (e.g., study='A' and flag=1) and then apply the rule (e.g., force1 = 1) to all rows within that group?
I looked for documentation on how SAS processes conditional with respect to groups but didn't see an answer.
Because the SET statements are nested inside the DO loops the data step will only restart once per MID value. At the top of the data step the values will be set to missing and during the first DO loop the values might get set to 1 (true). The values for FORCE1 and FORCE2 set during the first DO loop will be constant during the second DO loop.
Note that this would not work if those variables existed in the input dataset because the values would change each time a record was read from the dataset by one of the SET statements.
According to your rules, MID=3 with Study="C" should have Can_Link = Flag = 0.
Your rules can be coded as :
proc sql;
create table want as
select
X.MID, X.RID, X.Flag, X.Study,
case
when exists (select * from have where MID=X.MID and Study="A" and flag=1) then 1
when exists (select * from have where MID=X.MID and Study="B" and flag=1) and Study in ("B","E") then 1
else Flag
end as Can_Link
from have as X;
quit;
PG
Thank you Tom and PGStats! Both work but I went with Tom's solution because the actual dataset has ~130,000 observations and the subqueries are performance killers.
The more, the merrier:
data have;
input MID $ RID $ Flag Study $ ;
datalines;
1 6 1 A
1 5 0 B
2 3 0 A
2 4 1 A
2 1 0 B
3 8 0 C
4 7 1 A
5 9 0 D
6 10 0 E
6 11 1 B
6 12 0 C
;
run;
proc sql;
create table want_sql as
select MID, RID , Flag, Study, case when sum(study='B' and flag=1)>0 then 1 else _link end as can_link from
(select *, case when sum(study='A' and Flag=1)>0 then 1
else flag
end as _Link,
study in ('B','E') as _grp
from have
group by mid)
group by mid,_grp
;
quit;
data want_hash;
if _n_=1 then do;
if 0 then set have;
declare hash h(dataset:'have');
h.definekey('mid','study','flag');
h.definedata('mid');
h.definedone();
end;
do _n_=1 by 1 until (last.mid);
set have;
by mid;
if _n_=1 then do;
_rc1=h.find(key:mid,key:'A', key:1);
_rc2=h.find(key:mid,key:'B', key:1);
end;
if _rc1=0 then can_link=1;
else if _rc2=0 then do;
if study in ('B','E') then can_link=1; else can_link=flag;
end;
else can_link=flag;
output;
end;
drop _:;
run;
Haikuo
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.