BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
kbk
Fluorite | Level 6 kbk
Fluorite | Level 6

I'm stumped and could use some help coding this logic. I essentially need to do a conditional find and replace across the entire dataset. I've tried using hash tables but the code and output inevitably confuse so I am leaving it out for now. I will add the hash code if requested. The setup:

/* Rules:

IF Study = A and Flag = 1 then Can_Link = 1 for all instances of associated MID

IF Study = B and Flag = 1 then Can_Link = 1 where Study = B or E for all instances of associated MID

Else Can_Link = Flag

*/

data have;

input MID $ RID $ Flag Study $ Can_Link ;

datalines;                     

1 6 1 A .

1 5 0 B .

2 3 0 A .

2 4 1 A .

2 1 0 B .

3 8 0 C .

4 7 1 A .

5 9 0 D .

6 10 0 E .

6 11 1 B .

6 12 0 C .

;

run;

*output should match this;

data want;

input MID $ RID $ Flag Study $ can_link ;

datalines;                     

1 6 1 A 1

1 5 0 B 1

2 3 0 A 1

2 4 1 A 1

2 1 0 B 1

3 8 0 C 0

4 7 1 A 1

5 9 0 D 0

6 10 0 E 1

6 11 1 B 1

6 12 0 C 0

;

run;



Edit to fix error pointed out by Tom

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Your rules do not account for why CAN_LINK=1 for MID=3.

data want ;

  do until(last.mid);

    set have;

    by mid;

    if study='A' and flag=1 then force1=1;

    if study='B' and flag=1 then force2=1;

  end;

  do until(last.mid);

    set have;

    by mid;

    if force1 then can_link=1;

    else if force2 and study in ('B','E') then can_link=1;

    else can_link=flag ;

    put (mid -- can_link) (:);

    output;

  end;

run;

1 6 1 A 1

1 5 0 B 1

2 3 0 A 1

2 4 1 A 1

2 1 0 B 1

3 8 0 C 0

4 7 1 A 1

5 9 0 D 0

6 10 0 E 1

6 11 1 B 1

6 12 0 C 0

View solution in original post

6 REPLIES 6
Tom
Super User Tom
Super User

Your rules do not account for why CAN_LINK=1 for MID=3.

data want ;

  do until(last.mid);

    set have;

    by mid;

    if study='A' and flag=1 then force1=1;

    if study='B' and flag=1 then force2=1;

  end;

  do until(last.mid);

    set have;

    by mid;

    if force1 then can_link=1;

    else if force2 and study in ('B','E') then can_link=1;

    else can_link=flag ;

    put (mid -- can_link) (:);

    output;

  end;

run;

1 6 1 A 1

1 5 0 B 1

2 3 0 A 1

2 4 1 A 1

2 1 0 B 1

3 8 0 C 0

4 7 1 A 1

5 9 0 D 0

6 10 0 E 1

6 11 1 B 1

6 12 0 C 0

kbk
Fluorite | Level 6 kbk
Fluorite | Level 6

@Tom I have a question regarding how SAS processes the by groups in your code.

In the first loop, are rows that meet the conditions the only ones flagged in the force columns?  Or, does SAS check if any MIDs of the group meet the condition (e.g., study='A' and flag=1) and then apply the rule (e.g., force1 = 1) to all rows within that group?

I looked for documentation on how SAS processes conditional with respect to groups but didn't see an answer.

Tom
Super User Tom
Super User

Because the SET statements are nested inside the DO loops the data step will only restart once per MID value. At the top of the data step the values will be set to missing and during the first DO loop the values might get set to 1 (true).  The values for FORCE1 and FORCE2 set during the first DO loop will be constant during the second DO loop.

Note that this would not work if those variables existed in the input dataset because the values would change each time a record was read from the dataset by one of the SET statements.

PGStats
Opal | Level 21

According to your rules, MID=3  with Study="C" should have Can_Link = Flag = 0.

Your rules can be coded as :

proc sql;

create table want as

select

     X.MID, X.RID, X.Flag, X.Study,

     case

          when exists (select * from have where MID=X.MID and Study="A" and flag=1) then 1

          when exists (select * from have where MID=X.MID and Study="B" and flag=1) and Study in ("B","E") then 1

          else Flag

          end as Can_Link

from have as X;

quit;

PG

PG
kbk
Fluorite | Level 6 kbk
Fluorite | Level 6

Thank you Tom and PGStats! Both work but I went with Tom's solution because the actual dataset has ~130,000 observations and the subqueries are performance killers.

Haikuo
Onyx | Level 15

The more, the merrier:

data have;

input MID $ RID $ Flag Study $ ;

datalines;                     

1 6 1 A

1 5 0 B

2 3 0 A

2 4 1 A

2 1 0 B

3 8 0 C

4 7 1 A

5 9 0 D

6 10 0 E

6 11 1 B

6 12 0 C

;

run;

proc sql;

create table want_sql as

select MID, RID , Flag, Study, case when sum(study='B' and flag=1)>0 then 1 else _link end as can_link from

   (select *, case when sum(study='A' and Flag=1)>0 then 1

                  else flag

             end as _Link,

                   study in ('B','E') as _grp

  from have

  group by mid)

  group by mid,_grp

;

quit;

data want_hash;

  if _n_=1 then do;

    if 0 then set have;

      declare hash h(dataset:'have');

      h.definekey('mid','study','flag');

      h.definedata('mid');

      h.definedone();

   end;

  do _n_=1 by 1 until (last.mid);

   set have;

    by mid;

        if _n_=1 then do;

        _rc1=h.find(key:mid,key:'A', key:1);

        _rc2=h.find(key:mid,key:'B', key:1);

      end;

   if _rc1=0 then can_link=1;

   else if _rc2=0 then do;

     if study in ('B','E') then can_link=1; else can_link=flag;

   end;

   else can_link=flag;

   output;

   end;

   drop _:;

run;

Haikuo

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1321 views
  • 6 likes
  • 4 in conversation