Adding flag column when condition is met

Accepted Solution Solved
Reply
Contributor
Posts: 35
Accepted Solution

Adding flag column when condition is met

Dear All,

I have dataset db1 including variables: id, start_date, end_date, and V1.

data db1;

     input id    start_date     end_date     V1;

datalines;

101     20130126     20130814     19     

101     20130729     20130829     77

102     20130201     20130401     14

103     20130217     20130225     11

104     20130102     20131119     47

;

I also have dataset db2 including variables: id, date, and V2.


data db2;

     input id     date     V2;

datalines;

101     20130719     26     

103     20130224     55

105     20131107     10

;


I would like to create a variable FLAG in the dataset db1 such that FLAG = 1 if db1.id = db2.id and db2.date >= db1.start_date and db2.date <= db1.end_date. That is, I would like to obtain the following


data want;

     input id    start_date     end_date     V1     flag;

datalines;

101     20130126     20130814     19     1     

101     20130729     20130829     77     .

102     20130201     20130401     14     .    

103     20130217     20130225     11     1

104     20130102     20131119     47     .

;

My attempt involves PROC SQL and I'm able to isolate the desired records. However, I'm not able to attach a flag in the original dataset db1


proc sql;

  create table want

  as select db1.*

  from db1, db2

  where db1.id = db2.id and db2.date >= db1.start_date and db2.date <= db2.end_date;

quit;

Any help would be highly appreciated.



Accepted Solutions
Solution
‎09-26-2014 09:23 AM
Super User
Posts: 9,681

Re: Adding flag column when condition is met

1)Hash Table




data db1;
     input id    start_date     end_date     V1;
datalines;
101     20130126     20130814     19     
101     20130729     20130829     77
102     20130201     20130401     14
103     20130217     20130225     11
104     20130102     20131119     47
;
data db2;
     input id     date     V2;
datalines;
101     20130719     26     
103     20130224     55
105     20131107     10
;
run;

data want;
 if _n_ = 1 then do;
 if 0 then set db2;
 declare hash ha(dataset:'db2',hashexp:20);
 ha.definekey('id');
 ha.definedata('date');
 ha.definedone();
end;
set db1;
if ha.find()=0 then flag=ifn( start_date le date le end_date,1,. );
run;











2)Or if dataset have already been sorted , Merge statement 




data w;
 merge db1(in=ina) db2;
 by id;
flag=ifn( start_date le date le end_date,1,. );
if ina;
run;

Xia Keshan

Message was edited by: xia keshan

View solution in original post


All Replies
Respected Advisor
Posts: 3,124

Re: Adding flag column when condition is met

proc sql;

create table want as

  select *, case exists (select * from db2 where id=a.id and date between a.start_date and a.end_date) when 1 then 1 else . end as flag

  from db1 a

  ;

quit;

Super User
Super User
Posts: 7,401

Re: Adding flag column when condition is met

Hi,

proc sql;

     create table WANT as

     select     DB1.*,

                   case when DB2.ID is not null then 1 else . end as FLAG

     from       DB1 DB1

     left join   DB1 DB2    

     on          DB1.ID=DB2.ID

     and        DB1.START_DATE <= DB2.DATE <= DB1.END_DATE;

quit;

Trusted Advisor
Posts: 1,204

Re: Adding flag column when condition is met

proc sql;

select db1.*,case when db2.date >= db1.start_date and db2.date <= db1.end_date then 1

end as flag

  from db1 left join db2

  on db1.id = db2.id;

quit;

Contributor
Posts: 35

Re: Adding flag column when condition is met

, , @stat@sas, thank you for your replies.

I was wondering whether it is possible to gain a bit in efficiency because my dataset db1 has >30M records and db2 around 0.3 M. It's going to take days in this way (still running...)

Maybe knowing db1 is sorted on id and date and db2 is sorted on id and start_date, SAS could stop checking the condition for remaining records in db2 when db2.id > db1.id.

Super User
Super User
Posts: 7,401

Re: Adding flag column when condition is met

I'm not thinking of an easy way.  Whichever way you look at it your going to have to join the two datasets at some point.  There are many tips and tricks on here to speed up things, e.g. break your 30m records down into smaller datasets (maybe on id or something).  You can use indexes, get distinct list of ids which match and only do the check on those etc.  However at the end of the 30m records is going to take a while regardless.  Maybe also evaluate if adding a flag to the big dataset is the best way forward.

Super Contributor
Posts: 305

Re: Adding flag column when condition is met

Hello,

proc sql;

  create table want

  as select db1.*,

  case

  when db1.id = db2.id and db2.date >= db1.start_date and db2.date <= db1.end_date then 1

  else .

  end as flag

  from db1 left join db2 on db1.id = db2.id;

quit;

Solution
‎09-26-2014 09:23 AM
Super User
Posts: 9,681

Re: Adding flag column when condition is met

1)Hash Table




data db1;
     input id    start_date     end_date     V1;
datalines;
101     20130126     20130814     19     
101     20130729     20130829     77
102     20130201     20130401     14
103     20130217     20130225     11
104     20130102     20131119     47
;
data db2;
     input id     date     V2;
datalines;
101     20130719     26     
103     20130224     55
105     20131107     10
;
run;

data want;
 if _n_ = 1 then do;
 if 0 then set db2;
 declare hash ha(dataset:'db2',hashexp:20);
 ha.definekey('id');
 ha.definedata('date');
 ha.definedone();
end;
set db1;
if ha.find()=0 then flag=ifn( start_date le date le end_date,1,. );
run;











2)Or if dataset have already been sorted , Merge statement 




data w;
 merge db1(in=ina) db2;
 by id;
flag=ifn( start_date le date le end_date,1,. );
if ina;
run;

Xia Keshan

Message was edited by: xia keshan

Contributor
Posts: 35

Re: Adding flag column when condition is met

Dear , your solutions are super fast and efficient. Thank you for your precious help.

May I ask you one more thing? My dataset db2 doesn't have unique id as in the toy example. That is, suppose I have the following (just added the 2nd row of db2, so that db2 doesn't have unique ids now):

data db1;

     input id    start_date     end_date     V1;

datalines;

101     20130126     20130814     19   

101     20130729     20130829     77

102     20130201     20130401     14

103     20130217     20130225     11

104     20130102     20131119     47

;

data db2;

     input id     date     V2;

datalines;

101     20130719     26

101     20130730     31

103     20130224     55

105     20131107     10

;

run;

This is the result I get right now: the flag is correctly assigned to the 1st record of db1, but not to the 2nd row of db1. That is,


data want;

     input id    start_date     end_date     V1     flag;

datalines;

101     20130126     20130814     19     1    

101     20130729     20130829     77     .

102     20130201     20130401     14     .   

103     20130217     20130225     11     1

104     20130102     20131119     47     .

;


I would like to obtain the following:

data want;

     input id    start_date     end_date     V1     flag;

datalines;

101     20130126     20130814     19     1    

101     20130729     20130829     77     1

102     20130201     20130401     14     .   

103     20130217     20130225     11     1

104     20130102     20131119     47     .

;


Please note the flag 1 in the second row this time.


Thank you for your help.

Respected Advisor
Posts: 3,124

Re: Adding flag column when condition is met

A little tweak on 's code should do:

data want;

if _n_ = 1 then do;

if 0 then set db2;

declare hash ha(dataset:'db2',hashexp:20, multidata:'y');

ha.definekey('id');

ha.definedata('date');

ha.definedone();

end;

set db1;

rc=ha.find();

if rc ne 0 then flag=.;

else

do  while (rc=0);

flag=ifn( start_date le date le end_date,1,flag);

rc=ha.find_next();

end;

drop rc;

run;

Haikuo

Contributor
Posts: 35

Re: Adding flag column when condition is met

Dear , thank you so much for your help!!

Super User
Posts: 9,681

Re: Adding flag column when condition is met

Hi Mark,

HaiKuo has already done it for me .

data db1;
     input id    start_date     end_date     V1;
datalines;
101     20130126     20130814     19     
101     20130729     20130829     77
102     20130201     20130401     14
103     20130217     20130225     11
104     20130102     20131119     47
;
data db2;
     input id     date     V2;
datalines;
101     20130719     26
101     20130730     31
103     20130224     55
105     20131107     10
;
run;

data want;
 if _n_ = 1 then do;
 if 0 then set db2;
 declare hash ha(dataset:'db2',hashexp:20,multidata:'y');
 ha.definekey('id');
 ha.definedata('date');
 ha.definedone();
end;
set db1;
rc=ha.find();
do while(rc=0);
  if start_date le date le end_date then flag=1;
  rc=ha.find_next();
end;
run;



Xia Keshan

Contributor
Posts: 35

Re: Adding flag column when condition is met

Thank you guys for all your help.

Respected Advisor
Posts: 3,124

Re: Adding flag column when condition is met

If 'id' in db2 is unique, and both tables are presorted by 'id', then the following will give you a boost on efficiency:

data db1;

input id    start_date     end_date     V1;

datalines;

101     20130126     20130814     19

101     20130729     20130829     77

102     20130201     20130401     14

103     20130217     20130225     11

104     20130102     20131119     47

;

data db2;

input id     date     V2;

datalines;

101     20130719     26

103     20130224     55

105     20131107     10

;

data want;

merge db1 db2(drop=v2);

by id;

     if START_DATE <= DATE <= END_DATE then flag=1;

run;

Well, if 'id' is not unique in db2, then we may need to resort to Hash object.

Good Luck,

Haikuo

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 5123 views
  • 6 likes
  • 6 in conversation