Hi, all
I have 2 datasets, sample datasets below:
data1 :
id date sequence
234123 2012-02-04 1
234123 2012-01-26 2
234124 2012-02-24 1
234124 2015-05-05 2
data2:
id startdate enddate sequence
234123 2010-01-01 2012-01-25 1
234123 2012-01-28 2012-03-05 2
234123 2012-02-14 2012-05-07 3
234124 2012-01-25 2012-02-25 1
234124 2012-02-23 2012-03-24 2
234124 2015-06-24 2015-07-25 3
For the same id, I want to calculate if its date is within any corresponding start date and end date, if the date is within, then give the flag=yes, else flag=no. Each date should have one flag. The final results are below:
id date sequence flag
234123 2012-02-04 1 yes
234123 2012-01-26 2 no
234124 2012-02-24 1 yes
234124 2015-05-05 2 no
Thank you for any help!
data data1 ;
input id date :yymmdd10. sequence;
format date yymmdd10. ;
cards;
234123 2012-02-04 1
234123 2012-01-26 2
234124 2012-02-24 1
234124 2015-05-05 2
;
data data2;
input id startdate :yymmdd10. enddate :yymmdd10. sequence ;
format startdate enddate yymmdd10. ;
cards;
234123 2010-01-01 2012-01-25 1
234123 2012-01-28 2012-03-05 2
234123 2012-02-14 2012-05-07 3
234124 2012-01-25 2012-02-25 1
234124 2012-02-23 2012-03-24 2
234124 2015-06-24 2015-07-25 3
;
data temp;
set data2;
do date=startdate to enddate;
output;
end;
keep id date;
run;
data want;
if _n_=1 then do;
declare hash h(dataset:'temp',hashexp:20);
h.definekey('id','date');
h.definedone();
end;
set data1;
flag=ifc( h.check()=0 ,'yes','no ');
run;
@t1ilie wrote:
For the same ID, each unique date should be calculated, so I give the sequence.
So, how should "sequence" be used in this task?
Does below work for you?
data have_date;
input id date:yymmdd10.;
format date date9.;
datalines;
234123 2012-02-04
234123 2012-01-26
234124 2012-02-24
234124 2015-05-05
;
data have_daterange;
input id startdate:yymmdd10. enddate:yymmdd10.;
format startdate enddate date9.;
datalines;
234123 2010-01-01 2012-01-25
234123 2012-01-28 2012-03-05
234123 2012-02-14 2012-05-07
234124 2012-01-25 2012-02-25
234124 2012-02-23 2012-03-24
234124 2015-06-24 2015-07-25
;
data want;
if _n_=1 then
do;
if 0 then set have_daterange;
dcl hash h1(dataset:'have_daterange', multidata:'y');
h1.defineKey('id');
h1.defineData('startdate','enddate');
h1.defineDone();
drop startdate enddate;
end;
call missing(of _all_);
set have_date;
inrange_flg=0;
do while(h1.do_over() eq 0);
if startdate<=date<=enddate then
do;
inrange_flg=1;
leave;
end;
end;
run;
...and it would help us help you if you could provide sample data via a working data step as done in above code as this then gives us the time to deal with your question instead of preparing the sample data.
Assuming that id and sequence in data1 should be used to identify the ranges to check in data2:
data want;
set have;
if _n_ = 1 then do;
if 0 then set work.sequences;
declare hash s(dataset: 'work.sequences', multidata: 'yes');
s.defineKey('id', 'sequence');
s.defineData('startdate', 'enddate');
s.defineDone();
end;
rc = s.find();
flag = 0;
do while (rc = 0);
flag = (startdate <= date <= enddate) or flag;
rc = not (s.find_next() = 0 and not flag);
end;
drop rc;
run;
have = data1
sequences = data2
data data1 ;
input id date :yymmdd10. sequence;
format date yymmdd10. ;
cards;
234123 2012-02-04 1
234123 2012-01-26 2
234124 2012-02-24 1
234124 2015-05-05 2
;
data data2;
input id startdate :yymmdd10. enddate :yymmdd10. sequence ;
format startdate enddate yymmdd10. ;
cards;
234123 2010-01-01 2012-01-25 1
234123 2012-01-28 2012-03-05 2
234123 2012-02-14 2012-05-07 3
234124 2012-01-25 2012-02-25 1
234124 2012-02-23 2012-03-24 2
234124 2015-06-24 2015-07-25 3
;
data temp;
set data2;
do date=startdate to enddate;
output;
end;
keep id date;
run;
data want;
if _n_=1 then do;
declare hash h(dataset:'temp',hashexp:20);
h.definekey('id','date');
h.definedone();
end;
set data1;
flag=ifc( h.check()=0 ,'yes','no ');
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.