Thank you!
data small;
format date1 mmddyy10.;
input id date1 mmddyy10.;
cards;
1 08/08/2000
1 08/08/2007
1 08/08/2012
;
data big;
format date2 mmddyy10.;
input id date2 mmddyy10.;
cards;
1 07/08/2000
1 08/08/2002
1 09/08/2004
1 10/08/2007
1 10/08/2009
1 10/08/2011
1 08/08/2012
;
data temp;
set small;
by id;
if first.id then count=0;
count+1;
run;
data after3(drop=count date1) select3;
merge big temp(where=(count=3));
by id ;
if date2-date1>-90 then output select3;
else output after3;
data after2(drop=count date1) select2;
merge after3 temp(where=(count=2) );
by id ;
if date2-date1>-90 then output select2;
else output after2;
data after1(drop=count date1) select1;
merge after2 temp(where=(count=1));
by id ;
if date2-date1>-90 then output select1;
else output after1;
run;
data final(drop=count);
retain id date1 date2;
set select1 select2 select3;
by id;
run;
proc print;run;
Hi Linlin, This gives you the answer you wanted:
proc sort data=small; by id descending date1; run;
data smallNext;
format lastDate1 firstDate1 yymmdd10.;
lastDate1 = '31DEC2200'd; /* The end of time... */
do until(last.id);
set small; by id;
firstDate1 = intnx("DAY", date1, -90);
output;
lastDate1 = intnx("DAY", date1, -1);
end;
run;
proc sql;
create table want as
select
S.id,
S.date1,
B.date2
from
smallNext as S inner join
big as B on
S.id = B.id and
B.date2 between S.firstDate1 and S.lastDate1
order by id, date1, date2;
drop table smallNext;
quit;
PG
What about:
data Hash_Source id_grp_cnt(keep=id count);
set small;
by id date1;
if first.id then
count=1;
else count+1;
output Hash_Source;
if last.id then
output id_grp_cnt;
run;
data want;
if _n_=1 then
do;
if 0 then
set temp;
dcl hash h1 (dataset:'Hash_Source',ordered:'y');
_rc=h1.defineKey('id','count');
_rc=h1.defineData('date1');
_rc=h1.defineDone();
end;
merge big id_grp_cnt;
by id;
do _i=count to 1 by -1;
if h1.find(key:id,key:_i) =0 then
do;
if date2-date1>-90 then
do;
output;
leave;
end;
end;
end;
run;
...and this would be the "short code" version - but I assume that's not really what you had in mind.
proc sql;
create table want2 as
select s.id, s.date1, b.date2
from small s, big b
where b.date2-s.date1>-90
group by s.id, b.date2
having b.date2-s.date1 = min(b.date2-s.date1)
;
quit;
Linlin,
I am trying to reverse engineer your example. Ia ma seeing:
- I small dataset with dates.
Could be optimized by using formats or hashing when needing a table lookup
- A big dataset that is summarized by id's
Proc summary/means could be used to do that job.
With in-database processing there is no need for sorting I assume. When it is ordered but not sorted it will also work.
- The big dataset is merged with it is summarized by id's
And then I get lost as subsetting the dataset in several steps on count....
... I got lost there on the intention... Please explain.
- The resulted selections are concatenated again
Why losing the information of how it was constructed?
Linlin,
I think you'll find this much easier to work with if you transpose the second data set. So start with DATE in both data sets (not DATE1 and DATE2). Then transpose the dates in the larger data BY ID. With one observation per ID being merged in from the larger data set, the rest should become much easier.
Good luck.
Hi Linlin, This gives you the answer you wanted:
proc sort data=small; by id descending date1; run;
data smallNext;
format lastDate1 firstDate1 yymmdd10.;
lastDate1 = '31DEC2200'd; /* The end of time... */
do until(last.id);
set small; by id;
firstDate1 = intnx("DAY", date1, -90);
output;
lastDate1 = intnx("DAY", date1, -1);
end;
run;
proc sql;
create table want as
select
S.id,
S.date1,
B.date2
from
smallNext as S inner join
big as B on
S.id = B.id and
B.date2 between S.firstDate1 and S.lastDate1
order by id, date1, date2;
drop table smallNext;
quit;
PG
Hi Patrick,
Your hash solution is too diffcult for me:smileysilly:. I like your Sql solution (I changed
having b.date2-s.date1 = min(b.date2-s.date1) to
having b.date2-s.date1 = min(b.date2-s.date1) and b.id=s.id
. Thank you !
Astounding, I will try to come up a solution as you suggested.
PG, I tried to rewrite your code to make sure I understand completely. I got the same result even without the ORDER statement. I wonder why :smileyconfused:.
data small;
format date1 mmddyy10.;
input id date1 mmddyy10.;
cards;
1 08/08/2000
1 08/08/2007
1 08/08/2012
2 07/05/2002
2 09/09/2004
;
data big;
format date2 mmddyy10.;
input id date2 mmddyy10.;
cards;
1 07/08/2000
1 08/08/2002
1 09/08/2004
1 10/08/2007
1 10/08/2009
1 10/08/2011
1 08/08/2012
2 06/05/2002
2 08/07/2003
2 09/09/2004
2 08/20/2005
;
run;
proc sort data=small; by id descending date1;
data snext;
format lastdate1 firstdate1 mmddyy10.;
lastdate1='01jan2020'd;
do until (last.id);
set small;
by id;
firstdate1=intnx('month',date1,-3,'s');
output;
lastdate1=intnx('day',date1,-1);
end;
run;
proc sql;
create table want as select s.id,s.date1,b.date2
from snext as s
inner join
big as b
on s.id=b.id and (b.date2 between firstdate1 and lastdate1);
quit;
Message was edited by: Linlin
At the base, SQL queries are set theory operations, they don't expect your data to be sorted, nor do they guarantee the order of the result, unless you request it explicitly with an ORDER BY clause. So, removing the ORDER BY clause gave the same result, but that order is not guaranteed. If you are looking for other removable statements, you could get rid of the format statement in snext, it was there only for debugging purposes.
PG
Hi, Ma'am Linlin,
You are Master and I am newbie.
Need to learn from you.
I recently post a request about creating an application of clicks, manual, dialoges....
Maybe Ma'am could help greatly.
I am Chinese proper. An old guy.
How about Ma'am?
My code is shorter than PG.
data small; format date1 mmddyy10.; input id date1 mmddyy10.; cards; 1 08/08/2000 1 08/08/2007 1 08/08/2012 ; data big; format date2 mmddyy10.; input id date2 mmddyy10.; cards; 1 07/08/2000 1 08/08/2002 1 09/08/2004 1 10/08/2007 1 10/08/2009 1 10/08/2011 1 08/08/2012 ; run; data temp(rename=(date=date2)); set small(rename=(date1=date) in=ina) big(rename=(date2=date) in=inb); by id date; retain _date; if ina then _date=date; if inb; format _date mmddyy10.; run; proc sort data=temp;by id descending _date;run; data want(drop=_date); retain date1; set temp; if not missing(_date) then date1=_date; format date1 mmddyy10.; run; proc sort data=want;by id date1 date2;run;
Xia Keshan
hi, xia,
this is bai. missed Beijing very much.
?? Why ? Are you Chinese guy ? I missed U.S. as well.
Xia Keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.