<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Merge/join using a date from one dataset within a date range in another in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969567#M376876</link>
    <description>&lt;P&gt;This is a good use case for applying conditional SET statements in a data step.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Assuming:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;SET1 is sorted by ID/d_start/b/c/d&lt;/LI&gt;
&lt;LI&gt;SET1 has no instances of overlapping d_start-d_end date ranges&lt;/LI&gt;
&lt;LI&gt;SET2 is sorted by ID/eventstart&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;then&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data set1;
    input id $ d_start :date9. d_end :date9.
          physcat $ a b c d e;
    format d_start d_end date9.;
    datalines;
s001 01JAN2020 31DEC2020 A 1 2 3 1 0
s001 01JAN2020 31DEC2020 A 1 3 3 1 0
s001 01JAN2021 31DEC2021 B 1 3 3 2 1
s002 01JUN2019 31MAY2020 A 2 1 2 1 1
s002 01JUN2020 31MAY2021 C 2 2 2 2 0
s003 01JAN2020 31DEC2022 B 1 3 4 1 0
run;


data set2;
input id $ eventstart :date9.;
format eventstart date9.;
datalines;
s001 15JUN2020
s001 20JUL2021
s002 15AUG2019
s002 01JUL2020
s003 01JAN2021
s003 01JAN2023
run;

data need /view=need  /*Keep lowest B C D for each ID/D_START*/;
  set set1;
  by id d_start b c d;
  if first.d;
run;

data want (drop=_:);
  set need (keep=id d_start in=in1 rename=(d_start=_ref_date))
      set2 (                in=in2 rename=(eventstart=_ref_date));
  by id _ref_date;

  retain _left_sentinel .;
  if in1 then set need ;
  retain _right_sentinel ' ';

  if in2 ;
  set set2;
  if first.id or eventstart&amp;gt;d_end then call missing(of _left_sentinel--_right_sentinel);
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This will also accommodate multiple events within a date range.&lt;/P&gt;</description>
    <pubDate>Mon, 23 Jun 2025 19:23:52 GMT</pubDate>
    <dc:creator>mkeintz</dc:creator>
    <dc:date>2025-06-23T19:23:52Z</dc:date>
    <item>
      <title>Merge/join using a date from one dataset within a date range in another</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969535#M376866</link>
      <description>&lt;P&gt;I have two data sets (set1 set2) with a common id.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data set1;
    input id $ d_start :date9. d_end :date9.
          physcat $ a b c d e;
    format d_start d_end date9.;
    datalines;
s001 01JAN2020 31DEC2020 A 1 2 3 1 0
s001 01JAN2020 31DEC2020 A 1 3 3 1 0

s001 01JAN2021 31DEC2021 B 1 3 3 2 1
s002 01JUN2019 31MAY2020 A 2 1 2 1 1
s002 01JUN2020 31MAY2021 C 2 2 2 2 0
s003 01JAN2020 31DEC2022 B 1 3 4 1 0
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data set2;
input id $ eventstart :date9.;
format eventstart date9.;
datalines;
s001 15JUN2020
s001 20JUL2021
s002 15AUG2019
s002 01JUL2020
s003 01JAN2021
s003 01JAN2023
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I am trying to join specific variables from set1 to set2 (physcat, a, b, c, d, e) by id where eventstart falls within a d_start and d_end date range in set1.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;*In the case more than 1 observation from set1 had date ranges that include eventstart, I would want to take the variable values from the observation with the lowest b, c and d. These are temporal categorical variables where lower values would indicate earlier time (i.e. before a birthday or promotion), and they should increase uniformly together or singularly, but not reverse order. In the example above for s001, the first observation has lower values (2 3 1) than the second observation (3 3 1) so I would want the former if eventstart fell in that range. *&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;The desired dataset would look like this:&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data merged;
    input id $ eventstart :date9.
          physcat $ a b c d e;
    format eventstart date9.;
    datalines;
s001 15JUN2020 A 1 2 3 1 0
s001 20JUL2021 B 1 3 3 2 1
s002 15AUG2019 A 2 1 2 1 1
s002 01JUL2020 C 2 2 2 2 0
s003 01JAN2021 B 1 3 4 1 0
s003 01JAN2023 . . . . . .   /* I do not think there should be any cases without matches, but if there were I would want to retain with missing values */
;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;BR /&gt;What is the simplest way to do this? I have tried a few different joins with proc sql but am obviously making mistakes, as they end up producing more observations for an id than exist in set2. Set2 should maintain the same number of observations prior to merge/join.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jun 2025 14:13:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969535#M376866</guid>
      <dc:creator>sasgorilla</dc:creator>
      <dc:date>2025-06-23T14:13:51Z</dc:date>
    </item>
    <item>
      <title>Re: Merge/join using a date from one dataset within a date range in another</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969536#M376867</link>
      <description>&lt;P&gt;Please update your question to show the PROC SQL join you have tried.&amp;nbsp; That will help others help you.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jun 2025 13:40:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969536#M376867</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2025-06-23T13:40:03Z</dc:date>
    </item>
    <item>
      <title>Re: Merge/join using a date from one dataset within a date range in another</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969537#M376868</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/19879"&gt;@Quentin&lt;/a&gt;&amp;nbsp;Thank you. I was having some formatting issues when copying and pasting code for some reason...&lt;BR /&gt;&lt;BR /&gt;Here is a recent example that resulted in increasing the number of observations in set 2.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
    create table best_matches as
    select 
        s2.id,
        s2.eventstart,
        s1.physcat,
        s1.a,
        s1.b,
        s1.c,
        s1.d,
        s1.e,
        s1.d_end
    from set2 as s2
    inner join set1 as s1
        on s1.id = s2.id
        and s2.eventstart between s1.d_start and s1.d_end
    group by s2.id, s2.eventstart
    having s1.d_end = min(s1.d_end)
    ;

    create table merged as
    select 
        s2.*, 
        bm.physcat,
        bm.a, bm.b, bm.c, bm.d, bm.e
    from set2 as s2
    left join best_matches as bm
        on s2.id = bm.id
        and s2.eventstart = bm.eventstart
    ;
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 23 Jun 2025 13:45:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969537#M376868</guid>
      <dc:creator>sasgorilla</dc:creator>
      <dc:date>2025-06-23T13:45:23Z</dc:date>
    </item>
    <item>
      <title>Re: Merge/join using a date from one dataset within a date range in another</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969541#M376870</link>
      <description>&lt;P&gt;The following gives me your desired table. I created a cartesian join between matching id values and then compared the dates and then selected the first in a BY group.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data set1;
    input id $ d_start :date9. d_end :date9.
          physcat $ a b c d e;
    format d_start d_end date9.;
    datalines;
s001 01JAN2020 31DEC2020 A 1 2 3 1 0
s001 01JAN2020 31DEC2020 A 1 3 3 1 0
s001 01JAN2021 31DEC2021 B 1 3 3 2 1
s002 01JUN2019 31MAY2020 A 2 1 2 1 1
s002 01JUN2020 31MAY2021 C 2 2 2 2 0
s003 01JAN2020 31DEC2022 B 1 3 4 1 0
;
run;

data set2;
input id $ eventstart :date9.;
format eventstart date9.;
datalines;
s001 15JUN2020
s001 20JUL2021
s002 15AUG2019
s002 01JUL2020
s003 01JAN2021
s003 01JAN2023
;
run;

proc sql;
create table combined as 
 select set1.id, set1.d_start, set1.d_end, set1.physcat, set1.a,
  set1.b, set1.c, set1.d, set1.e, set2.eventstart
 from set1, set2
  where set1.id=set2.id;
quit;

data combined;
set combined;
where d_start le eventstart le d_end;
run;

proc sort data=combined;
by id d_start eventstart d_end;
run;

data combined;
set combined;
by id d_start eventstart d_end;
if first.d_end then output;
run;

proc print data=combined;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 23 Jun 2025 15:08:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969541#M376870</guid>
      <dc:creator>Kathryn_SAS</dc:creator>
      <dc:date>2025-06-23T15:08:25Z</dc:date>
    </item>
    <item>
      <title>Re: Merge/join using a date from one dataset within a date range in another</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969543#M376871</link>
      <description>&lt;P&gt;It is not clear out you order by multiple variables.&amp;nbsp; Perhaps just sort by them and take the first one?&lt;/P&gt;
&lt;P&gt;Something like this?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
create table joined as 
  select a.*,b.eventstart 
  from set1 a left join set2 b
  on a.id=b.id 
    and b.eventstart between a.d_start and a.d_end
  order by a.id,b.eventstart,a.physcat,a.a,a.b,a.c,a.d
;
quit;

data want;
  set joined;
  by id eventstart physcat a ;
  if first.a;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Tom_0-1750692510303.png" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/107955iE8B753BF007C8E28/image-size/large?v=v2&amp;amp;px=999" role="button" title="Tom_0-1750692510303.png" alt="Tom_0-1750692510303.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jun 2025 15:28:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969543#M376871</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2025-06-23T15:28:44Z</dc:date>
    </item>
    <item>
      <title>Re: Merge/join using a date from one dataset within a date range in another</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969548#M376872</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/458102"&gt;@sasgorilla&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would probably create a temporary variable the sort order of which allows for using a HAVING clause selecting the minimum (or maximum) value of that variable (as a "tie-breaker"). So the new variable would reflect your criterion "&lt;SPAN&gt;the observation with the lowest b, c and d."&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Here is an example using &lt;FONT face="courier new,courier"&gt;_x=b+c+d&lt;/FONT&gt; for that purpose:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;FONT color="#999999"&gt;proc sql;
create table merged&lt;FONT color="#000000"&gt;&lt;STRONG&gt;(drop=_x)&lt;/STRONG&gt;&lt;/FONT&gt; as
select s2.id, eventstart,
       physcat, a, b, c, d, e,
       &lt;FONT color="#000000"&gt;&lt;STRONG&gt;b+c+d as _x&lt;/STRONG&gt;&lt;/FONT&gt;
from set2 as s2
left join set1 as s1
on s1.id = s2.id and eventstart between d_start and d_end
group by s2.id, eventstart
&lt;FONT color="#000000"&gt;&lt;STRONG&gt;having _x=min(_x);&lt;/STRONG&gt;&lt;/FONT&gt;
quit;&lt;/FONT&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;SPAN&gt;If your interpretation of "the lowest b, c and d" is rather hierarchical, i.e., lowest b is most important, followed by lowest c (if the values of b are equal) and so on, you could define&amp;nbsp;&lt;FONT face="courier new,courier"&gt;_x&lt;/FONT&gt; as something like&amp;nbsp;&lt;FONT face="courier new,courier"&gt;10000*b+100*c+d&lt;/FONT&gt; (assuming non-missing integer values b, c, d between 0 and 99). And many more definitions are possible (e.g., adding criteria for missing values).&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jun 2025 15:53:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969548#M376872</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2025-06-23T15:53:38Z</dc:date>
    </item>
    <item>
      <title>Re: Merge/join using a date from one dataset within a date range in another</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969549#M376873</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13770"&gt;@Kathryn_SAS&lt;/a&gt;&amp;nbsp;for the quick response!&lt;BR /&gt;&lt;BR /&gt;I'm testing if this worked...&amp;nbsp; I lose a lot of observations in set2 when I run your code, but it appears I may have a lot of set2 eventstarts that are outside of the d_start d_end ranges. I did not know that to be the case so I'm working through that now to test.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jun 2025 15:55:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969549#M376873</guid>
      <dc:creator>sasgorilla</dc:creator>
      <dc:date>2025-06-23T15:55:07Z</dc:date>
    </item>
    <item>
      <title>Re: Merge/join using a date from one dataset within a date range in another</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969550#M376874</link>
      <description>&lt;P&gt;Thanks&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/32733"&gt;@FreelanceReinh&lt;/a&gt;&amp;nbsp;, that is a great idea! I will test.&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;I also found there may be a pre-categorized age variable in set1. It's not a DOB or that would be easy to sort by, but an age at differing arbitray period starts. I'm trying to test that out since sorting by that may work as well.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jun 2025 15:57:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969550#M376874</guid>
      <dc:creator>sasgorilla</dc:creator>
      <dc:date>2025-06-23T15:57:17Z</dc:date>
    </item>
    <item>
      <title>Re: Merge/join using a date from one dataset within a date range in another</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969567#M376876</link>
      <description>&lt;P&gt;This is a good use case for applying conditional SET statements in a data step.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Assuming:&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;SET1 is sorted by ID/d_start/b/c/d&lt;/LI&gt;
&lt;LI&gt;SET1 has no instances of overlapping d_start-d_end date ranges&lt;/LI&gt;
&lt;LI&gt;SET2 is sorted by ID/eventstart&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;then&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data set1;
    input id $ d_start :date9. d_end :date9.
          physcat $ a b c d e;
    format d_start d_end date9.;
    datalines;
s001 01JAN2020 31DEC2020 A 1 2 3 1 0
s001 01JAN2020 31DEC2020 A 1 3 3 1 0
s001 01JAN2021 31DEC2021 B 1 3 3 2 1
s002 01JUN2019 31MAY2020 A 2 1 2 1 1
s002 01JUN2020 31MAY2021 C 2 2 2 2 0
s003 01JAN2020 31DEC2022 B 1 3 4 1 0
run;


data set2;
input id $ eventstart :date9.;
format eventstart date9.;
datalines;
s001 15JUN2020
s001 20JUL2021
s002 15AUG2019
s002 01JUL2020
s003 01JAN2021
s003 01JAN2023
run;

data need /view=need  /*Keep lowest B C D for each ID/D_START*/;
  set set1;
  by id d_start b c d;
  if first.d;
run;

data want (drop=_:);
  set need (keep=id d_start in=in1 rename=(d_start=_ref_date))
      set2 (                in=in2 rename=(eventstart=_ref_date));
  by id _ref_date;

  retain _left_sentinel .;
  if in1 then set need ;
  retain _right_sentinel ' ';

  if in2 ;
  set set2;
  if first.id or eventstart&amp;gt;d_end then call missing(of _left_sentinel--_right_sentinel);
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This will also accommodate multiple events within a date range.&lt;/P&gt;</description>
      <pubDate>Mon, 23 Jun 2025 19:23:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969567#M376876</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2025-06-23T19:23:52Z</dc:date>
    </item>
    <item>
      <title>Re: Merge/join using a date from one dataset within a date range in another</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969612#M376895</link>
      <description>&lt;P&gt;Thanks everyone for all the help! I appreciate it!&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Jun 2025 16:44:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Merge-join-using-a-date-from-one-dataset-within-a-date-range-in/m-p/969612#M376895</guid>
      <dc:creator>sasgorilla</dc:creator>
      <dc:date>2025-06-24T16:44:17Z</dc:date>
    </item>
  </channel>
</rss>

