<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Loop through large set multiple times based on matching values from a smaller dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541970#M149742</link>
    <description>Thanks for all of the input.  Going to give all of these a try and see what works best.&lt;BR /&gt;And fyi..  overlaps are intended to be included.&lt;BR /&gt;</description>
    <pubDate>Mon, 11 Mar 2019 11:55:22 GMT</pubDate>
    <dc:creator>redfishJAX</dc:creator>
    <dc:date>2019-03-11T11:55:22Z</dc:date>
    <item>
      <title>Loop through large set multiple times based on matching values from a smaller dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541827#M149677</link>
      <description>&lt;P&gt;What is the best way to do this?&lt;/P&gt;
&lt;P&gt;The actual datasets are millions of obs..&amp;nbsp; one small (A) against one huge (~10m) B) but i need A to loop through B and find matches based on date conditions.&lt;BR /&gt;once each single observation on A has looped through B, i need it to do the same for the nexst observation in A and so&amp;nbsp; on&lt;/P&gt;
&lt;P&gt;A&lt;BR /&gt;month&amp;nbsp;begin&amp;nbsp;&amp;nbsp; end&lt;BR /&gt;Jan19&amp;nbsp;&amp;nbsp; 1/1/2019&amp;nbsp; 1/31/2019&lt;BR /&gt;Feb19&amp;nbsp;&amp;nbsp; 2/1/2019&amp;nbsp; 2/28/2019&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;B&lt;BR /&gt;custid&amp;nbsp;recur_start&amp;nbsp;recur_end&amp;nbsp;recur_amt&lt;BR /&gt;1&amp;nbsp;10/15/2014&amp;nbsp;2/16/2019&amp;nbsp;150&lt;BR /&gt;2&amp;nbsp;2/18/2018&amp;nbsp;1/31/2019&amp;nbsp;150&lt;BR /&gt;3&amp;nbsp;12/15/2012&amp;nbsp;3/31/2021&amp;nbsp;100&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;result dataset needed:&lt;BR /&gt;month&amp;nbsp;custid&amp;nbsp;recur_start&amp;nbsp;recur_end&amp;nbsp;recur_amt&lt;BR /&gt;Jan19&amp;nbsp;1&amp;nbsp;10/15/2014&amp;nbsp;2/16/2019&amp;nbsp;150&lt;BR /&gt;Jan19&amp;nbsp;2&amp;nbsp;2/18/2018&amp;nbsp;1/31/2019&amp;nbsp;150&lt;BR /&gt;Jan19&amp;nbsp;3&amp;nbsp;12/15/2012&amp;nbsp;3/31/2021&amp;nbsp;100&lt;BR /&gt;Feb19&amp;nbsp;1&amp;nbsp;10/15/2014&amp;nbsp;2/16/2019&amp;nbsp;150&amp;nbsp;&lt;BR /&gt;Feb19&amp;nbsp;3&amp;nbsp;12/15/2012&amp;nbsp;3/31/2021&amp;nbsp;100&lt;/P&gt;
&lt;P&gt;&lt;LI-WRAPPER&gt;&lt;/LI-WRAPPER&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 10 Mar 2019 15:32:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541827#M149677</guid>
      <dc:creator>redfishJAX</dc:creator>
      <dc:date>2019-03-10T15:32:36Z</dc:date>
    </item>
    <item>
      <title>Re: Loop through large set multiple times based on matching values from a smaller dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541828#M149678</link>
      <description>&lt;P&gt;you request is not completely clear and allows for a programmer to build results that are not what you want because you forgot the full conditional statement for updating the results. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;second why not flip you process and run through the small dataset to find what records match in A and use an update process.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 10 Mar 2019 15:42:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541828#M149678</guid>
      <dc:creator>VDD</dc:creator>
      <dc:date>2019-03-10T15:42:55Z</dc:date>
    </item>
    <item>
      <title>Re: Loop through large set multiple times based on matching values from a smaller dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541831#M149679</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/10584"&gt;@redfishJAX&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;One way is an SQL join. The might be more effecient solutions, but I think this would run pretty fast too, because the A data set is small, and it is simple.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data A; 
	informat begin end mmddyy10.;
	format begin end mmddyy10.;
	input month $ begin end;
datalines;
Jan19 1/1/2019 1/31/2019
Feb19 2/1/2019 2/28/2019
;
run;

data B; 
	informat recur_start recur_end mmddyy10.;
	format recur_start recur_end mmddyy10.;
	input custid recur_start recur_end recur_amt;
datalines;
1 10/15/2014 2/16/2019 150
2 2/18/2018 1/31/2019 150
3 12/15/2012 3/31/2021 100 
;
run;

proc sql;
	create table C as 
		select a.month, b.custid, b.recur_start, b.recur_end, b.recur_amt
		from A, B
		where 
			a.begin &amp;lt;= b.recur_end
			and a.end &amp;gt;= b.recur_start;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="join.gif" style="width: 413px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/27812i8BE182DCFC571DD4/image-size/large?v=v2&amp;amp;px=999" role="button" title="join.gif" alt="join.gif" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 10 Mar 2019 16:02:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541831#M149679</guid>
      <dc:creator>ErikLund_Jensen</dc:creator>
      <dc:date>2019-03-10T16:02:25Z</dc:date>
    </item>
    <item>
      <title>Re: Loop through large set multiple times based on matching values from a smaller dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541860#M149696</link>
      <description>&lt;P&gt;I'd go in real live likely for SQL but below just for fun a data step approach.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data A; 
	informat begin end mmddyy10.;
	format begin end mmddyy10.;
	input month $ begin end;
datalines;
Jan19 1/1/2019 1/31/2019
Feb19 2/1/2019 2/28/2019
;
run;

data B; 
	informat recur_start recur_end mmddyy10.;
	format recur_start recur_end mmddyy10.;
	input custid recur_start recur_end recur_amt;
datalines;
1 10/15/2014 2/16/2019 150
2 2/18/2018 1/31/2019 150
3 12/15/2012 3/31/2021 100 
;
run;

data want(keep=month begin end recur_amt);
  set b;
  _last=0;
  do _i=1 to _nobs;
    set a point=_i nobs=_nobs;
    if begin&amp;lt;=recur_end and end&amp;gt;=recur_start then output want;
  end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sun, 10 Mar 2019 19:54:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541860#M149696</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2019-03-10T19:54:24Z</dc:date>
    </item>
    <item>
      <title>Re: Loop through large set multiple times based on matching values from a smaller dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541865#M149698</link>
      <description>&lt;P&gt;You have cleverly shown the easy cases, and omitted the difficult cases.&amp;nbsp; What should be done with a partial overlap ... keep it or remove it?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A&lt;BR /&gt;month&amp;nbsp;begin&amp;nbsp;&amp;nbsp; end&lt;BR /&gt;Jan19&amp;nbsp;&amp;nbsp; 1/1/2019&amp;nbsp; 3/31/2019&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;B&lt;BR /&gt;custid&amp;nbsp;recur_start&amp;nbsp;recur_end&amp;nbsp;recur_amt&lt;BR /&gt;1&amp;nbsp;10/15/2014&amp;nbsp;2/16/2019&amp;nbsp;150&lt;BR /&gt;2&amp;nbsp;2/18/2018&amp;nbsp;1/31/2019&amp;nbsp;150&lt;BR /&gt;3&amp;nbsp;12/15/2012&amp;nbsp;3/31/2021&amp;nbsp;100&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 10 Mar 2019 20:52:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541865#M149698</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2019-03-10T20:52:57Z</dc:date>
    </item>
    <item>
      <title>Re: Loop through large set multiple times based on matching values from a smaller dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541867#M149699</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12447"&gt;@Patrick&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Very elegant indeed - I didn't think of that. I had a hash lookup in mind, but (I am ashamed to admit) most of my toolbox is pre-V9, and after 15 years with V9 I am still not really familiar with hash objects, so it would take me too long to figure out.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But I could not resist running a test to compare the two solutions. I used 24 months in A and 5.000.000 observations in B with random intervals. SQL is the winner:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;236  data A (drop=i s);
237      format begin end mmddyy10.;
238      s = '01dec2017'd;
239      do i = 1 to 24;
240          begin = intnx('month',s,i);
241          end = intnx('month',begin,1)-1;
242          month = put(begin,monyy.);
243          output;
244      end;
245  run;

NOTE: The data set WORK.A has 24 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.00 seconds


246
247  data B;
248      format recur_start recur_end yymmdd10.;
249      do custid = 1 to 5000000;
250          recur_start = (ranuni(1) * 730) + 21000;
251          recur_end = min(recur_start + (ranuni(3) * 730),22100);
252          recur_amt = int(ranuni(5)*1000);
253          output;
254      end;
255  run;

NOTE: The data set WORK.B has 5000000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           0.45 seconds
      cpu time            0.45 seconds


256
257  data want(keep=month begin end recur_amt);
258    set b;
259    _last=0;
260    do _i=1 to _nobs;
261      set a point=_i nobs=_nobs;
262      if begin&amp;lt;=recur_end and end&amp;gt;=recur_start then output want;
263    end;
264  run;

NOTE: There were 5000000 observations read from the data set WORK.B.
NOTE: The data set WORK.WANT has 51607422 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           17.74 seconds
      cpu time            16.90 seconds


265
266  proc sql;
267      create table want as
268          select a.month, b.custid, b.recur_start, b.recur_end, b.recur_amt
269          from A, B
270          where
271              a.begin &amp;lt;= b.recur_end
272              and a.end &amp;gt;= b.recur_start;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can
      not be optimized.
NOTE: Table WORK.WANT created, with 51607422 rows and 5 columns.

273  quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           9.92 seconds
      cpu time            8.40 seconds
&lt;/PRE&gt;
&lt;P&gt;I ran this on a Lenovo PC with SSD disk. I also tried it on a Linux Grid running a heavy batch load at the moment, and got almost the same figures:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;21
22    data want(keep=month begin end recur_amt);
23      set b;
24      _last=0;
25      do _i=1 to _nobs;
26        set a point=_i nobs=_nobs;
27        if begin&amp;lt;=recur_end and end&amp;gt;=recur_start then output want;
28      end;
29    run;

NOTE: There were 5000000 observations read from the data set WORK.B.
NOTE: The data set WORK.WANT has 51607422 observations and 4 variables.
NOTE: DATA statement used (Total process time):
      real time           14.18 seconds
      cpu time            13.16 seconds


30
31    proc sql;
32        create table want as
33            select a.month, b.custid, b.recur_start, b.recur_end, b.recur_amt
34            from A, B
35            where
36                a.begin &amp;lt;= b.recur_end
37                and a.end &amp;gt;= b.recur_start;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can
      not be optimized.
NOTE: Table WORK.WANT created, with 51607422 rows and 5 columns.

38    quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           8.39 seconds
      cpu time            8.39 seconds
&lt;/PRE&gt;</description>
      <pubDate>Sun, 10 Mar 2019 21:15:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541867#M149699</guid>
      <dc:creator>ErikLund_Jensen</dc:creator>
      <dc:date>2019-03-10T21:15:33Z</dc:date>
    </item>
    <item>
      <title>Re: Loop through large set multiple times based on matching values from a smaller dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541870#M149700</link>
      <description>A relevant question because we don't know the actual problem. But I tried to imagine a situation where a partial overlap should be removed, like in "make a list of employees employed in the full months of april, may and june".&lt;BR /&gt;&lt;BR /&gt;I often do interval joins, but the requirement has always been either a boolean "overlap or not", a real interval join where the overlapping interval is the wanted output, or overlap on a given day like end-of-month. It was never "Full overlap only", so I am just curious - is it something people often use "out there"?</description>
      <pubDate>Sun, 10 Mar 2019 22:11:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541870#M149700</guid>
      <dc:creator>ErikLund_Jensen</dc:creator>
      <dc:date>2019-03-10T22:11:08Z</dc:date>
    </item>
    <item>
      <title>Re: Loop through large set multiple times based on matching values from a smaller dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541970#M149742</link>
      <description>Thanks for all of the input.  Going to give all of these a try and see what works best.&lt;BR /&gt;And fyi..  overlaps are intended to be included.&lt;BR /&gt;</description>
      <pubDate>Mon, 11 Mar 2019 11:55:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Loop-through-large-set-multiple-times-based-on-matching-values/m-p/541970#M149742</guid>
      <dc:creator>redfishJAX</dc:creator>
      <dc:date>2019-03-11T11:55:22Z</dc:date>
    </item>
  </channel>
</rss>

