<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Joining based on 2 ranges in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806983#M318049</link>
    <description>&lt;P&gt;Do the ranges in table1 overlap or not, and do the ranges cover all possible values in table2?&lt;/P&gt;</description>
    <pubDate>Sun, 10 Apr 2022 05:46:54 GMT</pubDate>
    <dc:creator>Kurt_Bremser</dc:creator>
    <dc:date>2022-04-10T05:46:54Z</dc:date>
    <item>
      <title>Joining based on 2 ranges</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806982#M318048</link>
      <description>Hi. I have 2 datasets, one is quite large with more than 20mil rows. I am trying to join the two datasets using proc sql. This is my query, but it takes a really long time.&lt;BR /&gt;&lt;BR /&gt;"&lt;BR /&gt;Select [columns]&lt;BR /&gt;&lt;BR /&gt;From Table2&lt;BR /&gt;&lt;BR /&gt;left join Table1&lt;BR /&gt;&lt;BR /&gt;on Table2.Number between Table1.bottom_range and Table1.top_range"&lt;BR /&gt;&lt;BR /&gt;Is there a more efficient solution to doing this?&lt;BR /&gt;&lt;BR /&gt;I have checked and the format of the fields in the join are all numeric.&lt;BR /&gt;&lt;BR /&gt;Table1: has 3 columns , a bottom_range, top_range and a description. For example, the bottom_range might have values 10000, 20000 etc. The top_range might have values 19999, 29999 etc.&lt;BR /&gt;&lt;BR /&gt;Table 2 has multiple columns, one of which, as an example, is called Number, with values such as 17999, 21887 etc. ) this table has 20 million+ observations.</description>
      <pubDate>Sun, 10 Apr 2022 05:41:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806982#M318048</guid>
      <dc:creator>Micha</dc:creator>
      <dc:date>2022-04-10T05:41:30Z</dc:date>
    </item>
    <item>
      <title>Re: Joining based on 2 ranges</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806983#M318049</link>
      <description>&lt;P&gt;Do the ranges in table1 overlap or not, and do the ranges cover all possible values in table2?&lt;/P&gt;</description>
      <pubDate>Sun, 10 Apr 2022 05:46:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806983#M318049</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-04-10T05:46:54Z</dc:date>
    </item>
    <item>
      <title>Re: Joining based on 2 ranges</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806984#M318050</link>
      <description>1. There is no overlap - does it make a different if it overlaps or not?&lt;BR /&gt;2. No, the ranges don't cover all possible values in table2.</description>
      <pubDate>Sun, 10 Apr 2022 06:17:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806984#M318050</guid>
      <dc:creator>Micha</dc:creator>
      <dc:date>2022-04-10T06:17:29Z</dc:date>
    </item>
    <item>
      <title>Re: Joining based on 2 ranges</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806987#M318053</link>
      <description>&lt;P&gt;Given your on condition not using any additional key variable for the join, are you eventually after something like below?&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Select [columns]
From Table2
left join 
(
select min(bottom_range) as min_val, max(top_range) as top_val
from table1
) t1
on Table2.Number between t1.min_val and t1.top_val&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Your current join condition leads likely to a many:many join which could explain the long runtime.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you need to look at the individual ranges then some sequential logic using a SAS datastep is likely more efficient.&lt;/P&gt;
&lt;P&gt;How many rows do you have in table1?&lt;/P&gt;
&lt;P&gt;Do you need a row in the result set for every single matching ranges (=rows from table 2 duplicated) or just select the row if any of the ranges matches?&lt;/P&gt;</description>
      <pubDate>Sun, 10 Apr 2022 06:53:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806987#M318053</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2022-04-10T06:53:55Z</dc:date>
    </item>
    <item>
      <title>Re: Joining based on 2 ranges</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806988#M318054</link>
      <description>&lt;P&gt;No overlaps means that you can't have multiple observations out of one source observation.&lt;/P&gt;
&lt;P&gt;Consider creating a value format with ranges out of table1, and applying that to table2 in a sequential data step.&lt;/P&gt;
&lt;P&gt;Depending on the real range of your numbers and the size of the descriptions, using a temporary array or a hash might also be possible.&lt;/P&gt;
&lt;P&gt;We will need more details about these particulars to make suggestions with code.&lt;/P&gt;</description>
      <pubDate>Sun, 10 Apr 2022 06:51:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806988#M318054</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-04-10T06:51:24Z</dc:date>
    </item>
    <item>
      <title>Re: Joining based on 2 ranges</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806994#M318060</link>
      <description>&lt;P&gt;You can do it with array probing, it requires some "pre-requisites" but executes with no sorting, and with only one data reading for bigger table.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Test data:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data Table1;
input bottom_range top_range description $ ;
cards;
1 4 A
5 9 B
15 19 D
20 24 E
30 34 F
;
run;

data Table2;
  call streaminit(1234);
  do _N_ = 1 to 100;
    do c1 = "X","Y","Z";
      c2 = catx("_", "ABC", _N_);
      number = rand('integer', 0, 30);
      output;
    end;
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Pre-requisites:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* sort table with ranges (they do not overlap that's IMPORTANT) */
proc sort data = Table1;
  by bottom_range top_range;
run;

/* create temporary table expanding ranges and collect metadata */
data table1A;
  set table1 end=end;
  do Number = bottom_range to top_range;
    output;
  end;
  if _N_ = 1 then call symputX("nobsB", bottom_range, "G");
  if end then call symputX("nobsT", top_range, "G");
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Execution:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  /* create and populate an arrays for data */
  /* if you need only description delete T and B */
  do until (eof1);
    set Table1A end = eof1;
    array t [&amp;amp;nobsB.:&amp;amp;nobsT.]  _temporary_;
    array b [&amp;amp;nobsB.:&amp;amp;nobsT.]  _temporary_;
    array d [&amp;amp;nobsB.:&amp;amp;nobsT.] $ _temporary_;
    b [number] = bottom_range;
    t [number] = top_range;
    d [number] = description;
  end;

  do until (eof2);
    set Table2 end = eof2;

    /* do the array probbing */
    /* if "hit" then point to exact data cell */
    if &amp;amp;nobsB. &amp;lt;= number &amp;lt;= &amp;amp;nobsT. then
      do;
        bottom_range = b [number];
        top_range    = t [number];
        description  = d [number];
      end;
    else call missing(bottom_range, top_range, description);

    output;
  end;

stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bart&lt;/P&gt;</description>
      <pubDate>Sun, 10 Apr 2022 07:16:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806994#M318060</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2022-04-10T07:16:49Z</dc:date>
    </item>
    <item>
      <title>Re: Joining based on 2 ranges</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806998#M318063</link>
      <description>&lt;P&gt;I case the ranges would overlap (I know I'm "driving away" form the main topic) similar approach with array probing could be used. it requires some modifications to "pre-requisites" and 2 additional arrays, but does the job without any modification of the "big" data set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Data:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data Table1;
input bottom_range top_range description $ ;
cards;
1 4 A
5 9 B
15 23 D
20 24 E
30 34 F
30 40 G
;
run;

data Table2;
  call streaminit(1234);
  do _N_ = 1 to 10;
    do c1 = "X","Y","Z";
      c2 = catx("_", "ABC", _N_);
      number = rand('integer', 0, 42);
      output;
    end;
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Pre-requisites:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* create temporary table expanding ranges */
data table1A;
  set table1 end=end;
  do Number = bottom_range to top_range;
    output;
  end;
run;

/* sort table with ranges */
proc sort data = Table1A;
  by Number bottom_range top_range;
run;

/* ... and collect metadata */
data _null_;
  do point = 1,nobs;
    set Table1A nobs=nobs point=point;

    if point = 1 then call symputX("nobsB", bottom_range, "G");
    if point = nobs then call symputX("nobsT", top_range, "G");
  end;

  call symputX("nobs", nobs, "G");
  stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Execution:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  /* create and populate an arrays for data */
  /* if you need only "description" variable delete T and B */
  do until (eof1);
    set Table1A end = eof1 curobs=curobs;
    by number;

    /* tables for ranges of repetitions, see Table1A */
    array n [&amp;amp;nobsB.:&amp;amp;nobsT.]  _temporary_;
    array o [&amp;amp;nobsB.:&amp;amp;nobsT.]  _temporary_;
    if first.number then n[number] = curobs;
    if last.number then o[number] = curobs;

    /* populate data */
    array t [&amp;amp;nobs.]  _temporary_;
    array b [&amp;amp;nobs.]  _temporary_;
    array d [&amp;amp;nobs.] $ _temporary_;
    b [curobs] = bottom_range;
    t [curobs] = top_range;
    d [curobs] = description;
  end;

  do until (eof2);
    set Table2 end = eof2;

    /* do the array probbing */
    /* if "hit" AND not missing (not existing range) then ... */ 
    if &amp;amp;nobsB. &amp;lt;= number &amp;lt;= &amp;amp;nobsT. and o[number] then
      /* ... point to exact data cells */
      do _N_ = n[number] to o[number];
        bottom_range = b [_N_];
        top_range    = t [_N_];
        description  = d [_N_];
        output;
      end;
    else 
      do;
        /* otherwise keep missing */
        call missing(bottom_range, top_range, description);
        output;
      end;
  end;

stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For 3 millions obs and ranges between 0 and 40, log was:&lt;/P&gt;
&lt;PRE&gt;NOTE: There were 39 observations read from the data set WORK.TABLE1A.
NOTE: There were 3000000 observations read from the data set WORK.TABLE2.
NOTE: The data set WORK.WANT has 3627859 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.81 seconds
      user cpu time       0.34 seconds
      system cpu time     0.46 seconds
      memory              896.06k
      OS Memory           19448.00k
&lt;/PRE&gt;
&lt;P&gt;All the best&lt;/P&gt;
&lt;P&gt;Bart&lt;/P&gt;</description>
      <pubDate>Sun, 10 Apr 2022 08:25:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806998#M318063</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2022-04-10T08:25:58Z</dc:date>
    </item>
    <item>
      <title>Re: Joining based on 2 ranges</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806999#M318064</link>
      <description>&lt;P&gt;Example of &amp;nbsp;creating a format out of your table:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data myfmt;
set table1 (
  end=done
  rename=(
    bottom_range = start
    top_range = end
    description = label
  )
);
;
fmtname = "myrange";
type = "N";
output;
if done
then do;
  start = "other";
  hlo = "O";
  label = "";
  output;
end;
run;

proc format cntlin=myfmt;
run;

data want;
set table2;
description = put(number,myrange.);
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Untested; for tested code, provide usable example data in data steps with datalines.&lt;/P&gt;</description>
      <pubDate>Sun, 10 Apr 2022 09:10:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/806999#M318064</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2022-04-10T09:10:13Z</dc:date>
    </item>
    <item>
      <title>Re: Joining based on 2 ranges</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/807016#M318078</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/367782"&gt;@Micha&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;As per your post one of your datasets is over 20 million rows.&lt;BR /&gt;In case this is a table in a RDBMS database, then consider executing the query in the database itself.&lt;BR /&gt;You can pass the smaller table to the database or rather have it as a database table updating the data as needed.&lt;/P&gt;</description>
      <pubDate>Sun, 10 Apr 2022 13:11:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Joining-based-on-2-ranges/m-p/807016#M318078</guid>
      <dc:creator>Sajid01</dc:creator>
      <dc:date>2022-04-10T13:11:48Z</dc:date>
    </item>
  </channel>
</rss>

