<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Selecting "NOT" Data ("Not Join") From a Dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/456137#M284329</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/142314"&gt;@BCNAV&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;If all data resides within SAS then I normally like to use a hash lookup as this avoids the need for any sorting of the large table.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The code below is untested but should be OK.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'B(keep=client_id)');
      h1.defineKey('client_id');
      h1.defineDone();
    end;
  set a;
  if h1.check() ne 0;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
    <pubDate>Sat, 21 Apr 2018 03:26:12 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2018-04-21T03:26:12Z</dc:date>
    <item>
      <title>Selecting "NOT" Data ("Not Join") From a Dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/455689#M284325</link>
      <description>&lt;P&gt;Sorry for the weird title....here goes&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a large dataset (call it A here) with a ton of clients, all of whom have a unique ID in variable client_id&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have another dataset (call it B here)&amp;nbsp;that has one variable in it called client_id as well&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I want to be able to join A and B so that the result has all client_ids from A that are NOT in B&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there an easy way to do this? I thought of using:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;data&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; egtask.test;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;set&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; IBS_COMB.DATA;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;where&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; client_id not in(&lt;/FONT&gt;&lt;FONT color="#800080" face="Courier New" size="3"&gt;"N02240"&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;,&lt;/FONT&gt;&lt;FONT color="#800080" face="Courier New" size="3"&gt;"N79761"&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;run&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;But if the list is long it would take a while to enter, and if the list changes the hard-coded values have to be edited by hand, which is not efficient.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;Thanks&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;-Bill&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Apr 2018 17:17:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/455689#M284325</guid>
      <dc:creator>BCNAV</dc:creator>
      <dc:date>2018-04-19T17:17:26Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting "NOT" Data ("Not Join") From a Dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/455708#M284326</link>
      <description>&lt;P&gt;SQL makes this easier:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  create table want as
  select client_id from a
  where client_id not in (select client_id from b);
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 19 Apr 2018 17:56:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/455708#M284326</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2018-04-19T17:56:35Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting "NOT" Data ("Not Join") From a Dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/455712#M284327</link>
      <description>&lt;P&gt;You're on the right track, except that instead of hard coding the IDs you're better off using them directly from the data set. The merge below makes use of the IN= dataset option which creates a temporary variable (I named it b).&amp;nbsp;&lt;/P&gt;
&lt;P&gt;b=1 if dataset dataB contributes to a merged observation&lt;/P&gt;
&lt;P&gt;b=0 if dataset dataB does not contribute to the merged observation&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;By selecting only the observations where b is not equal to 1, you get all IDs that are not in dataB.&lt;/P&gt;
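&lt;P&gt;Note that a merge with a BY statement expects both data sets to be sorted (or indexed) by client_id. If they are not, a sort step along these lines would probably be needed first. This is only a sketch, using the dataA and dataB names from the merge in this post:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sort both inputs by the merge key before the data step runs. */
proc sort data=dataA; by client_id; run;
proc sort data=dataB; by client_id; run;&lt;/CODE&gt;&lt;/PRE&gt;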
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size="3" color="#000080"&gt;&lt;STRONG&gt;data&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;&lt;SPAN&gt;&amp;nbsp;allAs&lt;/SPAN&gt;;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size="3" color="#0000ff"&gt;&amp;nbsp; &amp;nbsp;merge&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;&lt;SPAN&gt;&amp;nbsp;dataA dataB(in=b)&lt;/SPAN&gt;;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;&amp;nbsp; &amp;nbsp;by client_id;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="Courier New" size="3"&gt;&amp;nbsp; &amp;nbsp;if b^=1;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;&lt;FONT face="Courier New" size="3" color="#000080"&gt;run&lt;/FONT&gt;&lt;/STRONG&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 19 Apr 2018 18:04:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/455712#M284327</guid>
      <dc:creator>antonbcristina</dc:creator>
      <dc:date>2018-04-19T18:04:17Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting "NOT" Data ("Not Join") From a Dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/455774#M284328</link>
      <description>&lt;P&gt;Just for reference, a join is usually faster than a nested select clause.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data A; do CLIENT_ID= 1 to 1e8; output; end; run;

data B; do CLIENT_ID= 1 to 1e6; output; end; run;
      
proc sql;                    * real time   42.04 seconds;
  create table WANT as 
  select CLIENT_ID 
  from A
  where CLIENT_ID not in (select CLIENT_ID from B);
quit;

proc sql;                    * real time   31.78 seconds;
  create table WANT as 
  select a.CLIENT_ID 
  from A
  left join B
  on a.CLIENT_ID =b.CLIENT_ID
  where b.CLIENT_ID is missing;
quit;

data WANT;                   * real time   14.32 seconds;
  merge A B(in=B);
  by CLIENT_ID;
  if ^B;
run;&lt;/CODE&gt;&lt;/PRE&gt;
</description>
      <pubDate>Thu, 19 Apr 2018 23:45:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/455774#M284328</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-04-19T23:45:57Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting "NOT" Data ("Not Join") From a Dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/456137#M284329</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/142314"&gt;@BCNAV&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;If all data resides within SAS then I normally like to use a hash lookup as this avoids the need for any sorting of the large table.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The code below is untested but should be OK.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'B(keep=client_id)');
      h1.defineKey('client_id');
      h1.defineDone();
    end;
  set a;
  if h1.check() ne 0;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 21 Apr 2018 03:26:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/456137#M284329</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2018-04-21T03:26:12Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting "NOT" Data ("Not Join") From a Dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/456166#M284330</link>
      <description>&lt;P&gt;Not sure that any solution is going to save much time for this problem.&lt;/P&gt;
&lt;P&gt;You have a LARGE dataset and you want to eliminate a small number of records. The result is still going to be a LARGE dataset.&lt;/P&gt;
&lt;P&gt;Just the I/O alone to duplicate your large dataset will take time.&lt;/P&gt;</description>
      <pubDate>Sat, 21 Apr 2018 12:06:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/456166#M284330</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2018-04-21T12:06:22Z</dc:date>
    </item>
    <item>
      <title>Re: Selecting "NOT" Data ("Not Join") From a Dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/456177#M284331</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;If the hash approach is viable then there should be significantly less I/O as no sorting is required.&amp;nbsp;&lt;/P&gt;
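&lt;P&gt;A rough, untested sketch of the MODIFY idea described in this post, assuming the data sets a and b from my earlier hash example. Be aware that it changes a in place rather than creating a new table, so try it on a copy first:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data a;
  if _n_=1 then
    do;
      /* Load the client_ids to be excluded into a hash table. */
      dcl hash h1(dataset:'b(keep=client_id)');
      h1.defineKey('client_id');
      h1.defineDone();
    end;
  modify a;
  /* check() returns 0 when the key is found in b:
     mark that observation as deleted, leave all others untouched. */
  if h1.check() = 0 then remove;
run;&lt;/CODE&gt;&lt;/PRE&gt;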
&lt;P&gt;The "extreme" approach would be to use a MODIFY statement with a hash lookup and only delete the records logically. That should be much faster, especially if the records to be deleted are only a small percentage of the total records in the table.&lt;/P&gt;</description>
      <pubDate>Sat, 21 Apr 2018 14:12:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Selecting-quot-NOT-quot-Data-quot-Not-Join-quot-From-a-Dataset/m-p/456177#M284331</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2018-04-21T14:12:40Z</dc:date>
    </item>
  </channel>
</rss>

