<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Nonmatching observations with PROC SQL: works for small DS but not for medium sized DS - issues in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/385794#M92331</link>
    <description>&lt;P&gt;An interesting approach! Is there a way to make keys have different names in the tables? Say GROUP in ds1 and GROUP_1 in ds2?&lt;/P&gt;&lt;P&gt;I tried to follow this approach: &lt;A href="https://communities.sas.com/t5/Base-SAS-Programming/SAS-hash-tables-joining-on-keys-with-different-names/m-p/344206#M79084" target="_blank"&gt;https://communities.sas.com/t5/Base-SAS-Programming/SAS-hash-tables-joining-on-keys-with-different-names/m-p/344206#M79084&lt;/A&gt; but the result was different from the same key one.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
    <pubDate>Sat, 05 Aug 2017 02:12:56 GMT</pubDate>
    <dc:creator>astrae_research</dc:creator>
    <dc:date>2017-08-05T02:12:56Z</dc:date>
    <item>
      <title>Nonmatching observations with PROC SQL: works for small DS but not for medium sized DS - issues</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/384460#M91812</link>
      <description>&lt;P&gt;Dear all,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I found this code online (ht to the OP) to find nonmatching observations between 2 datasets to prepare for a sql join. It works great for the example below, as the result is the nonmatching obs by id and group but for any medium sized dataset of say 1-2 gb on a fast m.2 pcie NVME ssd of approx 600MB/sec speed it just gets stuck reading the datasets over and over. After I stop it after a minute, the log throws the following error:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;NOTE: There were 112 observations read from the data set WORK.xxxx1.
NOTE: There were 1 observations read from the data set WORK.xxxxxx2.
      WHERE (LPERMNO=22779) and (date_m_crsp=12722);
WARNING: The data set WORK.TEMP_DEBUG1 may be incomplete.  When this step was stopped there were 0 observations and 1899 variables.
WARNING: Data set WORK.TEMP_DEBUG1 was not replaced because this step was stopped.&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Apparently the code is malformed, but I don't know where the mistakes are.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data dataset1;
      input id $ group $ gender $ age;
      cards;
      111 A Male 11
      111 B Male 11
      222 D Male 12
      333 E Female 13
      666 G Female 14
      999 A Male 15
      999 B Male 15
      999 C Male 15
      ;
run;

data dataset2;
      input id $ group $ gender $ age;
      cards;
      111 A Male 11
      999 C Male 15
      ;
run;
proc sql;
 create table tempo_names
 as
 select *
 from dataset1 ds1
 where not exists
 (select ds2.id, ds2.group
 from dataset2 as ds2
 where ds1.id=ds2.id and ds1.group=ds2.group);
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;SAS 9.4 ts1m3 64 bits on EG 7.1 64 bits on a win 7 x64 with 24gb ram at 4.2 ghz 4cores.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;P&gt;astrae&lt;/P&gt;</description>
      <pubDate>Mon, 31 Jul 2017 22:46:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/384460#M91812</guid>
      <dc:creator>astrae_research</dc:creator>
      <dc:date>2017-07-31T22:46:28Z</dc:date>
    </item>
    <item>
      <title>Re: Nonmatching observations with PROC SQL: works for small DS but not for medium sized DS - issues</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/384482#M91815</link>
      <description>&lt;P&gt;Are you looking for exact values or just via ID and Group variables for identifying 'duplicates'?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Aug 2017 01:40:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/384482#M91815</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-08-01T01:40:23Z</dc:date>
    </item>
    <item>
      <title>Re: Nonmatching observations with PROC SQL: works for small DS but not for medium sized DS - issues</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/384483#M91816</link>
      <description>&lt;P&gt;I'd probably append the two datasets and sort by ID and use PROC SORT to identify unique and not unique records.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Aug 2017 01:41:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/384483#M91816</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-08-01T01:41:11Z</dc:date>
    </item>
    <item>
      <title>Re: Nonmatching observations with PROC SQL: works for small DS but not for medium sized DS - issues</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/384484#M91817</link>
      <description>&lt;P&gt;I don't think there is anything wrong with your syntax, but it might be trying to accomplish too much IO.&lt;/P&gt;
&lt;P&gt;Doesn't something like the following do the same thing?&lt;/P&gt;
&lt;PRE&gt;proc sql;
 create table tempo_names2 as
   select ds1.*
     from dataset1 ds1
       left outer join
         dataset2 ds2
           on ds1.id=ds2.id and ds1.group=ds2.group
             where ds1.id&amp;lt;&amp;gt;ds2.id and ds1.group&amp;lt;&amp;gt;ds2.group
           
 ;
quit;
&lt;/PRE&gt;
&lt;P&gt;Art, CEO, AnalystFinder.com&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Aug 2017 01:42:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/384484#M91817</guid>
      <dc:creator>art297</dc:creator>
      <dc:date>2017-08-01T01:42:41Z</dc:date>
    </item>
    <item>
      <title>Re: Nonmatching observations with PROC SQL: works for small DS but not for medium sized DS - issues</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/384486#M91819</link>
      <description>&lt;P&gt;Exist clauses can be seriously&amp;nbsp;inefficient.&lt;/P&gt;
&lt;P&gt;Try:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
proc sql;
 create table TEMPO_NAMES as
 select *
 from DATASET1 
 except
 select *
 from DATASET2;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Aug 2017 01:47:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/384486#M91819</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2017-08-01T01:47:21Z</dc:date>
    </item>
    <item>
      <title>Re: Nonmatching observations with PROC SQL: works for small DS but not for medium sized DS - issues</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/384487#M91820</link>
      <description>&lt;P&gt;If you only want to use 2 variables for deduplication, hash tables would be the fastest.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data DATASET1
     DATASET2;
  input ID $ GROUP $ GENDER $ AGE;
  do I=1 to 1e4;
    output DATASET1; 
    if _N_ in (1,8) then output DATASET2; 
  end;
  drop I;
cards;
111 A Male 11
111 B Male 11
222 D Male 12
333 E Female 13
666 G Female 14
999 A Male 15
999 B Male 15
999 C Male 15
run;

       
data TEMPO_NAMES1; ************ 0.1 seconds **********;
  set DATASET1;
  if _N_=1 then do;
    dcl hash LOOKUP(dataset:'DATASET2');
    LOOKUP.definekey('ID','GROUP');
    LOOKUP.definedone();
  end;
  if LOOKUP.check();
run;

proc sql;        ************ 22 seconds **********;
 create table TEMPO_NAMES2 as
 select ds1.*
 from DATASET1 ds1
       left join
      DATASET2 ds2
       on ds1.ID=ds2.ID and ds1.GROUP=ds2.GROUP
  where ds1.ID ne ds2.ID and ds1.GROUP ne ds2.GROUP ;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 01 Aug 2017 02:04:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/384487#M91820</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2017-08-01T02:04:57Z</dc:date>
    </item>
    <item>
      <title>Re: Nonmatching observations with PROC SQL: works for small DS but not for medium sized DS - issues</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/385138#M92094</link>
      <description>&lt;P&gt;Thank you guys! I haven't had time to go over all of the solutions yet, but this all looks very promising. I had a feeling that one of the datasets was being cycled for every observation in the other dataset based on the I/O stats, maybe that's due to the exist clause?&lt;/P&gt;</description>
      <pubDate>Wed, 02 Aug 2017 22:13:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/385138#M92094</guid>
      <dc:creator>astrae_research</dc:creator>
      <dc:date>2017-08-02T22:13:27Z</dc:date>
    </item>
    <item>
      <title>Re: Nonmatching observations with PROC SQL: works for small DS but not for medium sized DS - issues</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/385794#M92331</link>
      <description>&lt;P&gt;An interesting approach! Is there a way to make keys have different names in the tables? Say GROUP in ds1 and GROUP_1 in ds2?&lt;/P&gt;&lt;P&gt;I tried to follow this approach: &lt;A href="https://communities.sas.com/t5/Base-SAS-Programming/SAS-hash-tables-joining-on-keys-with-different-names/m-p/344206#M79084" target="_blank"&gt;https://communities.sas.com/t5/Base-SAS-Programming/SAS-hash-tables-joining-on-keys-with-different-names/m-p/344206#M79084&lt;/A&gt; but the result was different from the same key one.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you&lt;/P&gt;</description>
      <pubDate>Sat, 05 Aug 2017 02:12:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/385794#M92331</guid>
      <dc:creator>astrae_research</dc:creator>
      <dc:date>2017-08-05T02:12:56Z</dc:date>
    </item>
    <item>
      <title>Re: Nonmatching observations with PROC SQL: works for small DS but not for medium sized DS - issues</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/385796#M92332</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/120002"&gt;@astrae_research&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;Yes, you can have different variable names and then lookup values in the&amp;nbsp;hash using syntax like:&lt;/P&gt;
&lt;P&gt;lookup.check(key:ID, key:&lt;SPAN&gt;GROUP)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;You still would need to define variable GROUP_1 in ds1.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;What I would do instead is to rename the variables when you load them into the hash:&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE class=" language-sas"&gt;&lt;CODE class="  language-sas"&gt;dcl hash LOOKUP&lt;SPAN class="token punctuation"&gt;(&lt;/SPAN&gt;dataset:&lt;SPAN class="token string"&gt;'DATASET2(rename=(group_1=group))'&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;)&lt;/SPAN&gt;&lt;SPAN class="token punctuation"&gt;;&lt;/SPAN&gt;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 05 Aug 2017 02:39:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/385796#M92332</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2017-08-05T02:39:25Z</dc:date>
    </item>
    <item>
      <title>Re: Nonmatching observations with PROC SQL: works for small DS but not for medium sized DS - issues</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/386158#M92455</link>
      <description>&lt;P&gt;I wrote a small macro using the SQL approach using the code from Chris's reply. Decided to post it here. I find the sql performance hit is not that bad and it also allows to calculate some basic counts in the same proc ,e.g., how many groups have non-matching observations. Is there a difference between left join and left outer join in this case?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Macro to find nonmatching obsevations between
two datasets on two keys/vars */

%MACRO nonm_2var_sql(dsn_in1=,dsn_in2=,dsn_an=,group_var=,var1in1=, var1in2=, var2in1=, var2in2=,var3in1=, var3in2=);
%macro _; %mend _;
proc sql;       
 create table &amp;amp;dsn_an. as
 select *, count(distinct(&amp;amp;group_var.)) as group_cnt from (select a.*, count(a.&amp;amp;group_var.) as obs_per_group_cnt
 from &amp;amp;dsn_in1. as a
       left join
      &amp;amp;dsn_in2. as b
       on a.&amp;amp;var1in1.=b.&amp;amp;var1in2. and a.&amp;amp;var2in1.=b.&amp;amp;var2in2. 
  where a.&amp;amp;var1in1. ne b.&amp;amp;var1in2. and a.&amp;amp;var2in1. ne b.&amp;amp;var2in2. 
  group by a.&amp;amp;group_var.);
quit;
%MEND nonm_2var_sql;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Thank you all who replied to my question!&lt;/P&gt;</description>
      <pubDate>Mon, 07 Aug 2017 21:50:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/386158#M92455</guid>
      <dc:creator>astrae_research</dc:creator>
      <dc:date>2017-08-07T21:50:04Z</dc:date>
    </item>
    <item>
      <title>Re: Nonmatching observations with PROC SQL: works for small DS but not for medium sized DS - issues</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/386162#M92456</link>
      <description>&lt;P&gt;The keyword OUTER is optional. They are the same.&lt;/P&gt;</description>
      <pubDate>Mon, 07 Aug 2017 22:58:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Nonmatching-observations-with-PROC-SQL-works-for-small-DS-but/m-p/386162#M92456</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2017-08-07T22:58:00Z</dc:date>
    </item>
  </channel>
</rss>

