<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Subtracting two SAS data sets in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237488#M55362</link>
    <description>&lt;P&gt;Just to explain the difference between "&lt;FONT face="courier new,courier"&gt;except&lt;/FONT&gt;" and "&lt;FONT face="courier new,courier"&gt;except all&lt;/FONT&gt;":&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data A;
length X $16;
input X $ Y;
cards;
Apples 1
Oranges 3
Oranges 3
Pears 3
Pears 3
;
 
data B;
length X $16;
input X $ Y;
cards;
Apples 1
Watermellon 1
Watermellon 1
Oranges 3
Oranges 3
Oranges 3
Oranges 3
Oranges 3
Pears 3
Banana 2
;

proc sql;
create table diff_all as
select * from b
except all
select * from a;

create table diff_wo_all as
select * from b
except
select * from a;
quit;

proc print data=diff_all;
run;

proc print data=diff_wo_all;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Using the EXCEPT operator without the ALL option, the multiplicities of (possible) duplicate observations are disregarded. In the above example (with duplicates in A and B) this means that DIFF_WO_ALL will be the same (up to sort order) as DIFF in my first reply. Dataset DIFF_ALL, however, contains&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;1 - 1 = 0 "Apples 1" observations&lt;/LI&gt;
&lt;LI&gt;5&amp;nbsp;- 2&amp;nbsp;= 3 "Oranges 3" observations&lt;/LI&gt;
&lt;LI&gt;2 - 0 = 2 "Watermellon 1" observations&lt;/LI&gt;
&lt;LI&gt;max(0, 1-2)=0 "Pears 3" observations (not a negative number, of course)&lt;/LI&gt;
&lt;LI&gt;1 - 0 = 1 "Banana 2" observation&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;as shown below.&lt;/P&gt;
&lt;PRE&gt;Obs    X              Y

 1     Banana         2
 2     Oranges        3
 3     Oranges        3
 4     Oranges        3
 5     Watermellon    1
 6     Watermellon    1
&lt;/PRE&gt;
&lt;P&gt;So, if your datasets may&amp;nbsp;contain duplicate records (not necessarily as consecutive records), you have to decide if you want to take the multiplicities into account as in the above example. Either way, as you can see in the example, in the presence of duplicates the number of observations in the resulting dataset is possibly not equal to the difference "no. of observations in B minus&amp;nbsp;&lt;SPAN&gt;no.&lt;/SPAN&gt; &lt;SPAN&gt;of observations in A": 10 - 5 equals neither 2 (DIFF_WO_ALL) nor 6 (DIFF_ALL).&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 02 Dec 2015 23:12:36 GMT</pubDate>
    <dc:creator>FreelanceReinh</dc:creator>
    <dc:date>2015-12-02T23:12:36Z</dc:date>
    <item>
      <title>Subtracting two SAS data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237476#M55359</link>
      <description>&lt;P&gt;I have two SAS data sets - one of them has about 600,000 more observations than the other. All of the observations from SAS data set A are also in SAS data Set B, B just has more observations in addition. How can I abstract only those ~600,000 observations that are different? I already tried PROC COMPARE with no luck, since the new data aren't in any particualr order. for example&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Set A&lt;/P&gt;&lt;P&gt;X Y&lt;/P&gt;&lt;P&gt;Apples 1&lt;/P&gt;&lt;P&gt;Oranges 3&lt;/P&gt;&lt;P&gt;Pears 3&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Set B&lt;/P&gt;&lt;P&gt;X Y&lt;/P&gt;&lt;P&gt;Apples 1&lt;/P&gt;&lt;P&gt;Watermellon 1&lt;/P&gt;&lt;P&gt;Oranges 3&lt;/P&gt;&lt;P&gt;Pears 3&lt;/P&gt;&lt;P&gt;Banana 2&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;in this case I want my output to be&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;X Y&lt;/P&gt;&lt;P&gt;Watermellon 1&lt;/P&gt;&lt;P&gt;Banana 2&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thoughts?&lt;/P&gt;</description>
      <pubDate>Wed, 02 Dec 2015 21:52:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237476#M55359</guid>
      <dc:creator>ErinRoberts</dc:creator>
      <dc:date>2015-12-02T21:52:16Z</dc:date>
    </item>
    <item>
      <title>Re: Subtracting two SAS data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237480#M55360</link>
      <description>&lt;P&gt;If you don't insist on preserving the order of observations in the output dataset, it can be as simple as this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data A;
length X $16;
input X $ Y;
cards;
Apples 1
Oranges 3
Pears 3
;
 
data B;
length X $16;
input X $ Y;
cards;
Apples 1
Watermellon 1
Oranges 3
Pears 3
Banana 2
;

proc sql;
create table diff as
select * from b
except all
select * from a;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Dec 2015 22:19:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237480#M55360</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2015-12-02T22:19:35Z</dc:date>
    </item>
    <item>
      <title>Re: Subtracting two SAS data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237481#M55361</link>
      <description>&lt;P&gt;proc sql;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; create table unique as&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; select * from largedataset&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; except&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp; select * from smallerdataset;&lt;/P&gt;
&lt;P&gt;quit;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This will require all of the variables to be in both data sets and may take a long time if there are many variables.&lt;/P&gt;</description>
      <pubDate>Wed, 02 Dec 2015 22:21:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237481#M55361</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2015-12-02T22:21:52Z</dc:date>
    </item>
    <item>
      <title>Re: Subtracting two SAS data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237488#M55362</link>
      <description>&lt;P&gt;Just to explain the difference between "&lt;FONT face="courier new,courier"&gt;except&lt;/FONT&gt;" and "&lt;FONT face="courier new,courier"&gt;except all&lt;/FONT&gt;":&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data A;
length X $16;
input X $ Y;
cards;
Apples 1
Oranges 3
Oranges 3
Pears 3
Pears 3
;
 
data B;
length X $16;
input X $ Y;
cards;
Apples 1
Watermellon 1
Watermellon 1
Oranges 3
Oranges 3
Oranges 3
Oranges 3
Oranges 3
Pears 3
Banana 2
;

proc sql;
create table diff_all as
select * from b
except all
select * from a;

create table diff_wo_all as
select * from b
except
select * from a;
quit;

proc print data=diff_all;
run;

proc print data=diff_wo_all;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Using the EXCEPT operator without the ALL option, the multiplicities of (possible) duplicate observations are disregarded. In the above example (with duplicates in A and B) this means that DIFF_WO_ALL will be the same (up to sort order) as DIFF in my first reply. Dataset DIFF_ALL, however, contains&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;1 - 1 = 0 "Apples 1" observations&lt;/LI&gt;
&lt;LI&gt;5&amp;nbsp;- 2&amp;nbsp;= 3 "Oranges 3" observations&lt;/LI&gt;
&lt;LI&gt;2 - 0 = 2 "Watermellon 1" observations&lt;/LI&gt;
&lt;LI&gt;max(0, 1-2)=0 "Pears 3" observations (not a negative number, of course)&lt;/LI&gt;
&lt;LI&gt;1 - 0 = 1 "Banana 2" observation&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;as shown below.&lt;/P&gt;
&lt;PRE&gt;Obs    X              Y

 1     Banana         2
 2     Oranges        3
 3     Oranges        3
 4     Oranges        3
 5     Watermellon    1
 6     Watermellon    1
&lt;/PRE&gt;
&lt;P&gt;So, if your datasets may&amp;nbsp;contain duplicate records (not necessarily as consecutive records), you have to decide if you want to take the multiplicities into account as in the above example. Either way, as you can see in the example, in the presence of duplicates the number of observations in the resulting dataset is possibly not equal to the difference "no. of observations in B minus&amp;nbsp;&lt;SPAN&gt;no.&lt;/SPAN&gt; &lt;SPAN&gt;of observations in A": 10 - 5 equals neither 2 (DIFF_WO_ALL) nor 6 (DIFF_ALL).&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 02 Dec 2015 23:12:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237488#M55362</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2015-12-02T23:12:36Z</dc:date>
    </item>
    <item>
      <title>Re: Subtracting two SAS data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237581#M55369</link>
      <description>&lt;P&gt;Datastep gives you more options, like if a and b, if b and not a, if a and not b but requires a sort first.&amp;nbsp; Here is an example with datastep merge and proc sql subquery:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data a;&lt;BR /&gt;input x$ y;&lt;BR /&gt;cards;&lt;BR /&gt;apples 1&lt;BR /&gt;oranges 2&lt;BR /&gt;pears 3&lt;BR /&gt;;&lt;BR /&gt;&lt;BR /&gt;data b;&lt;BR /&gt;input x$ y;&lt;BR /&gt;cards;&lt;BR /&gt;apples 1&lt;BR /&gt;watermelon 1&lt;BR /&gt;oranges 3&lt;BR /&gt;pears 3&lt;BR /&gt;banana 2&lt;BR /&gt;;&lt;BR /&gt;&lt;BR /&gt;proc sort data=a;by x y;&lt;BR /&gt;proc sort data=b;by x y;&lt;BR /&gt;&lt;BR /&gt;data want;&lt;BR /&gt;merge a (in=a)&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; b (in=b);&lt;BR /&gt;by x;&lt;BR /&gt;if b and not a;&lt;BR /&gt;run;&lt;BR /&gt;&lt;BR /&gt;proc sql;&lt;BR /&gt;create table want_sql as&lt;BR /&gt;select x,y&lt;BR /&gt;from b&lt;BR /&gt;where x not in (select x from a);&lt;/P&gt;</description>
      <pubDate>Thu, 03 Dec 2015 13:50:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237581#M55369</guid>
      <dc:creator>Steelers_In_DC</dc:creator>
      <dc:date>2015-12-03T13:50:32Z</dc:date>
    </item>
    <item>
      <title>Re: Subtracting two SAS data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237589#M55370</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/30712"&gt;@Steelers_In_DC﻿&lt;/a&gt;: It should be noted that both of your suggestions are not equivalent to the other solutions presented: The task was to select those &lt;EM&gt;observations&lt;/EM&gt; from B which are not contained in A. This could&amp;nbsp;very well&amp;nbsp;require to distinguish between, say, (X="Banana", Y=&lt;STRONG&gt;1&lt;/STRONG&gt;) and &lt;SPAN&gt;(X="Banana", Y=&lt;STRONG&gt;2&lt;/STRONG&gt;). Your suggestions, however, assume that X alone identifies an observation, so that&amp;nbsp;(X="Banana", Y=&lt;STRONG&gt;2&lt;/STRONG&gt;&lt;SPAN&gt;) would not be selected from B if&amp;nbsp;(X="Banana", Y=&lt;STRONG&gt;1&lt;/STRONG&gt;&lt;SPAN&gt;) was contained in A.&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;SPAN&gt;&lt;SPAN&gt;Duplicates would be handled differently as well. In case of duplicate values of X in both datasets, let alone&amp;nbsp;duplicate&amp;nbsp;&lt;EM&gt;observations&lt;/EM&gt;, the data step would trigger the log message&amp;nbsp;"NOTE: MERGE statement has more than one data set with repeats of BY values.", which would make the alarm bells ring.&lt;BR /&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;SPAN&gt;&lt;SPAN&gt;With datasets A and B having unique X values and Y values&amp;nbsp;which are determined by X (like &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/25436"&gt;@ErinRoberts﻿&lt;/a&gt;'s&amp;nbsp;original sample data) all four solutions presented would lead to the same set of observations (not identically sorted, though).&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 03 Dec 2015 14:48:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237589#M55370</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2015-12-03T14:48:54Z</dc:date>
    </item>
    <item>
      <title>Re: Subtracting two SAS data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237656#M55374</link>
      <description>&lt;P&gt;Here's one more way of doing it using a subquery with the SQL procedure.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data dataA;
length X $20;
input X$ Y;
datalines;
Apples 1
Oranges 3
Pears 3
;
run;

data dataB;
length X $20;
input X$ Y;
datalines;
Apples 1
Watermellon 1
Oranges 3
Pears 3
Banana 2
;
run;

proc sql;
create table want as
select *
from dataB
Where X not in (select X from dataA);
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 03 Dec 2015 18:19:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237656#M55374</guid>
      <dc:creator>dcruik</dc:creator>
      <dc:date>2015-12-03T18:19:50Z</dc:date>
    </item>
    <item>
      <title>Re: Subtracting two SAS data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237761#M55377</link>
      <description>&lt;P&gt;Dear,&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can try many ways to get it done...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/40360"&gt;@dcruik﻿&lt;/a&gt;&amp;nbsp;Proc sql is good to use and we can use datastep as well for the same...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;try below code ....&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
data  A;
input X $ Y;
cards;
Apples  1
Oranges 3
Pears   3
;
run;
data  b;
input X $ Y;
cards;
Apples      1
Watermellon 1
Oranges     3
Pears       3
Banana      2
;
run;
/*method 1*/
proc sql;
create table want as select * from b  where x not in (select distinct x from  a);
quit;

/*Method two */

Proc sort data=a; by x;run;
Proc sort data=b; by x;run;

data want;
merge a(in=_x) b(in=_y);
by x;
if _y and not _x;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 04 Dec 2015 09:08:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Subtracting-two-SAS-data-sets/m-p/237761#M55377</guid>
      <dc:creator>yaswanthj</dc:creator>
      <dc:date>2015-12-04T09:08:03Z</dc:date>
    </item>
  </channel>
</rss>

