<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Finding observation difference in two separate datasets in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960706#M374645</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;I know the total excluded is 5,000 from simple math but I'm a bit stuck on how we can do this in a&amp;nbsp; SAS code. &lt;BR /&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;You don't know that. Maybe there are duplicates, or unique observations in both data sets.&lt;/P&gt;
&lt;P&gt;Never make assumptions about your data quality, and always check for all possible defects.&lt;/P&gt;
&lt;P&gt;Assuming you won't rely on the observation order, here is one way:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data A;
  X=1; output;
  X=1; output;  
  X=2; output;
  X=2; output;
run;

data B;
  X=2; output;
  X=3; output;
run;

proc sql;
  create table WANT as 
  select 'In A only' as SRC, * 
  from (select * from A
        except all
        select * from B )
  union all
  select 'In B only' as SRC, * 
  from (select * from B
        except all
        select * from A );
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This results in:&lt;/P&gt;
&lt;TABLE class="table" style="border-spacing: 0;" aria-label="Query Results"&gt;
&lt;THEAD&gt;
&lt;TR&gt;
&lt;TH class="b header" scope="col"&gt;SRC&lt;/TH&gt;
&lt;TH class="r b header" scope="col"&gt;X&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="data"&gt;In A only&lt;/TD&gt;
&lt;TD class="r data"&gt;1&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="data"&gt;In A only&lt;/TD&gt;
&lt;TD class="r data"&gt;1&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="data"&gt;In A only&lt;/TD&gt;
&lt;TD class="r data"&gt;2 *&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="data"&gt;In B only&lt;/TD&gt;
&lt;TD class="r data"&gt;3&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;* The&amp;nbsp;&lt;CODE class=" language-sas"&gt;all&lt;/CODE&gt; keyword in &lt;CODE class=" language-sas"&gt;except&lt;/CODE&gt; does not suppress duplicate rows, hence why the second X=2 observation is flagged by the code above.&lt;/P&gt;
&lt;P&gt;Same thing for keeping both X=1 observations, thanks to &lt;CODE class=" language-sas"&gt;union all&lt;/CODE&gt;.&lt;/P&gt;
&lt;P&gt;See &lt;A href="https://documentation.sas.com/doc/en/sqlproc/3.2/n0vo2lglyrnexwn14emi8m0jqvrj.htm" target="_blank"&gt;https://documentation.sas.com/doc/en/sqlproc/3.2/n0vo2lglyrnexwn14emi8m0jqvrj.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Depending on what you want, you could also use a variation of&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  create table WANT as
    select * 
    from A 
  union all
    select * 
    from B 
  except 
   (select * from A 
    intersect 
    select * from B );
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 03 Mar 2025 11:09:44 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2025-03-03T11:09:44Z</dc:date>
    <item>
      <title>Finding observation difference in two separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960696#M374639</link>
      <description>&lt;P&gt;I have two datasets. My original with 150,000 observations and a second one I made from this dataset with exclusions and has a total of 145,000 observations. I'm trying to figure out how to write a code where I can just subtracts these two datasets to&amp;nbsp; find the total amount excluded / missing. I know the total excluded is 5,000 from simple math but I'm a bit stuck on how we can do this in a&amp;nbsp; SAS code.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Mar 2025 06:25:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960696#M374639</guid>
      <dc:creator>coolbeans</dc:creator>
      <dc:date>2025-03-03T06:25:48Z</dc:date>
    </item>
    <item>
      <title>Re: Finding observation difference in two separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960697#M374640</link>
      <description>Is there a key to identify matching observations?</description>
      <pubDate>Mon, 03 Mar 2025 06:46:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960697#M374640</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2025-03-03T06:46:58Z</dc:date>
    </item>
    <item>
      <title>Re: Finding observation difference in two separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960698#M374641</link>
      <description>&lt;P&gt;Matching observations I guess would just be the unique ID?&lt;/P&gt;</description>
      <pubDate>Mon, 03 Mar 2025 06:50:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960698#M374641</guid>
      <dc:creator>coolbeans</dc:creator>
      <dc:date>2025-03-03T06:50:55Z</dc:date>
    </item>
    <item>
      <title>Re: Finding observation difference in two separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960699#M374642</link>
      <description>&lt;P&gt;As long as these two datasets have the same variable name ,type,order , you could get it by using SQL.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data A;
set sashelp.class;
run;

data B;
set sashelp.class;
if _n_ in (1:4) then delete;
run;

proc sql;
create table want as
select * from A
except
select * from B ;
quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 03 Mar 2025 06:57:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960699#M374642</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2025-03-03T06:57:37Z</dc:date>
    </item>
    <item>
      <title>Re: Finding observation difference in two separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960701#M374643</link>
      <description>&lt;P&gt;Please provide example data to illustrate your issue. I can't make code suggestions without knowing your data in structure and content.&lt;/P&gt;
&lt;P&gt;Post example data as working DATA steps with DATALINES in a code box.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Mar 2025 09:14:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960701#M374643</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2025-03-03T09:14:59Z</dc:date>
    </item>
    <item>
      <title>Re: Finding observation difference in two separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960706#M374645</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;I know the total excluded is 5,000 from simple math but I'm a bit stuck on how we can do this in a&amp;nbsp; SAS code. &lt;BR /&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;You don't know that. Maybe there are duplicates, or unique observations in both data sets.&lt;/P&gt;
&lt;P&gt;Never make assumptions about your data quality, and always check for all possible defects.&lt;/P&gt;
&lt;P&gt;Assuming you won't rely on the observation order, here is one way:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data A;
  X=1; output;
  X=1; output;  
  X=2; output;
  X=2; output;
run;

data B;
  X=2; output;
  X=3; output;
run;

proc sql;
  create table WANT as 
  select 'In A only' as SRC, * 
  from (select * from A
        except all
        select * from B )
  union all
  select 'In B only' as SRC, * 
  from (select * from B
        except all
        select * from A );
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;This results in:&lt;/P&gt;
&lt;TABLE class="table" style="border-spacing: 0;" aria-label="Query Results"&gt;
&lt;THEAD&gt;
&lt;TR&gt;
&lt;TH class="b header" scope="col"&gt;SRC&lt;/TH&gt;
&lt;TH class="r b header" scope="col"&gt;X&lt;/TH&gt;
&lt;/TR&gt;
&lt;/THEAD&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD class="data"&gt;In A only&lt;/TD&gt;
&lt;TD class="r data"&gt;1&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="data"&gt;In A only&lt;/TD&gt;
&lt;TD class="r data"&gt;1&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="data"&gt;In A only&lt;/TD&gt;
&lt;TD class="r data"&gt;2 *&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD class="data"&gt;In B only&lt;/TD&gt;
&lt;TD class="r data"&gt;3&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;* The&amp;nbsp;&lt;CODE class=" language-sas"&gt;all&lt;/CODE&gt; keyword in &lt;CODE class=" language-sas"&gt;except&lt;/CODE&gt; does not suppress duplicate rows, hence why the second X=2 observation is flagged by the code above.&lt;/P&gt;
&lt;P&gt;Same thing for keeping both X=1 observations, thanks to &lt;CODE class=" language-sas"&gt;union all&lt;/CODE&gt;.&lt;/P&gt;
&lt;P&gt;See &lt;A href="https://documentation.sas.com/doc/en/sqlproc/3.2/n0vo2lglyrnexwn14emi8m0jqvrj.htm" target="_blank"&gt;https://documentation.sas.com/doc/en/sqlproc/3.2/n0vo2lglyrnexwn14emi8m0jqvrj.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Depending on what you want, you could also use a variation of&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  create table WANT as
    select * 
    from A 
  union all
    select * 
    from B 
  except 
   (select * from A 
    intersect 
    select * from B );
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Mar 2025 11:09:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960706#M374645</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2025-03-03T11:09:44Z</dc:date>
    </item>
    <item>
      <title>Re: Finding observation difference in two separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960710#M374646</link>
      <description>&lt;P&gt;If you want to use the observation order and don't care about duplicates, here is another way to flag non-common&amp;nbsp; observations:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sort data=A out=ASORTED;
  by _ALL_;
run;
proc sort data=B out=BSORTED;
  by _ALL_;
run;
data WANT;
  merge ASORTED(in=A)
        BSORTED(in=B);
  by _ALL_;
  if       A &amp;amp; ^B then src='In A only';
  else if ^A &amp;amp;  B then src='In B only';
  else delete;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 03 Mar 2025 11:02:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960710#M374646</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2025-03-03T11:02:43Z</dc:date>
    </item>
    <item>
      <title>Re: Finding observation difference in two separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960744#M374649</link>
      <description>&lt;P&gt;An example of the data would be&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;DATA exclusions;&lt;/P&gt;&lt;P&gt;set original;&lt;/P&gt;&lt;P&gt;if CVD = . THN DELETE;&lt;/P&gt;&lt;P&gt;RUN;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The original dataset would have this kind of data ;&lt;/P&gt;&lt;P&gt;DATA original;&lt;/P&gt;&lt;P&gt;input ID BLD STROKE&amp;nbsp; CVD ;datalines ;&lt;/P&gt;&lt;P&gt;1 A 1 .&lt;/P&gt;&lt;P&gt;2 B 0 1&lt;/P&gt;&lt;P&gt;3 O 0 0&lt;/P&gt;&lt;P&gt;4 A 1 . ;&lt;/P&gt;&lt;P&gt;RUN;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The excluded dataset would result in this data ;&lt;/P&gt;&lt;P&gt;DATA excluded ;&lt;/P&gt;&lt;P&gt;input ID BLD STROKE&amp;nbsp; CVD ;datalines ;&lt;/P&gt;&lt;P&gt;2 B 0 1&lt;/P&gt;&lt;P&gt;3 O 0 0&amp;nbsp;;&lt;/P&gt;&lt;P&gt;RUN;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So my goal is to figure out by code the difference in observations which should be 2.&lt;/P&gt;</description>
      <pubDate>Mon, 03 Mar 2025 16:07:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960744#M374649</guid>
      <dc:creator>coolbeans</dc:creator>
      <dc:date>2025-03-03T16:07:48Z</dc:date>
    </item>
    <item>
      <title>Re: Finding observation difference in two separate datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960745#M374650</link>
      <description>&lt;P&gt;This worked! Ty so much !&lt;/P&gt;</description>
      <pubDate>Mon, 03 Mar 2025 16:17:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Finding-observation-difference-in-two-separate-datasets/m-p/960745#M374650</guid>
      <dc:creator>coolbeans</dc:creator>
      <dc:date>2025-03-03T16:17:14Z</dc:date>
    </item>
  </channel>
</rss>

