<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic compare two similar dataset in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333222#M75049</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a requirement to compare two monthly extracts that have the same variables, but whose values may have changed between the two datasets. I need to extract the observations that have a different value in any variable compared to last month's dataset. For example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;DATA&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; SasConf;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;INFILE&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;DATALINES&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;INPUT&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; ConfName $&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;ConfYear&lt;/P&gt;&lt;P&gt;ConfCity $&lt;/P&gt;&lt;P&gt;ConfST $ ;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;DATALINES4&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;SUGI 2006 San Francisco CA&lt;/P&gt;&lt;P&gt;PHARMASUG 2006 Bonita Springs FL&lt;/P&gt;&lt;P&gt;NESUG 2006 newyork NY&lt;/P&gt;&lt;P&gt;WUSS 2006 Irvine CA&lt;/P&gt;&lt;P&gt;SESUG 2006 Atlanta GA&lt;/P&gt;&lt;P&gt;SCSUG 2006 Irving TX&lt;/P&gt;&lt;P&gt;MWSUG 2006 Dearborn MI&lt;/P&gt;&lt;P&gt;SUGI 2007 Orlando FL&lt;/P&gt;&lt;P&gt;;;;;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;DATA&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; SasConf2;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;INFILE&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;DATALINES&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;INPUT&lt;/FONT&gt;&lt;FONT face="Courier New" 
size="3"&gt; ConfName $&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;ConfYear&lt;/P&gt;&lt;P&gt;ConfCity $&lt;/P&gt;&lt;P&gt;ConfST $ ;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;DATALINES4&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;SUGI 2006 San Francisco CA&lt;/P&gt;&lt;P&gt;PHARMASUG 2006 Bonita Springs FL&lt;/P&gt;&lt;P&gt;NESUG 2006 Philadelphia PA&lt;/P&gt;&lt;P&gt;WUSS 2006 Irvine CA&lt;/P&gt;&lt;P&gt;SESUG 2006 Atlanta GA&lt;/P&gt;&lt;P&gt;SCSUG 2006 Irving TX&lt;/P&gt;&lt;P&gt;PNWSUG 2006 Seaside OR&lt;/P&gt;&lt;P&gt;NESUGI 2007 Orlando FL&lt;/P&gt;&lt;P&gt;;;;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;After comparing these two datasets, the output dataset should have the following observations:&lt;/P&gt;&lt;P&gt;NESUG 2006 Philadelphia PA&lt;/P&gt;&lt;P&gt;NESUGI 2007 Orlando FL&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Note: there is no primary key in the dataset, so any variable value can be different. I need to select the observations that changed from last month and ignore any new observations in the current month's dataset.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for all your help.&lt;/P&gt;</description>
    <pubDate>Wed, 15 Feb 2017 23:22:25 GMT</pubDate>
    <dc:creator>cho16</dc:creator>
    <dc:date>2017-02-15T23:22:25Z</dc:date>
    <item>
      <title>compare two similar dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333222#M75049</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I have a requirement to compare two monthly extracts that have the same variables, but whose values may have changed between the two datasets. I need to extract the observations that have a different value in any variable compared to last month's dataset. For example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#000080" face="Courier New" size="3"&gt;&lt;STRONG&gt;DATA&lt;/STRONG&gt;&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; SasConf;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;INFILE&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;DATALINES&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;INPUT&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; ConfName $&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;ConfYear&lt;/P&gt;&lt;P&gt;ConfCity $&lt;/P&gt;&lt;P&gt;ConfST $ ;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;DATALINES4&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;SUGI 2006 San Francisco CA&lt;/P&gt;&lt;P&gt;PHARMASUG 2006 Bonita Springs FL&lt;/P&gt;&lt;P&gt;NESUG 2006 newyork NY&lt;/P&gt;&lt;P&gt;WUSS 2006 Irvine CA&lt;/P&gt;&lt;P&gt;SESUG 2006 Atlanta GA&lt;/P&gt;&lt;P&gt;SCSUG 2006 Irving TX&lt;/P&gt;&lt;P&gt;MWSUG 2006 Dearborn MI&lt;/P&gt;&lt;P&gt;SUGI 2007 Orlando FL&lt;/P&gt;&lt;P&gt;;;;;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;DATA&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt; SasConf2;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;INFILE&lt;/FONT&gt; &lt;FONT color="#0000ff" face="Courier New" size="3"&gt;DATALINES&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;INPUT&lt;/FONT&gt;&lt;FONT face="Courier New" 
size="3"&gt; ConfName $&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;ConfYear&lt;/P&gt;&lt;P&gt;ConfCity $&lt;/P&gt;&lt;P&gt;ConfST $ ;&lt;/P&gt;&lt;P&gt;&lt;FONT color="#0000ff" face="Courier New" size="3"&gt;DATALINES4&lt;/FONT&gt;&lt;FONT face="Courier New" size="3"&gt;;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;SUGI 2006 San Francisco CA&lt;/P&gt;&lt;P&gt;PHARMASUG 2006 Bonita Springs FL&lt;/P&gt;&lt;P&gt;NESUG 2006 Philadelphia PA&lt;/P&gt;&lt;P&gt;WUSS 2006 Irvine CA&lt;/P&gt;&lt;P&gt;SESUG 2006 Atlanta GA&lt;/P&gt;&lt;P&gt;SCSUG 2006 Irving TX&lt;/P&gt;&lt;P&gt;PNWSUG 2006 Seaside OR&lt;/P&gt;&lt;P&gt;NESUGI 2007 Orlando FL&lt;/P&gt;&lt;P&gt;;;;;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;After comparing these two datasets, the output dataset should have the following observations:&lt;/P&gt;&lt;P&gt;NESUG 2006 Philadelphia PA&lt;/P&gt;&lt;P&gt;NESUGI 2007 Orlando FL&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Note: there is no primary key in the dataset, so any variable value can be different. I need to select the observations that changed from last month and ignore any new observations in the current month's dataset.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you for all your help.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Feb 2017 23:22:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333222#M75049</guid>
      <dc:creator>cho16</dc:creator>
      <dc:date>2017-02-15T23:22:25Z</dc:date>
    </item>
    <item>
      <title>Re: compare two similar dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333225#M75051</link>
      <description>Have you looked at PROC COMPARE?</description>
      <pubDate>Wed, 15 Feb 2017 23:29:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333225#M75051</guid>
      <dc:creator>Peter_C</dc:creator>
      <dc:date>2017-02-15T23:29:54Z</dc:date>
    </item>
    <item>
      <title>Re: compare two similar dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333246#M75055</link>
      <description>&lt;P&gt;I don't see how this will be possible, if there is no key in the datasets.&amp;nbsp; You say you want to select observations with changed values, and ignore new observations.&amp;nbsp; If there is no key, how will you tell the difference between a changed observation and a new observation?&lt;/P&gt;</description>
      <pubDate>Thu, 16 Feb 2017 01:00:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333246#M75055</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2017-02-16T01:00:31Z</dc:date>
    </item>
    <item>
      <title>Re: compare two similar dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333251#M75059</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If what you want are records that are not completely matched in the two datasets, you could run the following program.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It only assumes that datasets A and B have exactly the same variables.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It will create two datasets: a_only and b_only.&amp;nbsp;&amp;nbsp; But it does NOT tell you which b_only record is most likely to be associated with an a_only record.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data a_only (drop=rc);&lt;/P&gt;
&lt;P&gt;&amp;nbsp; set a end=end_of_a;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; if _n_=1 then do;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; declare hash b (dataset:'b');&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; b.definekey(all:'Y');&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;b.definedata(all:'Y');&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; b.definedone();&lt;/P&gt;
&lt;P&gt;&amp;nbsp; end;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; rc=b.find();&lt;/P&gt;
&lt;P&gt;&amp;nbsp; if rc=0 then rc=b.remove();&amp;nbsp; /* It's a match: don't output A and remove from hash b*/&lt;/P&gt;
&lt;P&gt;&amp;nbsp; else output a_only;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; if end_of_a then rc=b.output(dataset:'b_only');&amp;nbsp; /* Whatever is left in hash b was never matched */&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Edit note:&amp;nbsp; Also I think there is an "except" operator in proc sql that could provide the unmatched records.&lt;/P&gt;</description>
      <pubDate>Thu, 16 Feb 2017 01:58:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333251#M75059</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2017-02-16T01:58:36Z</dc:date>
    </item>
    <item>
      <title>Re: compare two similar dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333257#M75062</link>
      <description>&lt;PRE&gt;
That is a very complicated problem.
You said there are no key variables, so how can you measure the similarity between these two strings?

Why would you select
NESUGI 2007 Orlando FL

I think it is a new obs, since it isn't included in the first table.
&lt;/PRE&gt;</description>
      <pubDate>Thu, 16 Feb 2017 02:59:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333257#M75062</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2017-02-16T02:59:48Z</dc:date>
    </item>
    <item>
      <title>Re: compare two similar dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333260#M75065</link>
      <description>&lt;P&gt;Even so, it probably makes a lot of sense to first eliminate exactly&amp;nbsp;matched records.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then,&amp;nbsp;with the (hopefully) small set of unmatched records, compute some sort of spelling distance or general edit distance between them.&amp;nbsp; Or one could count the number/proportion of common words in every paired comparison.&lt;/P&gt;
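&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As a rough, untested sketch of the edit-distance idea: assuming the unmatched records sit in datasets named a_only and b_only, and using the variable names from the original post, every a_only record can be paired with every b_only record and scored with the COMPLEV (Levenshtein distance) function. The candidate_pairs table name and the a_/b_ column prefixes are only illustrative. A small distance suggests a changed record but does not prove it.&lt;/P&gt;
&lt;PRE&gt;proc sql;
   /* Pair every unmatched A record with every unmatched B record */
   create table candidate_pairs as
   select a.ConfName as a_ConfName,
          a.ConfYear as a_ConfYear,
          a.ConfCity as a_ConfCity,
          a.ConfST   as a_ConfST,
          b.ConfName as b_ConfName,
          b.ConfYear as b_ConfYear,
          b.ConfCity as b_ConfCity,
          b.ConfST   as b_ConfST,
          /* Levenshtein distance between the two records written out as text */
          complev(catx(' ', a.ConfName, a.ConfYear, a.ConfCity, a.ConfST),
                  catx(' ', b.ConfName, b.ConfYear, b.ConfCity, b.ConfST)) as distance
   from a_only as a, b_only as b
   order by distance;
quit;
&lt;/PRE&gt;
&lt;P&gt;The closest pairs come first. Where to draw the line between "changed" and "new" is a judgment call that depends on the data.&lt;/P&gt;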
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Of course, this presumes that the OP only expects 1 to 1 matches.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Feb 2017 03:08:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333260#M75065</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2017-02-16T03:08:15Z</dc:date>
    </item>
    <item>
      <title>Re: compare two similar dataset</title>
      <link>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333498#M75144</link>
      <description>&lt;P&gt;I would have started with&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;proc sql;
   create table want as
   select * 
   from sasconf2
   except
   select * from sasconf 
   ;
quit;
&lt;/PRE&gt;
&lt;P&gt;to show me everything in the second table that is not in the first. That includes new records as well as changes.&lt;/P&gt;
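&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Swapping the two tables should give the other side of the picture: everything in the first table that is not in the second, i.e. records that were dropped or changed. A sketch, where gone_or_changed is just an illustrative table name:&lt;/P&gt;
&lt;PRE&gt;proc sql;
   /* Rows of last month's table with no exact match this month */
   create table gone_or_changed as
   select *
   from sasconf
   except
   select * from sasconf2
   ;
quit;
&lt;/PRE&gt;
&lt;P&gt;A current-month row with no plausible partner in this second result is probably new rather than changed, though without a key that stays a judgment call. Note that EXCEPT without the ALL keyword also removes duplicate rows.&lt;/P&gt;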
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 16 Feb 2017 17:35:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/compare-two-similar-dataset/m-p/333498#M75144</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-02-16T17:35:58Z</dc:date>
    </item>
  </channel>
</rss>

