<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Comparing similar datasets and finding the difference between the two datasets in New SAS User</title>
    <link>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936622#M42092</link>
    <description>&lt;P&gt;Data step merge is going to be completely useless of you want to find different values of the variables other than BY variables because of the merge treats like-named variables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Start with&lt;/P&gt;
&lt;PRE&gt;proc compare
 base = existing
 compare = revised&lt;BR /&gt; OUTNOEQUAL OUT=WORK.NOEQUAL&lt;BR /&gt;;
 by DM9_Consumer_ID;
 var Calculated_Balance_Non_Covid Calculated_Provision_Covid Calculated_Provision_CBIL 
     Calculated_Provision_Final collectvalcovid Impairment;
run;&lt;/PRE&gt;
&lt;P&gt;The output data set WORK.NOEQUAL will have the value of DM9_consumer_id and the Var variables with the difference of values or the special missing value that displays as E for equal values.&lt;/P&gt;
&lt;P&gt;The key part is that Work.Noequal has the By values that have at least one non-equal value in the VAR variables.&lt;/P&gt;
&lt;P&gt;If that is sufficient for your "inconsistent" then add a KEEP=DM9_consumer_id data set option to Work.Noequal.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 22 Jul 2024 16:28:44 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2024-07-22T16:28:44Z</dc:date>
    <item>
      <title>Comparing similar datasets and finding the difference between the two datasets</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936587#M42088</link>
      <description>&lt;P&gt;Hi Experts,&lt;/P&gt;
&lt;P&gt;I have two similar datasets which has most of the values as same but few of the values in some columns are different. I want to compare both of them and have only those rows which have those values as different. I have imported both the datasets and sorted it. Then I have merged the dataset based on the common variable i.e.,&amp;nbsp;DM9_Consumer_ID. I tried Proc Compare but it does the overall comparison and does not give the&amp;nbsp;DM9_Consumer_IDs which have inconsistency.&amp;nbsp; Can you please suggest what is the best way to find the&amp;nbsp;DM9_Consumer_ID which have data difference? Here are both the sample datasets:&lt;/P&gt;
&lt;DIV&gt;Data Existing;&lt;/DIV&gt;
&lt;DIV&gt;infile cards expandtabs;&lt;/DIV&gt;
&lt;DIV&gt;input DM9_Consumer_ID $ Calculated_Balance_Non_Covid Calculated_Provision_Covid Calculated_Provision_CBIL Calculated_Provision_Final collectvalcovid Impairment;&lt;/DIV&gt;
&lt;DIV&gt;datalines ;&lt;/DIV&gt;
&lt;DIV&gt;D0461 -59972.47 30617.98 30617.98 30617.98 57882.02 30617.98&lt;/DIV&gt;
&lt;DIV&gt;D0482 38926.17 24643.95 20899.13 20899.13 38926.17 41600.88&lt;/DIV&gt;
&lt;DIV&gt;D0645 16856.41 499.50 16491.03 16491.03 16856.41 31175.61&lt;/DIV&gt;
&lt;DIV&gt;D4146 45732.36 -103457.96 45732.36 45732.36 45732.36 86455.14&lt;/DIV&gt;
&lt;DIV&gt;D9007 7045.11 -15894.38 7045.11 7045.11 7045.11 13318.49&lt;/DIV&gt;
&lt;DIV&gt;D9012 24650.07 -28707.57 24650.07 24650.07 24650.07 46599.93&lt;/DIV&gt;
&lt;DIV&gt;D3024 8495.44 -24975.15 8495.44 8495.44 8495.44 16910.65&lt;/DIV&gt;
&lt;DIV&gt;D3063 7098.62 302.89 6795.73 6795.73 7098.62 12847.05&lt;/DIV&gt;
&lt;DIV&gt;;&lt;/DIV&gt;
&lt;DIV&gt;run;&lt;BR /&gt;&lt;BR /&gt;
&lt;DIV&gt;Data Revised;&lt;/DIV&gt;
&lt;DIV&gt;infile cards expandtabs;&lt;/DIV&gt;
&lt;DIV&gt;input DM9_Consumer_ID $ Calculated_Balance_Non_Covid Calculated_Provision_Covid Calculated_Provision_CBIL Calculated_Provision_Final collectvalcovid Impairment;&lt;/DIV&gt;
&lt;DIV&gt;datalines ;&lt;/DIV&gt;
&lt;DIV&gt;D0461 -59972.47 30617.98 30617.98 30617.98 57882.02 30617.98&lt;/DIV&gt;
&lt;DIV&gt;D0482 24643.95 0.00 0.00 18027.05 41600.88 18027.05&lt;/DIV&gt;
&lt;DIV&gt;D0645 499.50 0.00 0.00 365.38 31175.61 365.38&lt;/DIV&gt;
&lt;DIV&gt;D4146 -103457.96 45732.36 45732.36 45732.36 86455.14 45732.36&lt;/DIV&gt;
&lt;DIV&gt;D9007 -15894.38 7045.11 7045.11 7045.11 13318.49 7045.11&lt;/DIV&gt;
&lt;DIV&gt;D9012 -28707.57 24650.07 24650.07 24650.07 46599.93 24650.07&lt;/DIV&gt;
&lt;DIV&gt;D3024 -24975.15 8495.44 8495.44 8495.44 16910.65 8495.44&lt;/DIV&gt;
&lt;DIV&gt;D3063 302.89 6795.73 6795.73 7098.62 12847.05 7098.62&lt;/DIV&gt;
&lt;DIV&gt;;&lt;/DIV&gt;
&lt;DIV&gt;run;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Proc sort data=existing;
by DM9_Consumer_ID;
run;

Proc sort data=revised;
by DM9_Consumer_ID;
run;

Data merged;
merge existing (in=a) revised (in=b);
by DM9_Consumer_ID;
if a and b;
run;

/*Compare the difference*/
proc compare
 base = existing
 compare = revised;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Mon, 22 Jul 2024 13:51:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936587#M42088</guid>
      <dc:creator>Sandeep77</dc:creator>
      <dc:date>2024-07-22T13:51:15Z</dc:date>
    </item>
    <item>
      <title>Re: Comparing similar datasets and finding the difference between the two datasets</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936590#M42089</link>
      <description>&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Use 
PROC COMPARE statement options: 
NOPRINT
OUT=
OUTBASE
OUTCOMP
OUTDIF
OUTNOEQUAL 
OUTSTATS= ;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Koen&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 14:05:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936590#M42089</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2024-07-22T14:05:48Z</dc:date>
    </item>
    <item>
      <title>Re: Comparing similar datasets and finding the difference between the two datasets</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936610#M42090</link>
      <description>&lt;P&gt;Use the BY DM_Consumer_ID option in Proc Compare as well.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you are particularly interested in just a few particular variables use a VAR statement Proc Compare to reduce unwanted output.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 15:23:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936610#M42090</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-07-22T15:23:53Z</dc:date>
    </item>
    <item>
      <title>Re: Comparing similar datasets and finding the difference between the two datasets</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936616#M42091</link>
      <description>They both are big dataset with 81 columns and 100k rows. I am only looking at the variables, I have mentioned in the sample datasets. Could you please elaborate your response? Thanks</description>
      <pubDate>Mon, 22 Jul 2024 16:01:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936616#M42091</guid>
      <dc:creator>Sandeep77</dc:creator>
      <dc:date>2024-07-22T16:01:49Z</dc:date>
    </item>
    <item>
      <title>Re: Comparing similar datasets and finding the difference between the two datasets</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936622#M42092</link>
      <description>&lt;P&gt;Data step merge is going to be completely useless of you want to find different values of the variables other than BY variables because of the merge treats like-named variables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Start with&lt;/P&gt;
&lt;PRE&gt;proc compare
 base = existing
 compare = revised&lt;BR /&gt; OUTNOEQUAL OUT=WORK.NOEQUAL&lt;BR /&gt;;
 by DM9_Consumer_ID;
 var Calculated_Balance_Non_Covid Calculated_Provision_Covid Calculated_Provision_CBIL 
     Calculated_Provision_Final collectvalcovid Impairment;
run;&lt;/PRE&gt;
&lt;P&gt;The output data set WORK.NOEQUAL will have the value of DM9_consumer_id and the Var variables with the difference of values or the special missing value that displays as E for equal values.&lt;/P&gt;
&lt;P&gt;The key part is that Work.Noequal has the By values that have at least one non-equal value in the VAR variables.&lt;/P&gt;
&lt;P&gt;If that is sufficient for your "inconsistent" then add a KEEP=DM9_consumer_id data set option to Work.Noequal.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 16:28:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936622#M42092</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-07-22T16:28:44Z</dc:date>
    </item>
    <item>
      <title>Re: Comparing similar datasets and finding the difference between the two datasets</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936702#M42093</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Data Existing;
infile cards expandtabs;
input DM9_Consumer_ID $ Calculated_Balance_Non_Covid Calculated_Provision_Covid Calculated_Provision_CBIL Calculated_Provision_Final collectvalcovid Impairment;
datalines ;
D0461 -59972.47 30617.98 30617.98 30617.98 57882.02 30617.98
D0482 38926.17 24643.95 20899.13 20899.13 38926.17 41600.88
D0645 16856.41 499.50 16491.03 16491.03 16856.41 31175.61
D4146 45732.36 -103457.96 45732.36 45732.36 45732.36 86455.14
D9007 7045.11 -15894.38 7045.11 7045.11 7045.11 13318.49
D9012 24650.07 -28707.57 24650.07 24650.07 24650.07 46599.93
D3024 8495.44 -24975.15 8495.44 8495.44 8495.44 16910.65
D3063 7098.62 302.89 6795.73 6795.73 7098.62 12847.05
;
run;

Data Revised;
infile cards expandtabs;
input DM9_Consumer_ID $ Calculated_Balance_Non_Covid Calculated_Provision_Covid Calculated_Provision_CBIL Calculated_Provision_Final collectvalcovid Impairment;
datalines ;
D0461 -59972.47 30617.98 30617.98 30617.98 57882.02 30617.98
D0482 24643.95 0.00 0.00 18027.05 41600.88 18027.05
D0645 499.50 0.00 0.00 365.38 31175.61 365.38
D4146 -103457.96 45732.36 45732.36 45732.36 86455.14 45732.36
D9007 -15894.38 7045.11 7045.11 7045.11 13318.49 7045.11
D9012 -28707.57 24650.07 24650.07 24650.07 46599.93 24650.07
D3024 -24975.15 8495.44 8495.44 8495.44 16910.65 8495.44
D3063 302.89 6795.73 6795.73 7098.62 12847.05 7098.62
;
run;







%let old_data= c:\temp\a  ;   *前一次数据 的路径;
%let new_data= c:\temp\b ;   *新数据 的路径;
libname old v9 "&amp;amp;old_data." ;
data old.have;
 set Existing;
run;
libname new v9 "&amp;amp;new_data." ;
data new.have;
 set Revised;
run;



%let subjid= DM9_Consumer_ID;   *受试者编号的变量名. 注意：所有的数据集中是一样的,且值是唯一的;
%let drop= UPDATE_TIME ;   *需要去掉的变量名;




options validvarname=any validmemname=extend mrecall nofmterr nonumber nodate dkricond=nowarn dkrocond=nowarn;

/*******比较 前一次数据 和 新数据********/
libname old v9 "&amp;amp;old_data." access=readonly;
libname new v9 "&amp;amp;new_data." access=readonly;

%macro type_one(idx=,dsn=,title=);
proc datasets library=work kill nolist nodetails;quit;
proc sort data=old.&amp;amp;dsn.(drop=&amp;amp;drop.) out=old;by  &amp;amp;subjid.  ;run;
proc sort data=new.&amp;amp;dsn.(drop=&amp;amp;drop.) out=new;by  &amp;amp;subjid.  ;run;
proc compare data=old compare=new noprint out=diff_&amp;amp;dsn.;
id &amp;amp;subjid.;
run;
data call_define;
 set diff_&amp;amp;dsn.(drop=_TYPE_ _OBS_);
 retain _dummy_ .;
 array n{*} _numeric_;
 array c{*} _character_;
 length vname $ 40;
 do i=1 to dim(n);
   if n{i} not in (. 0) then do;vname=vname(n{i}); if upcase(vname) not in ("%upcase(&amp;amp;subjid.)") then output;end;
 end;
 do i=1 to dim(c);
   if findc(c{i},'X') then do;vname=vname(c{i}); if upcase(vname) not in ("%upcase(&amp;amp;subjid.)") then output;end;
 end;
keep &amp;amp;subjid. vname;
run;
filename t temp;
data _null_;
 set call_define;
 file t;
/* put 'if &amp;amp;subjid.="' &amp;amp;subjid. '" then call define("' vname '","style","style={background=yellow}");';*/
 put "if &amp;amp;subjid.='" &amp;amp;subjid. "' then call define('" vname "','style','style={background=yellow}');";
run;
data &amp;amp;dsn.;
if _n_=1 then do;
 length new_change $ 40;
 if 0 then set old.&amp;amp;dsn.;
 declare hash h(dataset:"old.&amp;amp;dsn.");
 h.definekey("&amp;amp;subjid.");
 h.definedone();

 declare hash h1(dataset:'call_define');
 h1.definekey("&amp;amp;subjid.");
 h1.definedone();
end;
 set new.&amp;amp;dsn.;
 new_change='old' ;
 if h.check() ne 0 then new_change='new' ;
 if h1.check() = 0 then new_change='change' ;
label new_change='数据状态';
run;
/*ods excel file="&amp;amp;out.\&amp;amp;idx..&amp;amp;dsn..xlsx" options(sheet_name="&amp;amp;dsn." autofilter='all' ) ;*/
proc report data=&amp;amp;dsn. split='*'  nowd;
column _all_ dummy;
define _all_/display;
define dummy/computed noprint;
compute dummy;
%include t;
endcomp;
run;
/*ods excel close;*/
%mend;



%type_one(idx=1,dsn=have,title=标题)&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Ksharp_0-1721700352309.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/98614i8FA09BA14291767A/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Ksharp_0-1721700352309.png" alt="Ksharp_0-1721700352309.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Jul 2024 02:07:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936702#M42093</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2024-07-23T02:07:39Z</dc:date>
    </item>
    <item>
      <title>Re: Comparing similar datasets and finding the difference between the two datasets</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936726#M42094</link>
      <description>Thanks for the detailed information. But what is libname old v9 "&amp;amp;old_data."  and New Libname? When I try to run the data step after the new and Old linrary, it shows ERROR: Library OLD does not exist. Could you please explain? Thanks</description>
      <pubDate>Tue, 23 Jul 2024 08:52:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936726#M42094</guid>
      <dc:creator>Sandeep77</dc:creator>
      <dc:date>2024-07-23T08:52:36Z</dc:date>
    </item>
    <item>
      <title>Re: Comparing similar datasets and finding the difference between the two datasets</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936730#M42095</link>
      <description>For the comparing these two tables:&lt;BR /&gt;I put table “Existing” under path " c:\temp\a " ,and put table “Revised” under path " c:\temp\b " .&lt;BR /&gt;If you do not have these two directory, you should firstly create them manually .  &lt;BR /&gt;And You see I have these two table the same name 'have' to make my code to run.</description>
      <pubDate>Tue, 23 Jul 2024 09:04:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936730#M42095</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2024-07-23T09:04:43Z</dc:date>
    </item>
    <item>
      <title>Re: Comparing similar datasets and finding the difference between the two datasets</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936735#M42098</link>
      <description>Apologies as it is still not clear to me. I have changed the old_data and New_data file path to my existing files.&lt;BR /&gt;%let old_data= c:\Users\56029\Desktop\Compare\Existing.xlsx  ;   *Existing data path;&lt;BR /&gt;%let new_data= c:\Users\56029\Desktop\Compare\Revised.xlsx ;   *Revised data path;&lt;BR /&gt;Are you referring to change the old and new data path to where the actual files are?</description>
      <pubDate>Tue, 23 Jul 2024 09:14:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936735#M42098</guid>
      <dc:creator>Sandeep77</dc:creator>
      <dc:date>2024-07-23T09:14:40Z</dc:date>
    </item>
    <item>
      <title>Re: Comparing similar datasets and finding the difference between the two datasets</title>
      <link>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936737#M42099</link>
      <description>&lt;P&gt;You could see I compare two SAS TABLES, not EXCEL file, I think you should import these excel file into sas firstly.&lt;BR /&gt;And %let old_data= is a path/directory/folder , NOT a file.&lt;BR /&gt;So is "%let new_data= ".&lt;BR /&gt;&lt;BR /&gt;Here is an example:&lt;BR /&gt;%let old_data= c:\Users\56029\Desktop\Compare1\ ; &lt;BR /&gt;%let new_data= c:\Users\56029\Desktop\Compare2\ ;&lt;BR /&gt;&lt;BR /&gt;Create two folders "Compare1\" and "Compare2\" by hand, if you do not have under your OS.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 24 Jul 2024 01:50:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/New-SAS-User/Comparing-similar-datasets-and-finding-the-difference-between/m-p/936737#M42099</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2024-07-24T01:50:19Z</dc:date>
    </item>
  </channel>
</rss>

