<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Compare two datasets by each row each variable in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Compare-two-datasets-by-each-row-each-variable/m-p/448111#M69567</link>
    <description></description>
    <pubDate>Fri, 23 Mar 2018 12:39:58 GMT</pubDate>
    <dc:creator>jklaverstijn</dc:creator>
    <dc:date>2018-03-23T12:39:58Z</dc:date>
    <item>
      <title>Compare two datasets by each row each variable</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Compare-two-datasets-by-each-row-each-variable/m-p/448058#M69563</link>
      <description>&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I need to compare two datasets, keyed by a unique RecID, that hold data for two different periods.&lt;/P&gt;&lt;P&gt;My requirements are:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;gt; Flag a record as "Updated" if any of its variables has changed compared to the previous period's data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;gt; Flag a record as "New" if it was added in the current period's data.&lt;/P&gt;&lt;P&gt;Sample datasets:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;PREV_PERIOD&lt;/STRONG&gt;&lt;BR /&gt;RecID Age Income Region&lt;BR /&gt;1001&amp;nbsp; &amp;nbsp;23&amp;nbsp; 1000&amp;nbsp; &amp;nbsp;1&lt;BR /&gt;1002&amp;nbsp; &amp;nbsp;54&amp;nbsp; 2500&amp;nbsp; &amp;nbsp;3&lt;BR /&gt;1003&amp;nbsp; &amp;nbsp;41&amp;nbsp; 8000&amp;nbsp; &amp;nbsp;5&lt;BR /&gt;1004&amp;nbsp; &amp;nbsp;47&amp;nbsp; 4500&amp;nbsp; &amp;nbsp;4&lt;BR /&gt;1005&amp;nbsp; &amp;nbsp;21&amp;nbsp; 5000&amp;nbsp; &amp;nbsp;2&lt;BR /&gt;&lt;BR /&gt;&lt;STRONG&gt;CUR_PERIOD&lt;/STRONG&gt;&lt;BR /&gt;RecID Age Income Region&lt;BR /&gt;1001&amp;nbsp; &amp;nbsp;23&amp;nbsp; &amp;nbsp;1000&amp;nbsp; &amp;nbsp;1&lt;BR /&gt;1002&amp;nbsp; &amp;nbsp;54&amp;nbsp; &amp;nbsp;3500&amp;nbsp; &amp;nbsp;3&lt;BR /&gt;1003&amp;nbsp; &amp;nbsp;41&amp;nbsp; &amp;nbsp;8000&amp;nbsp; &amp;nbsp;5&lt;BR /&gt;1004&amp;nbsp; &amp;nbsp;47&amp;nbsp; &amp;nbsp;4500&amp;nbsp; &amp;nbsp;3&lt;BR /&gt;1005&amp;nbsp; &amp;nbsp;22&amp;nbsp; &amp;nbsp;5000&amp;nbsp; &amp;nbsp;2&lt;BR /&gt;1006&amp;nbsp; &amp;nbsp;36&amp;nbsp; &amp;nbsp;9000&amp;nbsp; &amp;nbsp;1&lt;BR /&gt;1007&amp;nbsp; &amp;nbsp;60&amp;nbsp; &amp;nbsp;5000&amp;nbsp; &amp;nbsp;4&lt;/P&gt;</description>
      <pubDate>Fri, 23 Mar 2018 06:29:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Compare-two-datasets-by-each-row-each-variable/m-p/448058#M69563</guid>
      <dc:creator>pateki01</dc:creator>
      <dc:date>2018-03-23T06:29:18Z</dc:date>
    </item>
    <item>
      <title>Re: Compare two datasets by each row each variable</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Compare-two-datasets-by-each-row-each-variable/m-p/448077#M69564</link>
      <description>&lt;P&gt;PROC COMPARE is just the tool for you. Provided you give it the proper options, it will compare two datasets either row by row or by matching on key variables. There are options to write result rows, flagged with the type of difference found, to an output dataset.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I will post an update with example code for the data you provided in a minute.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;- Jan.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Mar 2018 08:06:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Compare-two-datasets-by-each-row-each-variable/m-p/448077#M69564</guid>
      <dc:creator>jklaverstijn</dc:creator>
      <dc:date>2018-03-23T08:06:46Z</dc:date>
    </item>
    <item>
      <title>Re: Compare two datasets by each row each variable</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Compare-two-datasets-by-each-row-each-variable/m-p/448083#M69565</link>
      <description>&lt;P&gt;So here is some code that would compare your datasets:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
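&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Compare the previous period (BASE) with the current period (COMPARE) */
PROC COMPARE BASE=WORK.PREV_PERIOD COMPARE=WORK.CUR_PERIOD
	METHOD=ABSOLUTE
	OUT=WORK.COMP(LABEL="Compare Data for PREV_PERIOD and CUR_PERIOD")
	OUTBASE     /* write the BASE values of each compared pair */
	OUTCOMP     /* write the COMPARE values */
	OUTDIF      /* write the differences */
	OUTNOEQUAL  /* skip pairs whose values are all judged equal */
	MAXPRINT=50;
	ID RecID;   /* match observations on RecID instead of by position */
RUN;&lt;/CODE&gt;&lt;/PRE&gt;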
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that both inputs must be sorted or indexed by RecID for this to work. I prototyped this code in Enterprise Guide, which does the sorting for you automatically.&lt;/P&gt;
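&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Outside Enterprise Guide, a minimal sketch of the sorting step could look like this. It assumes the PREV_PERIOD and CUR_PERIOD datasets from the question are in the WORK library and sorts them in place.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Sort both periods by the key that the ID statement matches on */
proc sort data=work.prev_period;
	by RecID;
run;

proc sort data=work.cur_period;
	by RecID;
run;&lt;/CODE&gt;&lt;/PRE&gt;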
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PROC COMPARE is very versatile in what it writes to output datasets and to its report. Have a look at the documentation to see what else it offers.&lt;/P&gt;
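&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As one example of post-processing that output, the sketch below keeps only the current-period rows from the OUT= dataset and labels them "Updated". It assumes WORK.COMP was produced by the PROC COMPARE step above (with OUTCOMP and OUTNOEQUAL) and relies on the _TYPE_ variable that PROC COMPARE writes to OUT= datasets; the output name WORK.UPDATED is hypothetical. The OUT= dataset is built from matching pairs of observations, so IDs present in only one dataset (the "New" rows) are not expected there and would need a separate step.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Keep the current-period values of every changed pair */
data work.updated;
	set work.comp;
	where _TYPE_ = 'COMPARE';   /* one row per changed RecID */
	length Flag $10;
	Flag = 'Updated';
	keep RecID Age Income Region Flag;
run;&lt;/CODE&gt;&lt;/PRE&gt;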
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Of course, as always, there are other ways (DATA step, SQL), but this is my $0.02.&lt;/P&gt;
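&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For comparison, a minimal PROC SQL sketch of the same requirement, assuming the PREV_PERIOD and CUR_PERIOD datasets from the question (the output table name COMPARE_SQL is hypothetical). It left-joins the current period to the previous one on RecID and keeps rows with no previous match ("New") or with at least one changed value ("Updated"). Unlike the BY-based approaches, PROC SQL does not need the inputs to be sorted first.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
	create table compare_sql as
	select c.RecID, c.Age, c.Income, c.Region,
	       /* no match in the previous period means the record is new */
	       case when p.RecID is missing then 'New' else 'Updated' end
	           as Flag length=10
	from cur_period as c
	     left join prev_period as p
	     on c.RecID = p.RecID
	/* keep new records and records where any variable changed */
	where p.RecID is missing
	   or c.Age ne p.Age
	   or c.Income ne p.Income
	   or c.Region ne p.Region
	order by c.RecID;
quit;&lt;/CODE&gt;&lt;/PRE&gt;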
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps,&lt;/P&gt;
&lt;P&gt;- Jan.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Mar 2018 08:26:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Compare-two-datasets-by-each-row-each-variable/m-p/448083#M69565</guid>
      <dc:creator>jklaverstijn</dc:creator>
      <dc:date>2018-03-23T08:26:04Z</dc:date>
    </item>
    <item>
      <title>Re: Compare two datasets by each row each variable</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Compare-two-datasets-by-each-row-each-variable/m-p/448102#M69566</link>
      <description>&lt;P&gt;Thanks, Jan, for the prompt response!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Comparing the two input datasets I posted above, I need the following output:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;DIV&gt;&lt;STRONG&gt;Output&lt;/STRONG&gt;&lt;/DIV&gt;&lt;DIV&gt;RecID Age Income Region Flag&lt;/DIV&gt;&lt;DIV&gt;1002&amp;nbsp; &amp;nbsp;54&amp;nbsp; &amp;nbsp; &amp;nbsp;3500&amp;nbsp; &amp;nbsp; &amp;nbsp;3&amp;nbsp; &amp;nbsp; &amp;nbsp;Updated&lt;/DIV&gt;&lt;DIV&gt;1004&amp;nbsp; &amp;nbsp;47&amp;nbsp; &amp;nbsp; &amp;nbsp;4500&amp;nbsp; &amp;nbsp; &amp;nbsp;3&amp;nbsp; &amp;nbsp; &amp;nbsp;Updated&lt;/DIV&gt;&lt;DIV&gt;1005&amp;nbsp; &amp;nbsp;22&amp;nbsp; &amp;nbsp; &amp;nbsp;5000&amp;nbsp; &amp;nbsp; &amp;nbsp;2&amp;nbsp; &amp;nbsp; &amp;nbsp;Updated&lt;/DIV&gt;&lt;DIV&gt;1006&amp;nbsp; &amp;nbsp;36&amp;nbsp; &amp;nbsp; &amp;nbsp;9000&amp;nbsp; &amp;nbsp; &amp;nbsp;1&amp;nbsp; &amp;nbsp; &amp;nbsp;New&lt;/DIV&gt;&lt;DIV&gt;1007&amp;nbsp; &amp;nbsp;60&amp;nbsp; &amp;nbsp; &amp;nbsp;5000&amp;nbsp; &amp;nbsp; &amp;nbsp;4&amp;nbsp; &amp;nbsp; &amp;nbsp;New&lt;/DIV&gt;</description>
      <pubDate>Fri, 23 Mar 2018 12:06:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Compare-two-datasets-by-each-row-each-variable/m-p/448102#M69566</guid>
      <dc:creator>pateki01</dc:creator>
      <dc:date>2018-03-23T12:06:37Z</dc:date>
    </item>
    <item>
      <title>Re: Compare two datasets by each row each variable</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Compare-two-datasets-by-each-row-each-variable/m-p/448111#M69567</link>
      <description>&lt;P&gt;Yes, there are some limitations to PROC COMPARE in this case. The doc states&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My alternative would be:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data PREV_PERIOD;
	input RecID Age Income Region;
	cards;
1001   23  1000   1
1002   54  2500   3
1003   41  8000   5
1004   47  4500   4
1005   21  5000   2
;

data CUR_PERIOD;
	input RecID Age Income Region;
	cards;
1001   23   1000   1
1002   54   3500   3
1003   41   8000   5
1004   47   4500   3
1005   22   5000   2
1006   36   9000   1
1007   60   5000   4
;

data compare;
	merge prev_period(in=in_p rename=(age=age_p income=income_p region=region_p)) cur_period(in=in_c);
	length flag $10;
	by RecID;
	keep RecID Age Income Region Flag;

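	/* CATX with a delimiter keeps e.g. 23|1000|1 distinct from 2|31000|1 (CATS would give 2310001 for both) */
	hash_p=md5(catx('|',Age_p,Income_p,Region_p));
	hash_c=md5(catx('|',Age,Income,Region));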

	if (in_p and in_c) then do;
		if (hash_p ^= hash_c) then do;
			flag='Updated';
			output;
		end;
	end;
	else if (in_c and not in_p) then do;
		flag='New';
		output;
	end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Because the code uses a BY statement, both inputs must still be sorted or indexed by RecID.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This code could also be extended to flag deleted rows. I use the MD5() function for convenience. You could do without it; I just don't like long-winded IF conditions with a separate comparison for every variable.&lt;/P&gt;
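&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;A minimal sketch of that extension, assuming the same PREV_PERIOD and CUR_PERIOD datasets (sorted by RecID) and a hypothetical output name COMPARE_ALL. A record found only in the previous period is flagged "Deleted" and keeps its last known values.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data compare_all;
	merge prev_period(in=in_p rename=(age=age_p income=income_p region=region_p)) cur_period(in=in_c);
	by RecID;
	length flag $10;
	keep RecID Age Income Region Flag;

	/* a delimiter keeps e.g. 23|1000|1 distinct from 2|31000|1 */
	hash_p=md5(catx('|',Age_p,Income_p,Region_p));
	hash_c=md5(catx('|',Age,Income,Region));

	if (in_p and in_c) then do;
		if (hash_p ^= hash_c) then do;
			flag='Updated';
			output;
		end;
	end;
	else if (in_c and not in_p) then do;
		flag='New';
		output;
	end;
	else if (in_p and not in_c) then do;
		/* only in the previous period: report the old values */
		Age=Age_p;
		Income=Income_p;
		Region=Region_p;
		flag='Deleted';
		output;
	end;
run;&lt;/CODE&gt;&lt;/PRE&gt;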
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps,&lt;/P&gt;
&lt;P&gt;- Jan.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Mar 2018 12:39:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Compare-two-datasets-by-each-row-each-variable/m-p/448111#M69567</guid>
      <dc:creator>jklaverstijn</dc:creator>
      <dc:date>2018-03-23T12:39:58Z</dc:date>
    </item>
    <item>
      <title>Re: Compare two datasets by each row each variable</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Compare-two-datasets-by-each-row-each-variable/m-p/448371#M69590</link>
      <description>&lt;P&gt;It may be that PROC COMPARE won't directly produce what you want.&amp;nbsp; But if both your old and new datasets are sorted by ID, this program will:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data need/view=need;
  set old new;
  by id;
  retain _sentinel .;
run;

data want (drop=_sentinel);
  set need;
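  by id -- _sentinel notsorted;  /* new BY group whenever any value from ID to _SENTINEL changes */
  if last.id;                    /* one record per ID: the NEW one when the ID is in both */
  if first.id then status='New'; /* only one record for this ID, so it is new */
  else if first._sentinel then status='Upd'; /* NEW record differs from the OLD one */
  else delete;                   /* OLD and NEW records are identical */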
run;&lt;/CODE&gt;&lt;/PRE&gt;
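&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To see how the BY-group flags drive the result, a small diagnostic step can print FIRST.ID, LAST.ID and FIRST._SENTINEL for every interleaved record. It is only a sketch: it assumes the view NEED defined above, and the dataset TRACE and the variables F_ID, L_ID and F_SENT are hypothetical names.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data trace;
  set need;
  by id -- _sentinel notsorted;
  f_id   = first.id;         /* 1 on the first record of each ID */
  l_id   = last.id;          /* 1 on the last record of each ID */
  f_sent = first._sentinel;  /* 1 when any value from ID to _SENTINEL differs from the previous record */
run;

proc print data=trace;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;An ID present only in NEW shows F_ID=1 and L_ID=1 on its single record. For an ID present in both datasets, the NEW record comes second (OLD is listed first in the SET statement), so it has F_ID=0 and L_ID=1, and F_SENT is 1 only if some value changed.&lt;/P&gt;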
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This program assumes that each ID occurs at most once in each of OLD and NEW, and that every ID in OLD also appears in NEW (an ID found only in OLD would be flagged as new).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Although there are two DATA steps, this program passes through the data only once, because the first DATA step creates a data set VIEW, not a data set FILE.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
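&lt;P&gt;It also assumes that ID is the leftmost variable in the OLD dataset, because the BY list id -- _sentinel covers only the variables from ID through _SENTINEL in the order they appear in the program data vector. If that's not the case, then just force it to the left in the DATA NEED step:&lt;/P&gt;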
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data need/view=need;
  retain id;
  set old new;
  by id;
  retain _sentinel .;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 23 Mar 2018 23:24:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Compare-two-datasets-by-each-row-each-variable/m-p/448371#M69590</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-03-23T23:24:54Z</dc:date>
    </item>
  </channel>
</rss>

