BookmarkSubscribeRSS Feed
learn_SAS_23
Quartz | Level 8

Hi team, 

 

i am looking for any automated solution to comapare two large data which contain many variables , and print the results in excel by highliting the differences.

 

Thanks in advance.

11 REPLIES 11
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Proc compare is used for comparing two datasets.  This doesn't however give the output you want exactly so you will have to manipulate the output from that into what ever it is you want.

Another way is to do:

proc sql;
  create table out as
  select *
  from   table1
  except 
  select * 
  from   table2
  union all
  select * 
  from   table2
  except 
  select *
  from   table1;
quit;

Or something along those lines (this just shows rows which are different between the two, you would then need to write code to do highlighting and such like.

I would say proc compare output is the simplest to get and probably most detailed, but it isn't the output you want, which is very specific to you and hence probably no specific tool.

A final option, export to CSV and use a text compare tool maybe?  Seems a bit of a faff just to get some highlighted output though.

learn_SAS_23
Quartz | Level 8

Hi Team, 

 

we have almost 20 tables which is a SCD TYPE2 with more than 250 columns in each .i need to find the difference of old and new records in each table and high lisght the differences in excel.

 

 To break the requirement

 

 1. i took a back of table 1 before updating the table with new file let say old_table with 2 rows

 

Old_table : 

 

  id  name      city               processed_date

  1   Scott       sweden        26/07/2018

 2   Scott       America        04/07/2018

    

 2. After running a job , my table will get updated with new values which is type 2 SCD stores in different rows . So i copy the updated .

 

New_table : 

 

 id  name      city               processed_date

    1   Scott       Finland         18/09/2018

   2   Scott       America        04/07/2018

 

 

3. Now i rename the columns in old data such as (by using rename macro) 

 

Old_table : 

 

  id  name_0ld      city_old               processed_date_old

  1   Scott               sweden                  26/07/2018

 

4. Now i need to get my output , by merging old and new data set  with only mismath rows or updated  data with highlighted differences

 

 id     name_old  name      city_old           city              processed_date_old   processed_date

    1     Scott          Scott       Sweden           Finland          26/07/2018                     18/09/2018

 

i can achieve this solution by data merge if i have limited columns  , but i want to automate this for large data sets  which have multiple columns. 

 

Can some one help , with a generic code to compare and highlight the difference records  as descibed above.

 

Thanks in advance

 

learn_SAS_23
Quartz | Level 8
yes, i tried with proc compare The drawback of the procedure Proc Compare is its lengthy output if mismatches do exist. It is very
time consuming and labor extensive to find the information of interest, or pinpoint the exact changes in the dataset.
Reeza
Super User

I don't have a generic solution, but it's not that difficult to make one. If you can do it for one full data set and post that, you can then wrap it in a macro. 

 

The reason I don' thave a generic solution is the requirements change slightly every time. 

For example, I do have several generic macros that will report which variables are in which data sets which I more commonly need but for a dim change like this I would recommend a new macro.

 

Or you can parse PROC COMPARE. It really depends on what you want but there are several different OUT options that let you limit the output as desired.

 


@learn_SAS_23 wrote:
yes, i tried with proc compare The drawback of the procedure Proc Compare is its lengthy output if mismatches do exist. It is very
time consuming and labor extensive to find the information of interest, or pinpoint the exact changes in the dataset.

 

learn_SAS_23
Quartz | Level 8
Thanks for giving me such confidence , please see below comments and suggest your opinion
ballardw
Super User

@learn_SAS_23 wrote:

Hi Team, 

 

we have almost 20 tables which is a SCD TYPE2 with more than 250 columns in each .i need to find the difference of old and new records in each table and high lisght the differences in excel.

 

 To break the requirement

 

 1. i took a back of table 1 before updating the table with new file let say old_table with 2 rows

 

Old_table : 

 

  id  name      city               processed_date

  1   Scott       sweden        26/07/2018

 2   Scott       America        04/07/2018

    

 2. After running a job , my table will get updated with new values which is type 2 SCD stores in different rows . So i copy the updated .

 

New_table : 

 

 id  name      city               processed_date

    1   Scott       Finland         18/09/2018

   2   Scott       America        04/07/2018

 

 

3. Now i rename the columns in old data such as (by using rename macro) 

 

Old_table : 

 

  id  name_0ld      city_old               processed_date_old

  1   Scott               sweden                  26/07/2018

 

4. Now i need to get my output , by merging old and new data set  with only mismath rows or updated  data with highlighted differences

 

 id     name_old  name      city_old           city              processed_date_old   processed_date

    1     Scott          Scott       Sweden           Finland          26/07/2018                     18/09/2018

 

i can achieve this solution by data merge if i have limited columns  , but i want to automate this for large data sets  which have multiple columns. 

 

Can some one help , with a generic code to compare and highlight the difference records  as descibed above.

 

Thanks in advance

 


You say you have 20 data sets to compare. What do you expect this to look like after comparing all 20 of them?

It almost sounds like you would have 20 versions of each variable??

learn_SAS_23
Quartz | Level 8
i have 20 different data sets (of old and new data ) with different variables which contain approximetly 200 columns in each table , when i say table i had a old data in table_old_1 , and updated new data in table_new_1 , I need to merge the data and compare each and every column .

like i explained in my example

id name_old name city_old city processed_date_old processed_date
1 Scott Scott Sweden Finland 26/07/2018 18/09/2018

if i can use merge

data Merged;
merge Table_old_1 (in = x) Table_new_1(in = y);
by id;

if x = 1 and y = 1;
run;
proc sql;
create table MISMATCH as
select a.id,a.name_old,a.name_new,
a.city_old , b.city_new,
a.processed_date_old,b.processed_date_new,

from Merged a, Merged b
where a.id=b.id
and ( (a.name_old ne b.name_new) or
(a.city_old ne b.city_new) or
(a.processed_date_old ne b.processed_date_new) )
;quit;

in this code snippet i did for 3 columns, but i have to create a mismatch data set for 200 columns, can this can be done through macro , any suggestions please
Reeza
Super User

@learn_SAS_23 wrote:



in this code snippet i did for 3 columns, but i have to create a mismatch data set for 200 columns, can this can be done through macro , any suggestions please

Yes it can, but if you need this, PROC COMPARE really does a good job of doing that automatically. You have to do multiple checks though - one for if a record is present or not in the first place, one if it's different and what happens, if possible, if you have a key variable that is duplicated for some reason, which record would you check. 

 

Otherwise to write a macro I'd start with:

 

1. Get list of variables from data set, minus KEY joining variables.

2. Make a RENAME statement so that all variables are renamed automatically (add old/new prefix to one of the data sets). You show this in your example but you need to automate it.

3. Merge using the KEY variable.

4. Set up two/four arrays and loop through checking for differences, two if you have all variables the same type, 4 if you have both numeric and characters that you need to compare.

5. Print report of differences

 

Make this a macro using the Key variables to join on and an output data set name. 

learn_SAS_23
Quartz | Level 8
Thanks reeza,

Can you please send a sample code snippet for point number 4, using 4arrays
to verify data, am just learning sas


I made code for other points (1,2,3,5)
Reeza
Super User

 

data want;
set have;

array old_char (*) $ <list of character variables>;
array old_num(*) <list of numeric variables>;
array new_char(*) $ <list of new names>;
array new_num(*)  <list of new names>;

array diff_char(*) $ diffC1-diffCn; *you may want to make new names here;
array diff_num(*) diffN1-diffNn; 

*loop over character;
do i=1 to dim (old_char);
    diff_char(i) = old_char(i) = new_char(i);
end;

*loop over numeric;
do i=1 to dim (old_num);
    diff_num(i) = old_num(i) - new_num(i);
end;

run;

Sure, but please post back your macro so others can use it as well 😉

I ended up being wrong, you need 6 arrays, one for old/new/diff x 2 - one for character and numeric.

There should be an easier way. 

 

Here's a PROC COMPARE example. You could transpose this output data set and get what you want I think and it's easier to wrap that in a macro program IMO. 

 

data class_new;
set sashelp.class;
if age in (12, 13, 14) then height = height*2;
if age = 11 then delete;
run;

proc sort data=class_new;
by name; run;

proc sort data=sashelp.class out=class_old;
by name;run;

proc compare base=class_new compare=class_old outall out=want;
run;

@learn_SAS_23 wrote:
Thanks reeza,

Can you please send a sample code snippet for point number 4, using 4arrays
to verify data, am just learning sas


I made code for other points (1,2,3,5)

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1277 views
  • 0 likes
  • 5 in conversation