Solved: Re: Comparison of 2 tables

i_Van · Posted 07-22-2019 11:51 AM

Hello, I have the following issue: I have 2 tables that contain just the same data, but calculated by different algorithms (i.e. 2 persons are watching for cars: identity no, colour, speed) having one key (identity no from the example) and i have to find the lines which have any differences in observations.

The issue's that the tables are 300 colomns x 1m lines.

The straightforward algorithm is:

/*Preparing sample tables*/
data person1;
input identity, colour $50., speed;
datalines;
1 red 10
2 blue 15
3 green 30
1 red 15
;
run;

data person2;
input identity, colour $50., speed;
datalines;
1 red 10
2 white 15
3 green 2
1 red 15
;
run;

/*I suppose algorithm to show observervations with identies 2 and 3 are mismatching (2 - colour, 3 - speed).*/

proc sql;
create table join as
select t1.*, t2.colour as colour_2, t2.speed as speed_2 from
(select * from person1) t1
full join
(select * from person2) t2
on t1.identity = t2.identity;
quit;

/*Data step is used to add reason field*/
data mismatch;
format reason $50.;
set join;
if colour ne colour_2 then do; reson ='colour'; output; end;
if speed ne speed_2 then do; reson ='speed'; output; end;
run;

The issue is - I have 300 columns, so "t2.column as column_2" section in proc sql and "if column ne column_2" in data step are quite hude - is there any more simple solution?

And i have to full join two 300x1m tables, is there any way-around solution?

Reeza · Posted 07-22-2019 12:08 PM

PROC COMPARE - test it on subsets first to ensure you can get the output you need.

View solution in original post

Reeza · Posted 07-22-2019 12:08 PM

PROC COMPARE - test it on subsets first to ensure you can get the output you need.

i_Van · Posted 07-23-2019 06:07 AM

Thank you greatly, but PROC COMPARE is not the very thing I need (or more likely I'm not good at reading help):

proc compare base=person1 compare=person2 out=compare; 
id identity; 
run;

would give me the table containing not only differing values but both equal values, and having 300 columns it would be hard to find the differing "XXXXX" for chars and differing value for numeric.

More over the values that present only in one table would not be indicated in the table - only the overall mismatching amount in output like

Number of Observations in Common: 4.
Number of Observations in WORK.B but not in WORK.A: 1.
Total Number of Observations Read from WORK.A: 4.
Total Number of Observations Read from WORK.B: 5.

while proc sql

data mismatch;
format reason $50.;
set join;
reason = '';
if colour ne colour_2 or speed ne speed_2;
if colour ne colour_2 then  reason = cat('colour',' ', reason); 
if speed ne speed_2 then reason =cat('speed',' ', reason); 
run;

would filter out only mismatching lines, highlighting mismatching variables and showing lines that exist only in one table:

Reeza · Posted 07-23-2019 11:00 AM

I thought it did show where lines do differ? Did you check the options available to control the output on the PROC COMPARE statement?

It's most definitely not perfect, but I think in your case, it does do what you asked.

Comparison of 2 tables

Re: Comparison of 2 tables

Re: Comparison of 2 tables

Re: Comparison of 2 tables

Re: Comparison of 2 tables

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away