Hello, I have the following issue: I have 2 tables that contain just the same data, but calculated by different algorithms (i.e. 2 persons are watching for cars: identity no, colour, speed) having one key (identity no from the example) and i have to find the lines which have any differences in observations.
The issue's that the tables are 300 colomns x 1m lines.
The straightforward algorithm is:
/*Preparing sample tables*/
data person1;
input identity, colour $50., speed;
datalines;
1 red 10
2 blue 15
3 green 30
1 red 15
;
run;
data person2;
input identity, colour $50., speed;
datalines;
1 red 10
2 white 15
3 green 2
1 red 15
;
run;
/*I suppose algorithm to show observervations with identies 2 and 3 are mismatching (2 - colour, 3 - speed).*/
proc sql;
create table join as
select t1.*, t2.colour as colour_2, t2.speed as speed_2 from
(select * from person1) t1
full join
(select * from person2) t2
on t1.identity = t2.identity;
quit;
/*Data step is used to add reason field*/
data mismatch;
format reason $50.;
set join;
if colour ne colour_2 then do; reson ='colour'; output; end;
if speed ne speed_2 then do; reson ='speed'; output; end;
run;
The issue is - I have 300 columns, so "t2.column as column_2" section in proc sql and "if column ne column_2" in data step are quite hude - is there any more simple solution?
And i have to full join two 300x1m tables, is there any way-around solution?
Thank you greatly, but PROC COMPARE is not the very thing I need (or more likely I'm not good at reading help):
proc compare base=person1 compare=person2 out=compare;
id identity;
run;
would give me the table containing not only differing values but both equal values, and having 300 columns it would be hard to find the differing "XXXXX" for chars and differing value for numeric.
More over the values that present only in one table would not be indicated in the table - only the overall mismatching amount in output like
Number of Observations in Common: 4.
Number of Observations in WORK.B but not in WORK.A: 1.
Total Number of Observations Read from WORK.A: 4.
Total Number of Observations Read from WORK.B: 5.
while proc sql
data mismatch;
format reason $50.;
set join;
reason = '';
if colour ne colour_2 or speed ne speed_2;
if colour ne colour_2 then reason = cat('colour',' ', reason);
if speed ne speed_2 then reason =cat('speed',' ', reason);
run;
would filter out only mismatching lines, highlighting mismatching variables and showing lines that exist only in one table:
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.