BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
i_Van
Fluorite | Level 6

Hello, I have the following issue: I have 2 tables that contain just the same data, but calculated by different algorithms (i.e. 2 persons are watching for cars: identity no, colour, speed) having one key (identity no from the example) and i have to find the lines which have any differences in observations.

The issue's that the tables are 300 colomns x 1m lines.

The straightforward algorithm is:

 

/*Preparing sample tables*/
data person1;
input identity, colour $50., speed;
datalines;
1 red 10
2 blue 15
3 green 30
1 red 15
;
run; data person2; input identity, colour $50., speed; datalines; 1 red 10 2 white 15 3 green 2 1 red 15
;
run; /*I suppose algorithm to show observervations with identies 2 and 3 are mismatching (2 - colour, 3 - speed).*/ proc sql; create table join as select t1.*, t2.colour as colour_2, t2.speed as speed_2 from (select * from person1) t1 full join (select * from person2) t2 on t1.identity = t2.identity; quit; /*Data step is used to add reason field*/ data mismatch; format reason $50.; set join; if colour ne colour_2 then do; reson ='colour'; output; end; if speed ne speed_2 then do; reson ='speed'; output; end; run;

The issue is - I have 300 columns, so "t2.column as column_2" section in proc sql and "if column ne column_2" in data step are quite hude - is there any more simple solution?


And i have to full join two 300x1m tables, is there any way-around solution?

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
PROC COMPARE - test it on subsets first to ensure you can get the output you need.

View solution in original post

3 REPLIES 3
Reeza
Super User
PROC COMPARE - test it on subsets first to ensure you can get the output you need.
i_Van
Fluorite | Level 6

Thank you greatly, but PROC COMPARE is not the very thing I need (or more likely I'm not good at reading help):

 

proc compare base=person1 compare=person2 out=compare; 
id identity; 
run;

would give me the table containing not only differing values but both equal values, and having 300 columns it would be hard to find the differing "XXXXX" for chars and differing value for numeric.


2019-07-23 13_03_43-SAS Enterprise Guide.png

More over the values that present only in one table would not be indicated in the table - only the overall mismatching amount in output like

Number of Observations in Common: 4.
Number of Observations in WORK.B but not in WORK.A: 1.
Total Number of Observations Read from WORK.A: 4.
Total Number of Observations Read from WORK.B: 5. 

while proc sql

data mismatch;
format reason $50.;
set join;
reason = '';
if colour ne colour_2 or speed ne speed_2;
if colour ne colour_2 then  reason = cat('colour',' ', reason); 
if speed ne speed_2 then reason =cat('speed',' ', reason); 
run;

would filter out only mismatching lines, highlighting mismatching variables and showing lines that exist only in one table:

 

2019-07-23 13_04_07-SAS Enterprise Guide.png

Reeza
Super User
I thought it did show where lines do differ? Did you check the options available to control the output on the PROC COMPARE statement?

It's most definitely not perfect, but I think in your case, it does do what you asked.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1079 views
  • 0 likes
  • 2 in conversation