Hi,
I would like to find the exact mismatch and missing values from the below Tables A and B by using Proc SQL procedure, Can anyone help me please?
Table A
Code | Test | Value |
1001 | ABCD | Good |
1001 | ABCD | Bad |
1001 | ABCD | Above average |
1001 | ABCD | Average |
Table B
Code | Test | Value1 |
1001 | ABCD | Bad |
1001 | ABCD | Good |
1001 | ABCD | |
1001 | ABCD | Below average |
Expected output:
Code | Test | Value | Value1 |
1001 | ABCD | Above average | Below average |
1001 | ABCD | Average |
Thanks in advance!
Thanks,
Abin
Here's the output for each of the four PROC SQL statements:
Table_A_Not_B |
||||
Obs |
Code |
Test |
Value |
seq_num |
1 |
1001 |
ABCD |
Above_average |
1 |
2 |
1001 |
ABCD |
Average |
2 |
|
||||
Table_B_Not_A |
||||
Obs |
Code |
Test |
Value1 |
seq_num |
1 |
1001 |
ABCD |
Below_average |
1 |
Want_Side_by_Side |
|||||
Obs |
Code |
Test |
Value |
Value1 |
seq_num |
1 |
1001 |
ABCD |
Above_average |
Below_average |
1 |
2 |
1001 |
ABCD |
Average |
|
2 |
Want_Stacked |
||||
Obs |
Code |
Test |
Value |
Row_Source |
1 |
1001 |
ABCD |
Above_average |
Table_A_Not_B |
2 |
1001 |
ABCD |
Average |
Table_A_Not_B |
3 |
1001 |
ABCD |
Below_average |
Table_B_Not_A |
How are we supposed to know that "Below Average" is the mismatch for "Above average" and not Average?
That this is NOT the desired result?
Code | Test | Value | Value1 |
1001 | ABCD | Above average | |
1001 | ABCD | Average | Below average |
You might want to provide a bit more data and possbily with more Code and Test values as currently we don't know if there are any and solutions if ALL of Code = 1001 and All of Test=ABCD may not work if there are others involved.
Thank you Ballardw!
Yes, its also possible!
But i don't want to see somehitng like
Code | Test | Value | Value1 |
1001 | ABCD | good | Bad |
1001 | ABCD | Bad | good |
I suggest to list the mismatches in the following way:
title "Unmatched values";
proc sql;
select "A" as from, a.Code, a.Test, a.Value
from a left join b
on a.code=b.code and a.test=b.test and a.value=b.value
where b.value is missing
union all
select "B" as from, b.Code, b.Test, b.Value
from a right join b
on a.code=b.code and a.test=b.test and a.value=b.value
where a.value is missing
order by code, test, from
;
quit;
This could be done in fewer steps, but I used multiple PROC SQL statements to show what is done at each step to arrive at the final results.
[Edited: you may want to change "EXCEPT ALL" to "EXCEPT". If your data contains multiple rows with identical values (e.g. three observations in Table_A that all have "1001 / ABCD / Good"), "EXCEPT" should give you better results.]
DATA Table_A;
LENGTH Code 8 Test $10 Value $20;
INPUT Code Test $ Value $;
DATALINES;
1001 ABCD Good
1001 ABCD Bad
1001 ABCD Above_average
1001 ABCD Average
;
RUN;
DATA Table_B;
LENGTH Code 8 Test $10 Value1 $20;
INPUT Code Test $ Value1 $;
DATALINES;
1001 ABCD Bad
1001 ABCD Good
1001 ABCD .
1001 ABCD Below_average
;
RUN;
PROC SQL;
CREATE TABLE Table_A_Not_B AS
SELECT Qry.*
, monotonic() AS seq_num
FROM (SELECT * FROM Table_A(WHERE=(Value IS NOT NULL))
/* Change "EXCEPT ALL" to "EXCEPT" */
EXCEPT
SELECT * FROM Table_B(WHERE=(Value1 IS NOT NULL))) AS Qry;
QUIT;
PROC SQL;
CREATE TABLE Table_B_Not_A AS
SELECT Qry.*
, monotonic() AS seq_num
FROM (SELECT * FROM Table_B(WHERE=(Value1 IS NOT NULL))
/* Change "EXCEPT ALL" to "EXCEPT" */
EXCEPT
SELECT * FROM Table_A(WHERE=(Value IS NOT NULL))) AS Qry;
QUIT;
/* this would give you the results exactly the way you want (side-by-side) */
PROC SQL;
CREATE TABLE Want_Side_by_Side AS
SELECT COALESCE(T1.Code, T2.Code) AS Code
, COALESCE(T1.Test, T2.Test) AS Test
, T1.Value
, T2.Value1
, COALESCE(T1.seq_num, T2.seq_num) AS seq_num
FROM Table_A_Not_B AS T1
FULL OUTER JOIN Table_B_Not_A AS T2
ON (T1.Code = T2.Code
AND T1.Test = T2.Test
AND T1.seq_num = T2.seq_num);
QUIT;
/* alternate version: this would give you the results vertically (stacked);
this approach would be more traditional in SQL because it avoids assigning
and match/mismatch preference (Ballardw had pointed this out above) */
PROC SQL;
CREATE TABLE Want_Stacked(drop=seq_num) AS
SELECT T1.*, 'Table_A_Not_B' AS Row_Source FROM Table_A_Not_B AS T1
UNION
SELECT T2.*, 'Table_B_Not_A' AS Row_Source FROM Table_B_Not_A AS T2;
QUIT;
Hope this helps.
-- hbi
Here's the output for each of the four PROC SQL statements:
Table_A_Not_B |
||||
Obs |
Code |
Test |
Value |
seq_num |
1 |
1001 |
ABCD |
Above_average |
1 |
2 |
1001 |
ABCD |
Average |
2 |
|
||||
Table_B_Not_A |
||||
Obs |
Code |
Test |
Value1 |
seq_num |
1 |
1001 |
ABCD |
Below_average |
1 |
Want_Side_by_Side |
|||||
Obs |
Code |
Test |
Value |
Value1 |
seq_num |
1 |
1001 |
ABCD |
Above_average |
Below_average |
1 |
2 |
1001 |
ABCD |
Average |
|
2 |
Want_Stacked |
||||
Obs |
Code |
Test |
Value |
Row_Source |
1 |
1001 |
ABCD |
Above_average |
Table_A_Not_B |
2 |
1001 |
ABCD |
Average |
Table_A_Not_B |
3 |
1001 |
ABCD |
Below_average |
Table_B_Not_A |
proc sql;
create table want as
select distinct a.code,a.test,a.value,b.value1
from table_a a left join
table_b b on
a.code = a.code and
b.test = b.test
where value not in (select b.value1 from table_b b) and
value1 not in (select a.value from table_a a);
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.