Hi,
I would like to find the exact mismatch and missing values from the below Tables A and B by using Proc SQL procedure, Can anyone help me please?
Table A
Code | Test | Value |
1001 | ABCD | Good |
1001 | ABCD | Bad |
1001 | ABCD | Above average |
1001 | ABCD | Average |
Table B
Code | Test | Value1 |
1001 | ABCD | Bad |
1001 | ABCD | Good |
1001 | ABCD | |
1001 | ABCD | Below average |
Expected output:
Code | Test | Value | Value1 |
1001 | ABCD | Above average | Below average |
1001 | ABCD | Average |
Thanks in advance!
Thanks,
Abin
Here's the output for each of the four PROC SQL statements:
Table_A_Not_B |
||||
Obs |
Code |
Test |
Value |
seq_num |
1 |
1001 |
ABCD |
Above_average |
1 |
2 |
1001 |
ABCD |
Average |
2 |
|
||||
Table_B_Not_A |
||||
Obs |
Code |
Test |
Value1 |
seq_num |
1 |
1001 |
ABCD |
Below_average |
1 |
Want_Side_by_Side |
|||||
Obs |
Code |
Test |
Value |
Value1 |
seq_num |
1 |
1001 |
ABCD |
Above_average |
Below_average |
1 |
2 |
1001 |
ABCD |
Average |
|
2 |
Want_Stacked |
||||
Obs |
Code |
Test |
Value |
Row_Source |
1 |
1001 |
ABCD |
Above_average |
Table_A_Not_B |
2 |
1001 |
ABCD |
Average |
Table_A_Not_B |
3 |
1001 |
ABCD |
Below_average |
Table_B_Not_A |
How are we supposed to know that "Below Average" is the mismatch for "Above average" and not Average?
That this is NOT the desired result?
Code | Test | Value | Value1 |
1001 | ABCD | Above average | |
1001 | ABCD | Average | Below average |
You might want to provide a bit more data and possbily with more Code and Test values as currently we don't know if there are any and solutions if ALL of Code = 1001 and All of Test=ABCD may not work if there are others involved.
Thank you Ballardw!
Yes, its also possible!
But i don't want to see somehitng like
Code | Test | Value | Value1 |
1001 | ABCD | good | Bad |
1001 | ABCD | Bad | good |
I suggest to list the mismatches in the following way:
title "Unmatched values";
proc sql;
select "A" as from, a.Code, a.Test, a.Value
from a left join b
on a.code=b.code and a.test=b.test and a.value=b.value
where b.value is missing
union all
select "B" as from, b.Code, b.Test, b.Value
from a right join b
on a.code=b.code and a.test=b.test and a.value=b.value
where a.value is missing
order by code, test, from
;
quit;
This could be done in fewer steps, but I used multiple PROC SQL statements to show what is done at each step to arrive at the final results.
[Edited: you may want to change "EXCEPT ALL" to "EXCEPT". If your data contains multiple rows with identical values (e.g. three observations in Table_A that all have "1001 / ABCD / Good"), "EXCEPT" should give you better results.]
DATA Table_A;
LENGTH Code 8 Test $10 Value $20;
INPUT Code Test $ Value $;
DATALINES;
1001 ABCD Good
1001 ABCD Bad
1001 ABCD Above_average
1001 ABCD Average
;
RUN;
DATA Table_B;
LENGTH Code 8 Test $10 Value1 $20;
INPUT Code Test $ Value1 $;
DATALINES;
1001 ABCD Bad
1001 ABCD Good
1001 ABCD .
1001 ABCD Below_average
;
RUN;
PROC SQL;
CREATE TABLE Table_A_Not_B AS
SELECT Qry.*
, monotonic() AS seq_num
FROM (SELECT * FROM Table_A(WHERE=(Value IS NOT NULL))
/* Change "EXCEPT ALL" to "EXCEPT" */
EXCEPT
SELECT * FROM Table_B(WHERE=(Value1 IS NOT NULL))) AS Qry;
QUIT;
PROC SQL;
CREATE TABLE Table_B_Not_A AS
SELECT Qry.*
, monotonic() AS seq_num
FROM (SELECT * FROM Table_B(WHERE=(Value1 IS NOT NULL))
/* Change "EXCEPT ALL" to "EXCEPT" */
EXCEPT
SELECT * FROM Table_A(WHERE=(Value IS NOT NULL))) AS Qry;
QUIT;
/* this would give you the results exactly the way you want (side-by-side) */
PROC SQL;
CREATE TABLE Want_Side_by_Side AS
SELECT COALESCE(T1.Code, T2.Code) AS Code
, COALESCE(T1.Test, T2.Test) AS Test
, T1.Value
, T2.Value1
, COALESCE(T1.seq_num, T2.seq_num) AS seq_num
FROM Table_A_Not_B AS T1
FULL OUTER JOIN Table_B_Not_A AS T2
ON (T1.Code = T2.Code
AND T1.Test = T2.Test
AND T1.seq_num = T2.seq_num);
QUIT;
/* alternate version: this would give you the results vertically (stacked);
this approach would be more traditional in SQL because it avoids assigning
and match/mismatch preference (Ballardw had pointed this out above) */
PROC SQL;
CREATE TABLE Want_Stacked(drop=seq_num) AS
SELECT T1.*, 'Table_A_Not_B' AS Row_Source FROM Table_A_Not_B AS T1
UNION
SELECT T2.*, 'Table_B_Not_A' AS Row_Source FROM Table_B_Not_A AS T2;
QUIT;
Hope this helps.
-- hbi
Here's the output for each of the four PROC SQL statements:
Table_A_Not_B |
||||
Obs |
Code |
Test |
Value |
seq_num |
1 |
1001 |
ABCD |
Above_average |
1 |
2 |
1001 |
ABCD |
Average |
2 |
|
||||
Table_B_Not_A |
||||
Obs |
Code |
Test |
Value1 |
seq_num |
1 |
1001 |
ABCD |
Below_average |
1 |
Want_Side_by_Side |
|||||
Obs |
Code |
Test |
Value |
Value1 |
seq_num |
1 |
1001 |
ABCD |
Above_average |
Below_average |
1 |
2 |
1001 |
ABCD |
Average |
|
2 |
Want_Stacked |
||||
Obs |
Code |
Test |
Value |
Row_Source |
1 |
1001 |
ABCD |
Above_average |
Table_A_Not_B |
2 |
1001 |
ABCD |
Average |
Table_A_Not_B |
3 |
1001 |
ABCD |
Below_average |
Table_B_Not_A |
proc sql;
create table want as
select distinct a.code,a.test,a.value,b.value1
from table_a a left join
table_b b on
a.code = a.code and
b.test = b.test
where value not in (select b.value1 from table_b b) and
value1 not in (select a.value from table_a a);
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.