Hi,
I'm merging three sources, trying to stack the unique IDs and flag which table(s) each ID was found in. However, the result has duplicate IDs, with the flags spread across separate rows.
Input:
Table A   Table B   Table C
ID        ID        ID
1         2         2
2         4         3
3         9         8
4
Expected output:
ID in_A in_B in_C
1 1 0 0
2 1 1 1
3 1 0 1
4 1 1 0
8 0 0 1
9 0 1 0
Instead I am getting the following output:
ID in_A in_B in_C
1 1 0 0
2 1 0 0
2 0 1 0
2 0 0 1
3 1 0 0
3 0 0 1
4 1 0 0
4 0 1 0
8 0 0 1
9 0 1 0
proc sql;
create table all_ids as select
distinct coalesce(a.ID,b.ID,c.ID) as all_ID,
(case when calculated all_ID=a.ID then 1 else 0 end) as in_A,
(case when calculated all_ID=b.ID then 1 else 0 end) as in_B,
(case when calculated all_ID=c.ID then 1 else 0 end) as in_C
from table_A as a full join table_B as b on a.ID=b.ID full join table_C as c on a.ID=c.ID;
quit;
You are very likely seeing the result of a previous run, because when I run your code as posted, the log shows this:
91   proc sql;
92   create table all_ids as select
93   distinct coalesce(a.ID,b.ID,c.ID) as all_ID,
94   (case when calculated all_ID=a.ID then 1 else 0 end) as in_A,
95   (case when calculated all_ID=b.ID then 1 else 0 end) as in_B,
96   (case when calculated all_ID=c.ID then 1 else 0 end) as in_C
97   from table_A as a full join table B as b on a.ID=b.ID full join table as c on a.ID=c.ID;
     --
     73
     201
ERROR 73-322: Expecting an ON.
ERROR 201-322: The option is not recognized and will be ignored.
98   quit;
That means that if you have a data set named All_ids, it was created in a previous step, because the SQL you show has an error and cannot have created it.
Why is there a requirement to use SQL? Does the "requirement" require a single SQL select? Other rules not stated?
Note that DISTINCT applies to ALL values in the select clause, so every distinct combination of ID and the 1/0 flag values will appear as its own row.
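Because DISTINCT only removes rows that are identical in every column, one way to force a single row per ID is to stack the tables and collapse the flags with MAX. This is only a sketch of an alternative, not the original poster's query, and it assumes the three tables are named table_A, table_B, and table_C, each with an ID variable of the same type:

```sas
proc sql;
  create table all_ids as
  select ID,
         max(in_A) as in_A,   /* 1 if the ID appears in table_A at least once */
         max(in_B) as in_B,   /* 1 if the ID appears in table_B at least once */
         max(in_C) as in_C    /* 1 if the ID appears in table_C at least once */
  from (
    /* Stack the three sources, tagging each row with its origin */
    select ID, 1 as in_A, 0 as in_B, 0 as in_C from table_A
    union all
    select ID, 0 as in_A, 1 as in_B, 0 as in_C from table_B
    union all
    select ID, 0 as in_A, 0 as in_B, 1 as in_C from table_C
  )
  group by ID
  order by ID;
quit;
```

Grouping by ID yields one row per ID even when an ID is repeated within a table, and it does not depend on the join conditions pairing up rows across all three tables.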
Ugh, I didn't name my tables correctly in the first post. I've edited now, table b is table_b and table is table_c. I'm not getting the same errors in my log.
Still think there may be something on your end.
I run this, where I have taken the time to actually provide data sets:
data table_A ;
  input ID;
datalines;
1
2
3
4
;

data table_B ;
  input ID ;
datalines;
2
4
9
;

data table_C;
  input ID;
datalines;
2
3
8
;

proc sql;
create table all_ids as select
distinct coalesce(a.ID,b.ID,c.ID) as all_ID,
(case when calculated all_ID=a.ID then 1 else 0 end) as in_A,
(case when calculated all_ID=b.ID then 1 else 0 end) as in_B,
(case when calculated all_ID=c.ID then 1 else 0 end) as in_C
from table_A as a full join table_B as b on a.ID=b.ID full join table_c as c on a.ID=c.ID;
quit;
And get this result:
Obs  all_ID  in_A  in_B  in_C
1    1       1     0     0
2    2       1     1     1
3    3       1     0     1
4    4       1     1     0
5    8       0     0     1
6    9       0     1     0
If you aren't, then one (or possibly more) of your data sets is not as you present it.
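One way to check whether the real data sets differ from what was posted is to compare how ID is defined in each table and whether any IDs repeat. The following is a rough diagnostic sketch; it assumes the tables live in the WORK library under the names used in this thread:

```sas
proc sql;
  /* Compare type, length, and format of ID across the three tables */
  select memname, name, type, length, format
  from dictionary.columns
  where libname = 'WORK'
    and memname in ('TABLE_A', 'TABLE_B', 'TABLE_C')
    and upcase(name) = 'ID';

  /* List IDs that occur more than once in table_A;
     repeat for table_B and table_C */
  select ID, count(*) as n_rows
  from table_A
  group by ID
  having count(*) > 1;
quit;
```

If ID is numeric in one table and character in another, or is character with different padding or case, values that look identical in a printout may not compare as equal in the join.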