Proc sql; coalescing IDs and flagging source data, still leading to duplicates

rdum96 · Posted 11-10-2021 11:55 AM

Hi,

I'm merging three sources, trying to stack the unique IDs and flag which table(s) each ID was found in. However, I end up with duplicate IDs, with the flags split across separate rows.

Input:

Table A   Table B   Table C
ID        ID        ID
1         2         2
2         4         3
3         9         8
4

Expected output:

ID   in_A   in_B   in_C
1    1      0      0
2    1      1      1
3    1      0      1
4    1      1      0
8    0      0      1
9    0      1      0

Instead I am getting the following output:

ID   in_A   in_B   in_C
1    1      0      0
2    1      0      0
2    0      1      0
2    0      0      1
3    1      0      0
3    0      0      1
4    1      0      0
4    0      1      0
8    0      0      1
9    0      1      0

proc sql;
   create table all_ids as select
   distinct coalesce(a.ID,b.ID,c.ID) as all_ID, 
   (case when calculated all_ID=a.ID then 1 else 0 end) as in_A,
   (case when calculated all_ID=b.ID then 1 else 0 end) as in_B,
   (case when calculated all_ID=c.ID then 1 else 0 end) as in_C
   from table_A as a full join table_B as b on a.ID=b.ID full join table_C as c on a.ID=c.ID;
quit;

rdum96 · Posted 11-10-2021 12:03 PM

Edit: I need to do this through proc sql and not data steps!

ballardw · Posted 11-10-2021 12:53 PM

It is extremely likely that you are seeing the result of a previous run of code, because when I run your code the SQL shows this in the log:

91   proc sql;
92      create table all_ids as select
93      distinct coalesce(a.ID,b.ID,c.ID) as all_ID,
94      (case when calculated all_ID=a.ID then 1 else 0 end) as in_A,
95      (case when calculated all_ID=b.ID then 1 else 0 end) as in_B,
96      (case when calculated all_ID=c.ID then 1 else 0 end) as in_C
97      from table_A as a full join table B as b on a.ID=b.ID full join table as c on a.ID=c.ID;
                                            --
                                            73
                                            201
ERROR 73-322: Expecting an ON.

ERROR 201-322: The option is not recognized and will be ignored.

98   quit;

This means that if you have a data set named All_ids, it was created in a previous step, because the SQL you show has an error.

Why is there a requirement to use SQL? Does the "requirement" require a single SQL select? Other rules not stated?

Note that DISTINCT applies to ALL values in the select clause, not just the ID. So each distinct combination of ID and 1/0 flag values will appear as its own row.
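
To illustrate the point about DISTINCT: it only removes rows that are identical in every selected column, so two rows with the same ID but different flag values both survive. Below is a minimal sketch of one alternative that stays within PROC SQL. It is not the query from the question, just one possible way to force a single row per ID. It assumes the table and column names from the first post; the inline view alias stacked and the column src are invented for this example. The idea is to stack the three ID lists with a source label, then collapse with GROUP BY.

```sas
proc sql;
   create table all_ids as
   select ID,
          /* a comparison evaluates to 1 or 0 in SAS, so max() is 1
             if at least one stacked row for this ID came from that table */
          max(src = 'A') as in_A,
          max(src = 'B') as in_B,
          max(src = 'C') as in_C
   from (select ID, 'A' as src from table_A
         union all
         select ID, 'B' as src from table_B
         union all
         select ID, 'C' as src from table_C) as stacked
   group by ID
   order by ID;
quit;
```

Because the grouping is on ID alone, repeated IDs inside a source table cannot produce extra output rows. Note that the ID column keeps the name ID here rather than all_ID.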

rdum96 · Posted 11-10-2021 01:05 PM

Ugh, I didn't name my tables correctly in the first post. I've edited it now: table B should be table_B and table should be table_C. I'm not getting the same errors in my log.

ballardw · Posted 11-10-2021 01:53 PM

Still think there may be something on your end.

I run this, where I have taken the time to actually provide data sets:

data table_A;
  input ID;
datalines;
1
2
3
4
;

data table_B;
  input ID;
datalines;
2
4
9
;

data table_C;
  input ID;
datalines;
2
3
8
;

proc sql;
   create table all_ids as select
   distinct coalesce(a.ID,b.ID,c.ID) as all_ID, 
   (case when calculated all_ID=a.ID then 1 else 0 end) as in_A,
   (case when calculated all_ID=b.ID then 1 else 0 end) as in_B,
   (case when calculated all_ID=c.ID then 1 else 0 end) as in_C
   from table_A as a full join table_B as b on a.ID=b.ID full join table_c as c on a.ID=c.ID;
quit;

And get this result:

Obs    all_ID    in_A    in_B    in_C

 1        1        1       0       0
 2        2        1       1       1
 3        3        1       0       1
 4        4        1       1       0
 5        8        0       0       1
 6        9        0       1       0

If you aren't, then one (or possibly more) of your data sets is not as you present it.
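
One way real data could differ from the sample and produce extra rows with this query: the second join is on a.ID=c.ID, so an ID that exists in table_B and table_C but not in table_A has a missing a.ID and cannot match table_C. It then comes out as two rows, one flagged in_B and one flagged in_C. The sample data has no such ID. This is only a guess at the cause, and it would not by itself explain every duplicate row in the first post. Here is a hedged sketch of a join that handles that case by matching table_C against the coalesced key of the first two tables. The demo_ tables and ID 5 are invented for illustration, and the flags assume ID is never missing in the source tables.

```sas
/* Hypothetical test data: ID 5 is in B and C but not in A */
data demo_A;
  input ID;
datalines;
1
2
;

data demo_B;
  input ID;
datalines;
2
5
;

data demo_C;
  input ID;
datalines;
5
8
;

proc sql;
   create table demo_ids as
   select coalesce(a.ID, b.ID, c.ID) as all_ID,
          /* each test evaluates to 1 when that table contributed the row */
          (a.ID is not null) as in_A,
          (b.ID is not null) as in_B,
          (c.ID is not null) as in_C
   from demo_A as a
        full join demo_B as b on a.ID = b.ID
        /* match C against whichever of a.ID / b.ID is present */
        full join demo_C as c on coalesce(a.ID, b.ID) = c.ID
   order by all_ID;
quit;
```

With this test data, ID 5 should come out as a single row with in_B=1 and in_C=1, whereas joining on a.ID=c.ID would split it into two rows. Repeated IDs within a source table would still multiply rows, so it is worth checking each table for duplicates as well.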
