Solved: Re: Using Coalesce for join in proc sql

GN0001 · Posted 01-27-2022 05:01 PM

data test1;
input  code $15.;
datalines;
a
b
c
c
;

data test2;
input code1 $20. code2 $20. code3.;
datalines;
a b c
b a
c
e    k
;
run;

Proc sql;
create table my table as
select .*
from test1 a inner join test2 b on coalesce (b.code1, b.code2, b.code3);
Quit;

Hello team,

I want to connect these two datasets:

First issue is: second dataset is not built properly.

Second issue is: I need to join first dataset to second dataset, if the code1 is not a match for code, then we need to look at code2 and so on.

Then I need to add a field to call it myvalues, if a match is found, myvalue should be flagged 1.

Thanks,

Blue & Blue

Blue Blue

Tom · Posted 01-27-2022 05:24 PM

How do you want to connect the two datasets? What is the criteria?

If you want to test if CODE matches any of the CODE1 to CODE3 values then use WHICHC().

First let's make some workable example data steps.

data test1;
  a_row+1;
  input code $15.;
datalines;
a
b
c
c
;

data test2;
  b_row+1;
  input (code1-code3) (:$20.);
datalines;
a b c
. b a
c . .
e . k
;

Now let's make a join using the WHICHC() function. It will return which of the three codes were match (1, 2 or 3) and it will set and 0 if not found. SAS will read 0 as FALSE and any other number as TRUE.

proc sql;
create table want as 
  select *
  from test1 a 
  inner join test2 b
    on whichc(a.code,b.code1,b.code2,b.code3)
;
quit;

Results:

Obs    a_row    code    b_row    code1    code2    code3

 1       1       a        1        a        b        c
 2       1       a        2                 b        a
 3       2       b        1        a        b        c
 4       2       b        2                 b        a
 5       3       c        1        a        b        c
 6       3       c        3        c
 7       4       c        1        a        b        c
 8       4       c        3        c

View solution in original post

Tom · Posted 01-27-2022 05:24 PM

How do you want to connect the two datasets? What is the criteria?

If you want to test if CODE matches any of the CODE1 to CODE3 values then use WHICHC().

First let's make some workable example data steps.

data test1;
  a_row+1;
  input code $15.;
datalines;
a
b
c
c
;

data test2;
  b_row+1;
  input (code1-code3) (:$20.);
datalines;
a b c
. b a
c . .
e . k
;

Now let's make a join using the WHICHC() function. It will return which of the three codes were match (1, 2 or 3) and it will set and 0 if not found. SAS will read 0 as FALSE and any other number as TRUE.

proc sql;
create table want as 
  select *
  from test1 a 
  inner join test2 b
    on whichc(a.code,b.code1,b.code2,b.code3)
;
quit;

Results:

Obs    a_row    code    b_row    code1    code2    code3

 1       1       a        1        a        b        c
 2       1       a        2                 b        a
 3       2       b        1        a        b        c
 4       2       b        2                 b        a
 5       3       c        1        a        b        c
 6       3       c        3        c
 7       4       c        1        a        b        c
 8       4       c        3        c

GN0001 · Posted 01-28-2022 04:24 PM

Hello Tom,
I think that is. I need to try it first. I need to see how it works on a big data set. Will it increase the number of rows?
Regards,
blue & blue

Blue Blue

Tom · Posted 01-28-2022 05:07 PM

It will potentially blow up the size of the dataset if you use that method.

Basically it is performing a cartesian product. So if the main dataset has 10,000 observations and there 10 codes you are looking to find in the other dataset then the result is going to have 10 * 10,000 = 100,000 observations.

GN0001 · Posted 02-03-2022 02:25 PM

Hello,

Yes, I agree with you, coalesce takes a long time to give me results.

my first dataset has 50 observations and the other dataset had 5 million rows. It took a long time and the result was so big that I didn't know what to do.

Thanks for your help.

Regards,

blue & blue

Blue Blue

GN0001 · Posted 02-03-2022 02:07 PM

it worked for my case. Although, I need to test the result and compare it to see how it goes.

Blue Blue

GN0001 · Posted 02-03-2022 02:31 PM

Hello team,

can we convert the whichc to

a couple of joins with OR.

like on (a.code1=b.code2) or (a.code1=b.code3) or (a.code1=b.code4).

?

Please shed some light on.

Regards,

blue & blue

Blue Blue

Tom · Posted 02-03-2022 02:59 PM

@GN0001 wrote:

Hello team,

can we convert the whichc to

a couple of joins with OR.

like on (a.code1=b.code2) or (a.code1=b.code3) or (a.code1=b.code4).

?

Please shed some light on.

Regards,

blue & blue

Yes.

The WHICHC() function is much more useful in data step code where you could use a variable list.

For example:

whichc(code, of code1-code50)

But PROC SQL does not support variable list syntax.

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away