BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
GN0001
Barite | Level 11
data test1;
input  code $15.;
datalines;
a
b
c
c
;

data test2;
input code1 $20. code2 $20. code3.;
datalines;
a b c
b a
c
e    k
;
run;

Proc sql;
create table my table as
select .*
from test1 a inner join test2 b on coalesce (b.code1, b.code2, b.code3);
Quit;

Hello team,

I want to connect these two datasets:

First issue is: second dataset is not built properly.

Second issue is: I need to join first dataset to second dataset, if the code1 is not a match for code, then we need to look at code2 and so on.

Then I need to add a field to call it myvalues, if a match is found, myvalue should be flagged 1.

Thanks,

Blue & Blue

Blue Blue
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

How do you want to connect the two datasets?  What is the criteria?

If you want to test if CODE matches any of the CODE1 to CODE3 values then use WHICHC().

First let's make some workable example data steps.

data test1;
  a_row+1;
  input code $15.;
datalines;
a
b
c
c
;

data test2;
  b_row+1;
  input (code1-code3) (:$20.);
datalines;
a b c
. b a
c . .
e . k
;

Now let's make a join using the WHICHC() function. It will return which of the three codes were match (1, 2 or 3) and it will set and 0 if not found.  SAS will read 0 as FALSE and any other number as TRUE.

proc sql;
create table want as 
  select *
  from test1 a 
  inner join test2 b
    on whichc(a.code,b.code1,b.code2,b.code3)
;
quit;

Results:

Obs    a_row    code    b_row    code1    code2    code3

 1       1       a        1        a        b        c
 2       1       a        2                 b        a
 3       2       b        1        a        b        c
 4       2       b        2                 b        a
 5       3       c        1        a        b        c
 6       3       c        3        c
 7       4       c        1        a        b        c
 8       4       c        3        c

View solution in original post

7 REPLIES 7
Tom
Super User Tom
Super User

How do you want to connect the two datasets?  What is the criteria?

If you want to test if CODE matches any of the CODE1 to CODE3 values then use WHICHC().

First let's make some workable example data steps.

data test1;
  a_row+1;
  input code $15.;
datalines;
a
b
c
c
;

data test2;
  b_row+1;
  input (code1-code3) (:$20.);
datalines;
a b c
. b a
c . .
e . k
;

Now let's make a join using the WHICHC() function. It will return which of the three codes were match (1, 2 or 3) and it will set and 0 if not found.  SAS will read 0 as FALSE and any other number as TRUE.

proc sql;
create table want as 
  select *
  from test1 a 
  inner join test2 b
    on whichc(a.code,b.code1,b.code2,b.code3)
;
quit;

Results:

Obs    a_row    code    b_row    code1    code2    code3

 1       1       a        1        a        b        c
 2       1       a        2                 b        a
 3       2       b        1        a        b        c
 4       2       b        2                 b        a
 5       3       c        1        a        b        c
 6       3       c        3        c
 7       4       c        1        a        b        c
 8       4       c        3        c
GN0001
Barite | Level 11
Hello Tom,
I think that is. I need to try it first. I need to see how it works on a big data set. Will it increase the number of rows?
Regards,
blue & blue
Blue Blue
Tom
Super User Tom
Super User

It will potentially blow up the size of the dataset if you use that method.

Basically it is performing a cartesian product.  So if the main dataset has 10,000 observations and there 10 codes you are looking  to find in the other dataset then the result is going to have 10 * 10,000 = 100,000 observations.

GN0001
Barite | Level 11

Hello,

Yes, I agree with you, coalesce takes a long time to give me results.

my first dataset has 50 observations and the other dataset had 5 million rows. It took a long time and the result was so big that I didn't know what to do. 

Thanks for your help.

Regards,

blue & blue

Blue Blue
GN0001
Barite | Level 11
it worked for my case. Although, I need to test the result and compare it to see how it goes.
Blue Blue
GN0001
Barite | Level 11

Hello team,

can we convert the whichc to

a couple of joins with OR.

like on (a.code1=b.code2) or (a.code1=b.code3) or (a.code1=b.code4).

?

Please shed some light on.

Regards,

blue & blue

 

 

Blue Blue
Tom
Super User Tom
Super User

@GN0001 wrote:

Hello team,

can we convert the whichc to

a couple of joins with OR.

like on (a.code1=b.code2) or (a.code1=b.code3) or (a.code1=b.code4).

?

Please shed some light on.

Regards,

blue & blue

 

 


Yes. 

The WHICHC() function is much more useful in data step code where you could use a variable list.

For example:

whichc(code, of code1-code50)

But PROC SQL does not support variable list syntax.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1101 views
  • 3 likes
  • 2 in conversation