data test1; input code $15.; datalines; a b c c ; data test2; input code1 $20. code2 $20. code3.; datalines; a b c b a c e k ; run; Proc sql; create table my table as select .* from test1 a inner join test2 b on coalesce (b.code1, b.code2, b.code3); Quit;
Hello team,
I want to connect these two datasets:
First issue is: second dataset is not built properly.
Second issue is: I need to join first dataset to second dataset, if the code1 is not a match for code, then we need to look at code2 and so on.
Then I need to add a field to call it myvalues, if a match is found, myvalue should be flagged 1.
Thanks,
Blue & Blue
How do you want to connect the two datasets? What is the criteria?
If you want to test if CODE matches any of the CODE1 to CODE3 values then use WHICHC().
First let's make some workable example data steps.
data test1;
a_row+1;
input code $15.;
datalines;
a
b
c
c
;
data test2;
b_row+1;
input (code1-code3) (:$20.);
datalines;
a b c
. b a
c . .
e . k
;
Now let's make a join using the WHICHC() function. It will return which of the three codes were match (1, 2 or 3) and it will set and 0 if not found. SAS will read 0 as FALSE and any other number as TRUE.
proc sql;
create table want as
select *
from test1 a
inner join test2 b
on whichc(a.code,b.code1,b.code2,b.code3)
;
quit;
Results:
Obs a_row code b_row code1 code2 code3 1 1 a 1 a b c 2 1 a 2 b a 3 2 b 1 a b c 4 2 b 2 b a 5 3 c 1 a b c 6 3 c 3 c 7 4 c 1 a b c 8 4 c 3 c
How do you want to connect the two datasets? What is the criteria?
If you want to test if CODE matches any of the CODE1 to CODE3 values then use WHICHC().
First let's make some workable example data steps.
data test1;
a_row+1;
input code $15.;
datalines;
a
b
c
c
;
data test2;
b_row+1;
input (code1-code3) (:$20.);
datalines;
a b c
. b a
c . .
e . k
;
Now let's make a join using the WHICHC() function. It will return which of the three codes were match (1, 2 or 3) and it will set and 0 if not found. SAS will read 0 as FALSE and any other number as TRUE.
proc sql;
create table want as
select *
from test1 a
inner join test2 b
on whichc(a.code,b.code1,b.code2,b.code3)
;
quit;
Results:
Obs a_row code b_row code1 code2 code3 1 1 a 1 a b c 2 1 a 2 b a 3 2 b 1 a b c 4 2 b 2 b a 5 3 c 1 a b c 6 3 c 3 c 7 4 c 1 a b c 8 4 c 3 c
It will potentially blow up the size of the dataset if you use that method.
Basically it is performing a cartesian product. So if the main dataset has 10,000 observations and there 10 codes you are looking to find in the other dataset then the result is going to have 10 * 10,000 = 100,000 observations.
Hello,
Yes, I agree with you, coalesce takes a long time to give me results.
my first dataset has 50 observations and the other dataset had 5 million rows. It took a long time and the result was so big that I didn't know what to do.
Thanks for your help.
Regards,
blue & blue
Hello team,
can we convert the whichc to
a couple of joins with OR.
like on (a.code1=b.code2) or (a.code1=b.code3) or (a.code1=b.code4).
?
Please shed some light on.
Regards,
blue & blue
@GN0001 wrote:
Hello team,
can we convert the whichc to
a couple of joins with OR.
like on (a.code1=b.code2) or (a.code1=b.code3) or (a.code1=b.code4).
?
Please shed some light on.
Regards,
blue & blue
Yes.
The WHICHC() function is much more useful in data step code where you could use a variable list.
For example:
whichc(code, of code1-code50)
But PROC SQL does not support variable list syntax.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.