SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Merging tables by joining a column of one table with several columns of the other table

Accepted Solution Solved
Reply
Super Contributor
Posts: 441
Accepted Solution

Merging tables by joining a column of one table with several columns of the other table

Hi,

suppose I have File1:

Name
A
B
C

and File2:

name1name2name3
Asssss
ddBddd
kkkkkC

I would like to merge by names, but in the second file the names that I need are dispersed in several columns.

For example, if I merge by joining column Name from File1 with column name1 from File2, I will get a merge for entry A, but not for B and C because there are no B and C in the column name1 in File2.

So how is it possible to make such that the merge is looking at the 3 name columns in File2 to find a corresponding value from the column Name in File1?

Thank you


Accepted Solutions
Solution
‎12-06-2014 03:33 PM
Super User
Posts: 19,877

Re: Merging tables by joining a column of one table with several columns of the other table

Depending on the size of tables you have two ways that I can think of:

1. Join on all 3 columns:

proc sql;

create table table1 as

select a.*, b.*

from file1 as a

join file2 as b

on a.name=b.name1

or a.name=b.name2

or a.name=b.name3;

quit;

2. Cross join and check for a match using the where clause. This is a brute force method and if your tables are large it's painful:

proc sql;

create table table2 as

select a.*, b.*

from file1 as a

cross join file2 as b

where whichc(name, name1, name2, name3)>0;

quit;

View solution in original post


All Replies
Solution
‎12-06-2014 03:33 PM
Super User
Posts: 19,877

Re: Merging tables by joining a column of one table with several columns of the other table

Depending on the size of tables you have two ways that I can think of:

1. Join on all 3 columns:

proc sql;

create table table1 as

select a.*, b.*

from file1 as a

join file2 as b

on a.name=b.name1

or a.name=b.name2

or a.name=b.name3;

quit;

2. Cross join and check for a match using the where clause. This is a brute force method and if your tables are large it's painful:

proc sql;

create table table2 as

select a.*, b.*

from file1 as a

cross join file2 as b

where whichc(name, name1, name2, name3)>0;

quit;

Super Contributor
Posts: 441

Re: Merging tables by joining a column of one table with several columns of the other table

Hi Reeza,

did both of your codes and they worked on the small example that I made in the beginning, thanks!

I was just wondering, is it possible to drop the name1 - name3 at the end of the proc sql, because I was trying and for some reason it didn't work so what I had to do was 

data table1 (drop = name1 name2 name3);

set table1;

run;

I guess that it doesn't change much but just from a point of elegance that I would like to know how to do it.

Thank you

Super User
Posts: 19,877

Re: Merging tables by joining a column of one table with several columns of the other table

Drop the b.* from the SQL code.

What's the overall goal of your merge then if your not bringing in fields?

Super User
Super User
Posts: 7,079

Re: Merging tables by joining a column of one table with several columns of the other table

You can use the DROP= dataset option in the original CREATE TABLE statement.

create table table1 (DROP=NAME1 NAME2 NAME3) as

select .....

Respected Advisor
Posts: 4,935

Re: Merging tables by joining a column of one table with several columns of the other table

What happens if File2 is:

name1     name2      name3

A              ss             sss

dd            B              ddd

A              kkk          C

PG

PG
Super Contributor
Posts: 441

Re: Merging tables by joining a column of one table with several columns of the other table

for now I assume that the values are unique, but this is a good point which I also thought of, and I guess that in such a case I would need additional join dimensions

Trusted Advisor
Posts: 1,231

Re: Merging tables by joining a column of one table with several columns of the other table

data file1;

input Name $;

datalines;

A

B

C

;


data File2;

input (name1 name2 name3)($);

datalines;

A ss sss

dd B ddd

kk kkk C

;

data file3;

set file2;

length name $1.;

array nm{*} name1-name3;

name=nm(_n_);

run;

data want;

merge file1 file3;

by name;

run;

Respected Advisor
Posts: 4,935

Re: Merging tables by joining a column of one table with several columns of the other table

Another possibility which may (or not) be optimized better than ORed conditions by proc SQL :

proc sql;

create table file3 as

select file1.*, file2.*, 1 as match

from file1 inner join file2 on file1.name=file2.name1

union all

select file1.*, file2.*, 2

from file1 inner join file2 on file1.name=file2.name2

union all

select file1.*, file2.*, 3

from file1 inner join file2 on file1.name=file2.name3;

select * from file3;

quit;

PG

PG
Trusted Advisor
Posts: 3,215

Re: Merging tables by joining a column of one table with several columns of the other table

You said it worked on the small tables being processed. That is great the logical solution works.

But how big are you real tables? This kind of construction will work well when all data as a Cartesian product fits (almost) in memory. The performance will dramatically reduce when it goes into IO processing.

---->-- ja karman --<-----
Super Contributor
Posts: 441

Re: Merging tables by joining a column of one table with several columns of the other table

Hi Jaap,

my tables are at most 300 000 lines each, so I guess this isn't big according to SAS standsrds?

Thank you

Super User
Posts: 19,877

Re: Merging tables by joining a column of one table with several columns of the other table

If you're not actually bringing in anything from the second table, I'd transpose it and then use a different SQL query.

data file1;

input Name $;

datalines;

A

B

C

;

data File2;

input id (name1 name2 name3)($);

datalines;

1 A ss sss

2 dd B ddd

3 kk kkk C

;

run;

proc transpose data=file2 out=file3 prefix=Var;

by id;

var name1-name3;

run;

proc sql;

  create table table1 as

  select a.*

  from file1 as a

  where name in (select var1 from file3);

quit;

Trusted Advisor
Posts: 3,215

Re: Merging tables by joining a column of one table with several columns of the other table

No not very 300k records with a 1K sizing is not very big anymore as being a 300Mb dataset.

Although having a dataset with 300Mb and doing a Cartesian product with a 1Mb dataset (1Kb records 1Kb sized) is going to be an internal 600Gb dataset.

Every originally record will e joined with all 1000 records and the size doubling for all records. 1kb+1Kb. 600Gb is still big these days for your hardware. That is the effect of that Cartesian product as brute force join.  Just do some math whether it is acceptable for you. 
 

---->-- ja karman --<-----
🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 610 views
  • 1 like
  • 6 in conversation