Solved: Re: Merging tables by joining a column of one table with several colum...

ilikesas · Posted 12-06-2014 02:47 PM

Hi,

suppose I have File1:

Name
A
B
C

and File2:

name1	name2	name3
A	ss	sss
dd	B	ddd
kk	kkk	C

I would like to merge by names, but in the second file the names that I need are dispersed in several columns.

For example, if I merge by joining column Name from File1 with column name1 from File2, I will get a merge for entry A, but not for B and C because there are no B and C in the column name1 in File2.

So how is it possible to make such that the merge is looking at the 3 name columns in File2 to find a corresponding value from the column Name in File1?

Thank you

Reeza · Posted 12-06-2014 03:33 PM

Depending on the size of tables you have two ways that I can think of:

1. Join on all 3 columns:

proc sql;

create table table1 as

select a.*, b.*

from file1 as a

join file2 as b

on a.name=b.name1

or a.name=b.name2

or a.name=b.name3;

quit;

2. Cross join and check for a match using the where clause. This is a brute force method and if your tables are large it's painful:

proc sql;

create table table2 as

select a.*, b.*

from file1 as a

cross join file2 as b

where whichc(name, name1, name2, name3)>0;

quit;

View solution in original post

Reeza · Posted 12-06-2014 03:33 PM

Depending on the size of tables you have two ways that I can think of:

1. Join on all 3 columns:

proc sql;

create table table1 as

select a.*, b.*

from file1 as a

join file2 as b

on a.name=b.name1

or a.name=b.name2

or a.name=b.name3;

quit;

2. Cross join and check for a match using the where clause. This is a brute force method and if your tables are large it's painful:

proc sql;

create table table2 as

select a.*, b.*

from file1 as a

cross join file2 as b

where whichc(name, name1, name2, name3)>0;

quit;

ilikesas · Posted 12-06-2014 06:39 PM

Hi Reeza,

did both of your codes and they worked on the small example that I made in the beginning, thanks!

I was just wondering, is it possible to drop the name1 - name3 at the end of the proc sql, because I was trying and for some reason it didn't work so what I had to do was

data table1 (drop = name1 name2 name3);

set table1;

run;

I guess that it doesn't change much but just from a point of elegance that I would like to know how to do it.

Thank you

Reeza · Posted 12-07-2014 12:48 AM

Drop the b.* from the SQL code.

What's the overall goal of your merge then if your not bringing in fields?

Tom · Posted 12-07-2014 01:12 AM

You can use the DROP= dataset option in the original CREATE TABLE statement.

create table table1 (DROP=NAME1 NAME2 NAME3) as

select .....

PGStats · Posted 12-06-2014 05:09 PM

What happens if File2 is:

name1 name2 name3

A ss sss

dd B ddd

A kkk C

PG

ilikesas · Posted 12-06-2014 05:49 PM

for now I assume that the values are unique, but this is a good point which I also thought of, and I guess that in such a case I would need additional join dimensions

stat_sas · Posted 12-06-2014 06:44 PM

data file1;

input Name $;

datalines;

A

B

C

;

data File2;

input (name1 name2 name3)($);

datalines;

A ss sss

dd B ddd

kk kkk C

;

data file3;

set file2;

length name $1.;

array nm{*} name1-name3;

name=nm(_n_);

run;

data want;

merge file1 file3;

by name;

run;

PGStats · Posted 12-06-2014 09:23 PM

Another possibility which may (or not) be optimized better than ORed conditions by proc SQL :

proc sql;

create table file3 as

select file1.*, file2.*, 1 as match

from file1 inner join file2 on file1.name=file2.name1

union all

select file1.*, file2.*, 2

from file1 inner join file2 on file1.name=file2.name2

union all

select file1.*, file2.*, 3

from file1 inner join file2 on file1.name=file2.name3;

select * from file3;

quit;

PG

jakarman · Posted 12-07-2014 05:31 AM

You said it worked on the small tables being processed. That is great the logical solution works.

But how big are you real tables? This kind of construction will work well when all data as a Cartesian product fits (almost) in memory. The performance will dramatically reduce when it goes into IO processing.

---->-- ja karman --<-----

ilikesas · Posted 12-07-2014 10:54 AM

Hi Jaap,

my tables are at most 300 000 lines each, so I guess this isn't big according to SAS standsrds?

Thank you

Reeza · Posted 12-07-2014 04:47 PM

If you're not actually bringing in anything from the second table, I'd transpose it and then use a different SQL query.

data file1;

input Name $;

datalines;

A

B

C

;

data File2;

input id (name1 name2 name3)($);

datalines;

1 A ss sss

2 dd B ddd

3 kk kkk C

;

run;

proc transpose data=file2 out=file3 prefix=Var;

by id;

var name1-name3;

run;

proc sql;

create table table1 as

select a.*

from file1 as a

where name in (select var1 from file3);

quit;

jakarman · Posted 12-07-2014 03:35 PM

No not very 300k records with a 1K sizing is not very big anymore as being a 300Mb dataset.

Although having a dataset with 300Mb and doing a Cartesian product with a 1Mb dataset (1Kb records 1Kb sized) is going to be an internal 600Gb dataset.

Every originally record will e joined with all 1000 records and the size doubling for all records. 1kb+1Kb. 600Gb is still big these days for your hardware. That is the effect of that Cartesian product as brute force join. Just do some math whether it is acceptable for you.

---->-- ja karman --<-----

Merging tables by joining a column of one table with several columns of the other table

Re: Merging tables by joining a column of one table with several columns of the other table

Re: Merging tables by joining a column of one table with several columns of the other table

Re: Merging tables by joining a column of one table with several columns of the other table

Re: Merging tables by joining a column of one table with several columns of the other table

Re: Merging tables by joining a column of one table with several columns of the other table

Re: Merging tables by joining a column of one table with several columns of the other table

Re: Merging tables by joining a column of one table with several columns of the other table

Re: Merging tables by joining a column of one table with several columns of the other table

Re: Merging tables by joining a column of one table with several columns of the other table

Re: Merging tables by joining a column of one table with several columns of the other table

Re: Merging tables by joining a column of one table with several columns of the other table

Re: Merging tables by joining a column of one table with several columns of the other table

Re: Merging tables by joining a column of one table with several columns of the other table

Registration is open