Solved: Re: Conditional Merge (or conditional join [SQL]) on two tables

nicnad · Posted 07-17-2012 12:17 PM

Hi,

I am new to SAS and SQL query so I am asking for your help to solve my problem.

I have two tables. Each tables have only alphanumerical data. The first table is missing data in some column. The other table is not missing any data. Lets says the tables are named Table1 and Table2.

They look like the following :

Table1

Name	Info1	Info2	Info3
John	a
Paul		bb	cc
George	aaa		ccc
Adam	aaaa	bbbb
Brian			cccc

Table2

Info1	Info2	Info3	Value1	Value2	Value3
a	b	c	ddd	e	f
aa	bb	cc	jj	kk	ll
aaa	bbb	ccc	u	i	z
aaaa	bbbb	cccc	nn	bbaq	vvvv

What I would like to do is do is to merge both table based on the info1 or info2 or info3 and return the name with value1, value2, value 3

I really don't know how to write this in SAS or SQL, but I am sure this could be done.

I see the solution as being nested IFs (or CASE) statement where the value1, value2 and value3 are returned if there is a matched based on the info from Table1.

IF info1 is not null; do the merge with info1 ELSE do the merge with info2

IF info2 is null ; do the merge info 3

The end result would be :

Name	Value1	Value2	Value3
John	ddd	e	f
Paul	jj	kk	ll
George	u	i	z
Adam	nn	bbaq	ww
Brian	nn	bbaq	ww

P.S. The data used is only there for example purpose, it the real data it is not sorted or have specific length in any way.

Could you please help me write the proper SAS procedure to create this table?

Thank you very much for your help.

Haikuo · Posted 07-17-2012 01:50 PM

Replace the input...cards part with a simple 'set' statement:

data h1;

set yourtable1;

length index $10.;

array ind info1-info3;

do over ind;

if not missing(ind) then do;

index=cats(vname(ind),'_',ind);

leave;

end;

;

data h2;

set yourtable2;

length index $50.;

array ind info1-info3;

do over ind;

index=cats(vname(ind),'_',ind);

output;

end;

;

Haikuo

View solution in original post

Haikuo · Posted 07-17-2012 01:12 PM

I know there are definitely slicker solutions out there. But if you need it quick, here is one approach hopefully to get you started. The main idea is to set up index for both data set, so they can merge accordingly.

data h1;

input (Name Info1 Info2 Info3) (:$10.);

length index $10.;

array ind info1-info3;

do over ind;

if not missing(ind) then do;

index=cats(vname(ind),'_',ind);

leave;

end;

cards;

John a . .

Paul . bb cc

George aaa . ccc

Adam aaaa bbbb .

Brian . . cccc

;

data h2;

input (Info1 Info2 Info3 Value1 Value2 Value3) (:$8.);

length index $50.;

array ind info1-info3;

do over ind;

index=cats(vname(ind),'_',ind);

output;

end;

cards;

a b c ddd e f

aa bb cc jj kk ll

aaa bbb ccc u i z

aaaa bbbb cccc nn bbaq vvvv

;

proc sql;

select name, value1, value2,value3 from h1 a, h2 b

where a.index=b.index;

quit;

Haikuo

nicnad · Posted 07-17-2012 01:42 PM

Thank you very much for the quick reply.

Like I said before, I am a complete beginner with SAS.

The thing is my two tables are already created and they both contain 2 Millons + rows and 20+ columns each . They are both SAS tables (.sas7bdat)

Is there a way to read the existing tables instead of using cards to declare the variables value (if I understand the code correctly)?

What I would like to do is the code to read the existing variables names (column headers) instead of having to input it myself, then having the program determine the length of the variable and then read each observations (rows) and assign it to an index.

Is this something that can be done in SAS language?

Would the solution be easier in another language? I am using SAS Enterprise Guide 4.1 so any language supported by it would be fine.

Again, thank you for your help and time.

Haikuo · Posted 07-17-2012 01:50 PM

Replace the input...cards part with a simple 'set' statement:

data h1;

set yourtable1;

length index $10.;

array ind info1-info3;

do over ind;

if not missing(ind) then do;

index=cats(vname(ind),'_',ind);

leave;

end;

;

data h2;

set yourtable2;

length index $50.;

array ind info1-info3;

do over ind;

index=cats(vname(ind),'_',ind);

output;

end;

;

Haikuo

nicnad · Posted 07-17-2012 02:28 PM

Exactly what I was looking for.

Thank you very much!!!!

ballardw · Posted 07-17-2012 01:42 PM

Are any values of Info1, 2 or 3 duplicated in either table?

Ksharp · Posted 07-17-2012 10:33 PM

If you care about speed, Hash table is a good choice.

data h1;
input (Name    Info1    Info2    Info3) (:$10.);
cards;
John    a    . .   
Paul    .    bb    cc
George    aaa     .    ccc
Adam    aaaa     bbbb    .
Brian    . .        cccc
;
run;

 

data h2;
input (Info1    Info2    Info3    Value1    Value2    Value3) (:$8.);
cards;
a    b    c    ddd    e    f
aa    bb    cc    jj    kk    ll
aaa    bbb    ccc    u    i    z
aaaa    bbbb    cccc    nn    bbaq    vvvv
;
run;

data want(keep=name value:);
length info1 - info3 $ 40;
 if _n_ eq 1 then do;
    if 0 then set h2;
    declare hash ha(hashexp:20);
     ha.definekey('key');
     ha.definedata('value1','value2','value3');
     ha.definedone();

     do until(last);
      set h2 end=last;
      if not missing(info1) then do;key=info1;ha.add(); end;
      if not missing(info2) then do;key=info2;ha.add(); end;
      if not missing(info3) then do;key=info3;ha.add(); end;
     end;
end;
call missing(of _all_);
set h1;
      if not missing(info1) then do;key=info1; rc=ha.find(); end;
      if not missing(info2) then do;key=info2; rc=ha.find(); end;
      if not missing(info3) then do;key=info3; rc=ha.find(); end;
run;

Ksharp

Haikuo · Posted 07-18-2012 11:14 AM

Thanks, Ksharp for bringing in the Hash() idea. Actually index can be inserted into hash() implementation to make it less hard coding:

data h1;

input (Name Info1 Info2 Info3) (:$8.);

cards;

John a . .

Paul . bb cc

George aaa . ccc

Adam aaaa bbbb .

Brian . . cccc

;

data h2;

input (Info1 Info2 Info3 Value1 Value2 Value3) (:$8.);

cards;

a b c ddd e f

aa bb cc jj kk ll

aaa bbb ccc u i z

aaaa bbbb cccc nn bbaq vvvv

;

data want;

dcl hash h(ordered:'a');

h.definekey('var','var_name');

h.definedata('var','var_name','value1','value2','value3');

h.definedone();

do until (last1);

set h2 end=last1;

array ind info1-info3;

do over ind;

var=ind;

var_name=vname(ind);

h.add();

end;

do until (last2);

set h1 end=last2;

array ind1 info1-info3;

do over ind1;

if not missing(ind1) then do;

rc=h.find(key:ind1,key:vname(ind1)); output;

leave;

end;

keep name value1-value3;

stop;

run;

proc print;run;

Haikuo

nicnad · Posted 07-19-2012 12:17 PM

Thank you both for your suggestions!

I would like to add :

- Great forum

-Great Interface

-Great User base

-Fast Answer

Could not have asked for more.

Thank you all for your help and time!

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away