Data Merging

Reply
Contributor
Posts: 50

Data Merging

I am trying to merge 3 datasets together so that I only have unique records at the end.

 

I have provided the code and an example of how I would like the final table to look like in the end. I have highlighted the part I am having issues with. I only want to keep the names that don't have a match within any of the table combinations. It is also important to keep the "left-most" name. So in the case below, I want to keep name 'ghi' since it is in the first table, and put the key for the 2nd table in the new table on that records since the names match, and I also want to have the key for the 3rd table since the keys matched between 'ghi' in the 2nd table and 'vwx' in the 3rd table. I also want to not keep the 'vwx' from the 3rd table since it found a match.

 

If I didn't explain that well or if anyone has any additional questions, please feel free to reach out. Thank you and I appreciated the help in advance.

 

Thanks,

Tom

 

 

 

 

data have1;

input name1 $ 3. key1 4.;

datalines ;

abc 111

def 112

ghi 113

;

run;

 

data have2;

input name2 $ 3. key2 4.;

datalines ;

jkl 114

ghi 115

pqr 116

;

run;

 

data have3;

input name3 $ 3. key3 4.;

datalines ;

stu 117

vwx 115

xyz 118

;

run;

 

 

 

name key1 key2 key3
abc 111    
def 112    
ghi 113 115 115
jkl   114  
pqr   116  
stu     117
xyz     118
Esteemed Advisor
Esteemed Advisor
Posts: 7,223

Re: Data Merging

Well, its a bit of a faff.  Your data is not in an ideal format.  The main point is to combine the data, process, then transpose:

data have1;
input name1 $ 3. key1 4.;
datalines;
abc 111
def 112
ghi 113
;
run;

data have2;
  input name2 $ 3. key2 4.;
datalines ;
jkl 114
ghi 115
pqr 116
;
run;

data have3;
input name3 $ 3. key3 4.;
datalines ;
stu 117
vwx 115
xyz 118
;
run;

data inter;
  set have1 (rename=(name1=name key1=key))
      have2 (rename=(name2=name key2=key))
      have3 (rename=(name3=name key3=key))
      indsname=tmp;
  dsname=tmp;
  dsname=tranwrd(scan(dsname,2,"."),"HAVE","KEY");
run;
proc sort data=inter;
  by key dsname;
run;
data inter;
  set inter;
  by key;
  if not(first.key) then delete;
run;
proc sort data=inter;
  by name key;
run;
proc transpose data=inter out=want;
  by name;
  var key;
  id dsname;
  idlabel dsname;
run;
data want;
  set want;
  if key1 ne . and key2 ne . then key3=key2;
run;
Contributor
Posts: 50

Re: Data Merging

Thank you for the quick response, however it didn't work exactly like I need because I don't think I explained it properly. There is one more layer to it where I need additional matching. I modified a few of the tables below and included the new result I'm looking for...

 

The additional layer I need is for the query to remove also 'pqr' since it matches 'have1' to 'have2' on 'key, and 'have2' to 'have3' on 'name'

 

 

 

data have1;

input name1 $ 3. key1 4.;

datalines ;

abc 111

def 112

ghi 113

;

run;

 

data have2;

input name2 $ 3. key2 4.;

datalines ;

jkl 114

ghi 115

pqr 111

;

run;

 

data have3;

input name3 $ 3. key3 4.;

datalines ;

stu 117

vwx 115

pqr 118

;

run;

 

 

 

name key1 key2 key3
abc 111 111 118
def 112    
ghi 113 115 115
jkl   114  
stu     117
Grand Advisor
Posts: 9,583

Re: Data Merging

Your logic is a lot more complicated.

What if there are duplicated name or duplicated key ,what you gonna do ?

 

The following is for the first two tables.

 

data have1;
input name1 $ 3. key1 4.;
datalines ;
abc 111
def 112
ghi 113
;
run;
data have2;
input name2 $ 3. key2 4.;
datalines ;
jkl 114
ghi 115
pqr 111
;
run;
data have3;
input name3 $ 3. key3 4.;
datalines ;
stu 117
vwx 115
pqr 118
;
run;

data want1;
 if _n_=1 then do;
  if 0 then set have2;
  declare hash h21(dataset:'have2');
  declare hiter hi_h21('h21');
  h21.definekey('name2');
  h21.definedata('name2','key2');
  h21.definedone();
  
  declare hash h22(dataset:'have2');
  h22.definekey('key2');
  h22.definedata('key2');
  h22.definedone();

  
 end;
call missing(of _all_);
set have1 end=last;
if h21.check(key:name1)=0 then do;
 h21.find(key:name1);h21.remove(key:name1);
end;
else if h22.check(key:key1)=0 then do;
 h22.find(key:key1);h22.remove(key:key1);
end;

output;

if last then do;
call missing(of _all_);
do while(hi_h21.next()=0);
 if h22.check()=0 then do;name1=name2;output;end;
end;

end;
drop name2 ;
run;

Contributor
Posts: 50

Re: Data Merging

Thank you for the response.

 

I'm not getting the resuls table i need though. To answer your question, I know i'm going to have duplicate names and keys, and for those I want to condense down to 1 record. In the example I submited, I have 'ghi' in two tables. I always want to keep the 'left-most' name, so in this case I would keep name1, and merge the key for the name2 record with 'ghi' onto 'left-most' 'ghi'.

 

I hope this makes sense, if not please let me know and I can try and explain better.

 

Thank You.

Grand Advisor
Posts: 9,583

Re: Data Merging

I am not sure the following code could get you what you want.

Your logic indeed make me confused.

 

data have1;
input name1 $ 3. key1 4.;
datalines ;
abc 111
def 112
ghi 113
;
run;
data have2;
input name2 $ 3. key2 4.;
datalines ;
jkl 114
ghi 115
pqr 111
;
run;
data have3;
input name3 $ 3. key3 4.;
datalines ;
stu 117
vwx 115
pqr 118
;
run;
proc sql;

create index name1 on have1;
create index key1 on have1;

create index name2 on have2;
create index key2 on have2;

quit;
data temp1 not1;
 merge have1(in=ina) have2(rename=(name2=name1) in=inb);
 by name1;
 if not ina then output not1;
 else output temp1;
run;
data temp2 not2;
 merge temp1(in=ina) have2(rename=(key2=key1) in=inb);
 by key1;
 if not missing(name2) then key2=key1;
 if not ina then output not2;
 else output temp2;
run;

proc sql;
create table temp_want as
select name1,key2 from not1
 intersect
select name2,key2 from not2;
quit;
data temp2;
 set temp2 temp_want;
 if missing(name2) then name2=name1;
run;













data have1;
 set temp2;
run;
data have2;
 set have3;
run;

proc sql;

create index name2 on have1;
create index key2 on have1;

create index name3 on have2;
create index key3 on have2;


quit;

data temp1(index=(key2)) not1;
 merge have1(in=ina) have2(rename=(name3=name2) in=inb);
 by name2;
 if not ina then output not1;
 else output temp1;
run;
data temp2 not2;
 merge temp1(in=ina) have2(rename=(key3=key2) in=inb);
 by key2;
 if not missing(name3) then key3=key2;
 if not ina then output not2;
 else output temp2;
run;

proc sql;
create table temp_want as
select name2 as name1,key3 from not1
 intersect
select name3,key3 from not2;
quit;
data want;
 set temp2 temp_want;
 drop name2 name3;
run;
Ask a Question
Discussion stats
  • 5 replies
  • 143 views
  • 2 likes
  • 3 in conversation