Data Merging

triley · Posted 04-25-2017 10:11 AM

I am trying to merge 3 datasets together so that I only have unique records at the end.

I have provided the code and an example of how I would like the final table to look like in the end. I have highlighted the part I am having issues with. I only want to keep the names that don't have a match within any of the table combinations. It is also important to keep the "left-most" name. So in the case below, I want to keep name 'ghi' since it is in the first table, and put the key for the 2nd table in the new table on that records since the names match, and I also want to have the key for the 3rd table since the keys matched between 'ghi' in the 2nd table and 'vwx' in the 3rd table. I also want to not keep the 'vwx' from the 3rd table since it found a match.

If I didn't explain that well or if anyone has any additional questions, please feel free to reach out. Thank you and I appreciated the help in advance.

Thanks,

Tom

data have1;

input name1 $ 3. key1 4.;

datalines ;

abc 111

def 112

ghi 113

;

run;

data have2;

input name2 $ 3. key2 4.;

datalines ;

jkl 114

ghi 115

pqr 116

;

run;

data have3;

input name3 $ 3. key3 4.;

datalines ;

stu 117

vwx 115

xyz 118

;

run;

name	key1	key2	key3
abc	111
def	112
ghi	113	115	115
jkl		114
pqr		116
stu			117
xyz			118

RW9 · Posted 04-25-2017 10:38 AM

Well, its a bit of a faff. Your data is not in an ideal format. The main point is to combine the data, process, then transpose:

data have1;
input name1 $ 3. key1 4.;
datalines;
abc 111
def 112
ghi 113
;
run;

data have2;
  input name2 $ 3. key2 4.;
datalines ;
jkl 114
ghi 115
pqr 116
;
run;

data have3;
input name3 $ 3. key3 4.;
datalines ;
stu 117
vwx 115
xyz 118
;
run;

data inter;
  set have1 (rename=(name1=name key1=key))
      have2 (rename=(name2=name key2=key))
      have3 (rename=(name3=name key3=key))
      indsname=tmp;
  dsname=tmp;
  dsname=tranwrd(scan(dsname,2,"."),"HAVE","KEY");
run;
proc sort data=inter;
  by key dsname;
run;
data inter;
  set inter;
  by key;
  if not(first.key) then delete;
run;
proc sort data=inter;
  by name key;
run;
proc transpose data=inter out=want;
  by name;
  var key;
  id dsname;
  idlabel dsname;
run;
data want;
  set want;
  if key1 ne . and key2 ne . then key3=key2;
run;

triley · Posted 04-25-2017 02:00 PM

Thank you for the quick response, however it didn't work exactly like I need because I don't think I explained it properly. There is one more layer to it where I need additional matching. I modified a few of the tables below and included the new result I'm looking for...

The additional layer I need is for the query to remove also 'pqr' since it matches 'have1' to 'have2' on 'key, and 'have2' to 'have3' on 'name'

data have1;

input name1 $ 3. key1 4.;

datalines ;

abc 111

def 112

ghi 113

;

run;

data have2;

input name2 $ 3. key2 4.;

datalines ;

jkl 114

ghi 115

pqr 111

;

run;

data have3;

input name3 $ 3. key3 4.;

datalines ;

stu 117

vwx 115

pqr 118

;

run;

name	key1	key2	key3
abc	111	111	118
def	112
ghi	113	115	115
jkl		114
stu			117

Ksharp · Posted 04-26-2017 10:57 AM

Your logic is a lot more complicated.

What if there are duplicated name or duplicated key ,what you gonna do ?

The following is for the first two tables.

data have1;
input name1 $ 3. key1 4.;
datalines ;
abc 111
def 112
ghi 113
;
run;
data have2;
input name2 $ 3. key2 4.;
datalines ;
jkl 114
ghi 115
pqr 111
;
run;
data have3;
input name3 $ 3. key3 4.;
datalines ;
stu 117
vwx 115
pqr 118
;
run;

data want1;
 if _n_=1 then do;
  if 0 then set have2;
  declare hash h21(dataset:'have2');
  declare hiter hi_h21('h21');
  h21.definekey('name2');
  h21.definedata('name2','key2');
  h21.definedone();
  
  declare hash h22(dataset:'have2');
  h22.definekey('key2');
  h22.definedata('key2');
  h22.definedone();

  
 end;
call missing(of _all_);
set have1 end=last;
if h21.check(key:name1)=0 then do;
 h21.find(key:name1);h21.remove(key:name1);
end;
else if h22.check(key:key1)=0 then do;
 h22.find(key:key1);h22.remove(key:key1);
end;

output;

if last then do;
call missing(of _all_);
do while(hi_h21.next()=0);
 if h22.check()=0 then do;name1=name2;output;end;
end;

end;
drop name2 ;
run;

triley · Posted 04-26-2017 11:10 AM

Thank you for the response.

I'm not getting the resuls table i need though. To answer your question, I know i'm going to have duplicate names and keys, and for those I want to condense down to 1 record. In the example I submited, I have 'ghi' in two tables. I always want to keep the 'left-most' name, so in this case I would keep name1, and merge the key for the name2 record with 'ghi' onto 'left-most' 'ghi'.

I hope this makes sense, if not please let me know and I can try and explain better.

Thank You.

Ksharp · Posted 04-27-2017 10:57 AM

I am not sure the following code could get you what you want.

Your logic indeed make me confused.

data have1;
input name1 $ 3. key1 4.;
datalines ;
abc 111
def 112
ghi 113
;
run;
data have2;
input name2 $ 3. key2 4.;
datalines ;
jkl 114
ghi 115
pqr 111
;
run;
data have3;
input name3 $ 3. key3 4.;
datalines ;
stu 117
vwx 115
pqr 118
;
run;
proc sql;

create index name1 on have1;
create index key1 on have1;

create index name2 on have2;
create index key2 on have2;

quit;
data temp1 not1;
 merge have1(in=ina) have2(rename=(name2=name1) in=inb);
 by name1;
 if not ina then output not1;
 else output temp1;
run;
data temp2 not2;
 merge temp1(in=ina) have2(rename=(key2=key1) in=inb);
 by key1;
 if not missing(name2) then key2=key1;
 if not ina then output not2;
 else output temp2;
run;

proc sql;
create table temp_want as
select name1,key2 from not1
 intersect
select name2,key2 from not2;
quit;
data temp2;
 set temp2 temp_want;
 if missing(name2) then name2=name1;
run;













data have1;
 set temp2;
run;
data have2;
 set have3;
run;

proc sql;

create index name2 on have1;
create index key2 on have1;

create index name3 on have2;
create index key3 on have2;


quit;

data temp1(index=(key2)) not1;
 merge have1(in=ina) have2(rename=(name3=name2) in=inb);
 by name2;
 if not ina then output not1;
 else output temp1;
run;
data temp2 not2;
 merge temp1(in=ina) have2(rename=(key3=key2) in=inb);
 by key2;
 if not missing(name3) then key3=key2;
 if not ina then output not2;
 else output temp2;
run;

proc sql;
create table temp_want as
select name2 as name1,key3 from not1
 intersect
select name3,key3 from not2;
quit;
data want;
 set temp2 temp_want;
 drop name2 name3;
run;

Data Merging

Re: Data Merging

Re: Data Merging

Re: Data Merging

Re: Data Merging

Re: Data Merging

Data Merging

Re: Data Merging

Re: Data Merging

Re: Data Merging

Re: Data Merging

Re: Data Merging

Click image to register for webinar

Classroom Training Available!