I am trying to merge 3 datasets together so that I only have unique records at the end.
I have provided the code and an example of how I would like the final table to look like in the end. I have highlighted the part I am having issues with. I only want to keep the names that don't have a match within any of the table combinations. It is also important to keep the "left-most" name. So in the case below, I want to keep name 'ghi' since it is in the first table, and put the key for the 2nd table in the new table on that records since the names match, and I also want to have the key for the 3rd table since the keys matched between 'ghi' in the 2nd table and 'vwx' in the 3rd table. I also want to not keep the 'vwx' from the 3rd table since it found a match.
If I didn't explain that well or if anyone has any additional questions, please feel free to reach out. Thank you and I appreciated the help in advance.
Thanks,
Tom
data have1;
input name1 $ 3. key1 4.;
datalines ;
abc 111
def 112
ghi 113
;
run;
data have2;
input name2 $ 3. key2 4.;
datalines ;
jkl 114
ghi 115
pqr 116
;
run;
data have3;
input name3 $ 3. key3 4.;
datalines ;
stu 117
vwx 115
xyz 118
;
run;
name | key1 | key2 | key3 |
abc | 111 | ||
def | 112 | ||
ghi | 113 | 115 | 115 |
jkl | 114 | ||
pqr | 116 | ||
stu | 117 | ||
xyz | 118 |
Well, its a bit of a faff. Your data is not in an ideal format. The main point is to combine the data, process, then transpose:
data have1; input name1 $ 3. key1 4.; datalines; abc 111 def 112 ghi 113 ; run; data have2; input name2 $ 3. key2 4.; datalines ; jkl 114 ghi 115 pqr 116 ; run; data have3; input name3 $ 3. key3 4.; datalines ; stu 117 vwx 115 xyz 118 ; run; data inter; set have1 (rename=(name1=name key1=key)) have2 (rename=(name2=name key2=key)) have3 (rename=(name3=name key3=key)) indsname=tmp; dsname=tmp; dsname=tranwrd(scan(dsname,2,"."),"HAVE","KEY"); run; proc sort data=inter; by key dsname; run; data inter; set inter; by key; if not(first.key) then delete; run; proc sort data=inter; by name key; run; proc transpose data=inter out=want; by name; var key; id dsname; idlabel dsname; run; data want; set want; if key1 ne . and key2 ne . then key3=key2; run;
Thank you for the quick response, however it didn't work exactly like I need because I don't think I explained it properly. There is one more layer to it where I need additional matching. I modified a few of the tables below and included the new result I'm looking for...
The additional layer I need is for the query to remove also 'pqr' since it matches 'have1' to 'have2' on 'key, and 'have2' to 'have3' on 'name'
data have1;
input name1 $ 3. key1 4.;
datalines ;
abc 111
def 112
ghi 113
;
run;
data have2;
input name2 $ 3. key2 4.;
datalines ;
jkl 114
ghi 115
pqr 111
;
run;
data have3;
input name3 $ 3. key3 4.;
datalines ;
stu 117
vwx 115
pqr 118
;
run;
name | key1 | key2 | key3 |
abc | 111 | 111 | 118 |
def | 112 | ||
ghi | 113 | 115 | 115 |
jkl | 114 | ||
stu | 117 |
Your logic is a lot more complicated.
What if there are duplicated name or duplicated key ,what you gonna do ?
The following is for the first two tables.
data have1;
input name1 $ 3. key1 4.;
datalines ;
abc 111
def 112
ghi 113
;
run;
data have2;
input name2 $ 3. key2 4.;
datalines ;
jkl 114
ghi 115
pqr 111
;
run;
data have3;
input name3 $ 3. key3 4.;
datalines ;
stu 117
vwx 115
pqr 118
;
run;
data want1;
if _n_=1 then do;
if 0 then set have2;
declare hash h21(dataset:'have2');
declare hiter hi_h21('h21');
h21.definekey('name2');
h21.definedata('name2','key2');
h21.definedone();
declare hash h22(dataset:'have2');
h22.definekey('key2');
h22.definedata('key2');
h22.definedone();
end;
call missing(of _all_);
set have1 end=last;
if h21.check(key:name1)=0 then do;
h21.find(key:name1);h21.remove(key:name1);
end;
else if h22.check(key:key1)=0 then do;
h22.find(key:key1);h22.remove(key:key1);
end;
output;
if last then do;
call missing(of _all_);
do while(hi_h21.next()=0);
if h22.check()=0 then do;name1=name2;output;end;
end;
end;
drop name2 ;
run;
Thank you for the response.
I'm not getting the resuls table i need though. To answer your question, I know i'm going to have duplicate names and keys, and for those I want to condense down to 1 record. In the example I submited, I have 'ghi' in two tables. I always want to keep the 'left-most' name, so in this case I would keep name1, and merge the key for the name2 record with 'ghi' onto 'left-most' 'ghi'.
I hope this makes sense, if not please let me know and I can try and explain better.
Thank You.
I am not sure the following code could get you what you want.
Your logic indeed make me confused.
data have1;
input name1 $ 3. key1 4.;
datalines ;
abc 111
def 112
ghi 113
;
run;
data have2;
input name2 $ 3. key2 4.;
datalines ;
jkl 114
ghi 115
pqr 111
;
run;
data have3;
input name3 $ 3. key3 4.;
datalines ;
stu 117
vwx 115
pqr 118
;
run;
proc sql;
create index name1 on have1;
create index key1 on have1;
create index name2 on have2;
create index key2 on have2;
quit;
data temp1 not1;
merge have1(in=ina) have2(rename=(name2=name1) in=inb);
by name1;
if not ina then output not1;
else output temp1;
run;
data temp2 not2;
merge temp1(in=ina) have2(rename=(key2=key1) in=inb);
by key1;
if not missing(name2) then key2=key1;
if not ina then output not2;
else output temp2;
run;
proc sql;
create table temp_want as
select name1,key2 from not1
intersect
select name2,key2 from not2;
quit;
data temp2;
set temp2 temp_want;
if missing(name2) then name2=name1;
run;
data have1;
set temp2;
run;
data have2;
set have3;
run;
proc sql;
create index name2 on have1;
create index key2 on have1;
create index name3 on have2;
create index key3 on have2;
quit;
data temp1(index=(key2)) not1;
merge have1(in=ina) have2(rename=(name3=name2) in=inb);
by name2;
if not ina then output not1;
else output temp1;
run;
data temp2 not2;
merge temp1(in=ina) have2(rename=(key3=key2) in=inb);
by key2;
if not missing(name3) then key3=key2;
if not ina then output not2;
else output temp2;
run;
proc sql;
create table temp_want as
select name2 as name1,key3 from not1
intersect
select name3,key3 from not2;
quit;
data want;
set temp2 temp_want;
drop name2 name3;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.