Solved: Re: Merging data with duplicates

sasthebest · Posted 07-25-2012 04:49 PM

I am trying to merge two files and don’t want to get rid of the duplicate values in Table_1 and replaced with the values in table_2. What are the codes that will do it for me?

Data Table_1;

input ID Name$ SalesQ Price;

datalines;

101 AMBA 4 15

107 GEORGE 22 15

108 HADEN 9 15

110 JACKY 8 15

109 IDDIE 19 15

104 DAME 6 15

111 KAN 211 15

112 LAN 83 09 15

106 FOYE 26 15

107 GEORGE 22 15

;

data Table_2;

input ID Name $ SalesQ Price color$;

datalines;

101 AMBA 6 24 blue

102 BAMBIE 5 32 red

103 CARRIE 1 17 yellow

104 DAME 1 24 green

105 EDDIE 7 2 tan

106 FOYE 3 6 bluegreen

107 GEORGE 100 11 pink

;

Proc sort Data = Table_1;

By ID;

Run;

Proc sort Data = Table_2;

By ID;

Run;

Data Table_3;

Merge Table_1 Table_2;

By ID;

Run;

Current Output
ID	Name	SalesQ	Price	color
101	AMBA	6	24	blue
102	BAMBIE	5	32	red
103	CARRIE	1	17	yellow
104	DAME	1	24	green
105	EDDIE	7	2	tan
106	FOYE	3	6	bluegree
107	GEORGE	100	11	pink
107	GEORGE	22	15	pink
107	GEORGE	22	15	pink
107	GEORGE	22	15	pink
107	GEORGE	22	15	pink
108	HADEN	9	15
109	IDDIE	19	15
110	JACKY	8	15
111	KAN	211	15
112	LAN	83	9


Desired Output
ID	Name	SalesQ	Price	color
101	AMBA	6	24	blue
102	BAMBIE	5	32	red
103	CARRIE	1	17	yellow
104	DAME	1	24	green
105	EDDIE	7	2	tan
106	FOYE	3	6	bluegree
107	GEORGE	100	11	pink
107	GEORGE	100	11	pink
107	GEORGE	100	11	pink
107	GEORGE	100	11	pink
107	GEORGE	100	11	pink
108	HADEN	9	15
109	IDDIE	19	15
110	JACKY	8	15
111	KAN	211	15
112	LAN	83	9

art297 · Posted 07-26-2012 10:05 AM

Another way to look at the problem is to control the output based on whether data exists in both files. e.g.:

Data Table_3 (drop=in:);

merge Table_2 (in=in2)

Table_1 (in=in1 rename=(SalesQ=in_S Price=in_P));

By ID;

if in1 and not(in2) then do;

SalesQ=in_S;

Price=in_P;

end;

Run;

HTH,

Art

View solution in original post

Linlin · Posted 07-25-2012 04:53 PM

try:

Data Table_3;

Merge Table_1 Table_2(keep=id color);

By ID;

Run;

sasthebest · Posted 07-25-2012 05:10 PM

I tried both suggestions but they did not work. Please give it another short

robby_beum · Posted 07-25-2012 04:54 PM

What happens if you reverse your tables in the merge statement like the following?

Data Table_3;

Merge Table_2 Table_1;

By ID;

Run;

tish · Posted 07-25-2012 05:39 PM

proc sql;

create table output as

select

coalesce(a.ID, b.ID) as ID,

coalescec(a.name, b.name) as name,

case

when missing(b.salesq) then a.salesq

else b.salesq

end as salesq,

case

when missing(b.price) then a.price

else b.price

end as price,

b.color

from

table_1 as a

full join

table_2 as b

on

a.ID = b.ID

order by

ID;

quit;

Haikuo · Posted 07-25-2012 10:39 PM

It seems to me to be hash() problem, no need to presort:

data want;

if _n_=1 then do;

if 0 then set table_2;

dcl hash h(dataset:"table_2");

h.definekey("id");

h.definedata(all:"y");

h.definedone();

end;

set table_1;

_n_=h.find();

output;

call missing(of _all_);

run;

Haikuo

Haikuo · Posted 07-26-2012 09:33 AM

Colleagues, I have invested some thoughts into problem after I realized that it is more than it looks like. This problem can be used as the touchstone to show what classic data step Can do or Can Not do. The signature feature of data step is sequential processing, it is efficient and relying less on the capacity of memory. However, one thing difficult or awkward for data step to do is 'roll back', data step implicit loop or even data step index will only go One-way. This is why although we have data step index, it can only be used on one-to-one merge. The advent of hash() changes everything, with hash(), data step can do almost everything that Proc SQL is capable of, and most of the time with an advantageous margin in performance. So you see, if using data step to do a merge requiring 'roll back', then it has to be 'random access' kind of technique, in fact, we can actually mimic hash() behavior using array() + point= under classic data step setting, of course it is not as slick and robust as hash():

data want;

array ind(0:9999) _temporary_;

do _n_=1 by 1 until (l1);

set table_2 end=l1;

ind(id)=_n_;

end;

do until (l2);

set table_1 end=l2;

_n_=ind(id);

set table_2 point=_n_;

output;

call missing(color);

end;

run;

Any comments will be highly appreciated. And thank OP for raising this interesting question.

Haikuo

art297 · Posted 07-26-2012 10:05 AM

Another way to look at the problem is to control the output based on whether data exists in both files. e.g.:

Data Table_3 (drop=in:);

merge Table_2 (in=in2)

Table_1 (in=in1 rename=(SalesQ=in_S Price=in_P));

By ID;

if in1 and not(in2) then do;

SalesQ=in_S;

Price=in_P;

end;

Run;

HTH,

Art

Linlin · Posted 07-26-2012 10:46 AM

Borrowed Art's idea:

Data Table_3b (drop=in:);

merge Table_1 Table_2 (in=in2 rename=(SalesQ=in_S Price=in_P));

By ID;

if in2 then do;

SalesQ=in_S;

Price=in_P;

end;

Run;

Merging data with duplicates

Re: Merging data with duplicates

Re: I am trying to merge two files and don’t want to get rid of the duplicate values in Table_1 and replaced with the values in table_2. What are the codes that will do it for me?

Re: I am trying to merge two files and don’t want to get rid of the duplicate values in Table_1 and replaced with the values in table_2. What are the codes that will do it for me?

Re: I am trying to merge two files and don’t want to get rid of the duplicate values in Table_1 and replaced with the values in table_2. What are the codes that will do it for me?

Re: Merging data with duplicates

Re: Merging data with duplicates

Re: Merging data with duplicates

Re: Merging data with duplicates

Re: Merging data with duplicates

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away