Help using Base SAS procedures

Merging data with duplicates

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 18
Accepted Solution

Merging data with duplicates

I am trying to merge two files and don’t want to get rid of the duplicate values in Table_1 and replaced with the values in table_2. What are the codes that will do it for me?

Data Table_1;

   input ID Name$ SalesQ Price;

   datalines;

101 AMBA   4 15

107 GEORGE 22 15

107 GEORGE 22 15

107 GEORGE 22 15

108 HADEN 9  15

110 JACKY 8  15

109 IDDIE 19 15

104 DAME 6   15

111 KAN  211 15

112 LAN 83 09 15

106 FOYE 26   15

107 GEORGE 22 15

107 GEORGE 22 15

;

data Table_2;

   input ID Name $ SalesQ Price color$;

     datalines;

101 AMBA   6  24 blue

102 BAMBIE 5 32 red

103 CARRIE 1 17  yellow

104 DAME 1 24  green

105 EDDIE 7  2  tan

106 FOYE 3  6 bluegreen

107 GEORGE 100 11 pink

;

Proc sort Data = Table_1;

By ID;

Run;

Proc sort Data = Table_2;

By ID;

Run;

Data Table_3;

Merge Table_1 Table_2;

By ID;

Run;

Current Output

ID

Name

SalesQ

Price

color

101

AMBA

6

24

blue

102

BAMBIE

5

32

red

103

CARRIE

1

17

yellow

104

DAME

1

24

green

105

EDDIE

7

2

tan

106

FOYE

3

6

bluegree

107

GEORGE

100

11

pink

107

GEORGE

22

15

pink

107

GEORGE

22

15

pink

107

GEORGE

22

15

pink

107

GEORGE

22

15

pink

108

HADEN

9

15

109

IDDIE

19

15

110

JACKY

8

15

111

KAN

211

15

112

LAN

83

9

Desired Output

ID

Name

SalesQ

Price

color

101

AMBA

6

24

blue

102

BAMBIE

5

32

red

103

CARRIE

1

17

yellow

104

DAME

1

24

green

105

EDDIE

7

2

tan

106

FOYE

3

6

bluegree

107

GEORGE

100

11

pink

107

GEORGE

100

11

pink

107

GEORGE

100

11

pink

107

GEORGE

100

11

pink

107

GEORGE

100

11

pink

108

HADEN

9

15

109

IDDIE

19

15

110

JACKY

8

15

111

KAN

211

15

112

LAN

83

9


Accepted Solutions
Solution
‎07-26-2012 10:05 AM
PROC Star
Posts: 7,467

Re: Merging data with duplicates

Posted in reply to sasthebest

Another way to look at the problem is to control the output based on whether data exists in both files.  e.g.:

Data Table_3 (drop=inSmiley Happy;

  merge Table_2 (in=in2)

        Table_1 (in=in1 rename=(SalesQ=in_S Price=in_P));

  By ID;

  if in1 and not(in2) then do;

    SalesQ=in_S;

    Price=in_P;

  end;

Run;

HTH,

Art

View solution in original post


All Replies
Super Contributor
Posts: 1,636

Re: I am trying to merge two files and don’t want to get rid of the duplicate values in Table_1 and replaced with the values in table_2. What are the codes that will do it for me?

Posted in reply to sasthebest

try:

Data Table_3;

Merge Table_1 Table_2(keep=id color);

By ID;

Run;

Occasional Contributor
Posts: 18

Re: I am trying to merge two files and don’t want to get rid of the duplicate values in Table_1 and replaced with the values in table_2. What are the codes that will do it for me?

I tried both suggestions but they did not work. Please give it another short

Contributor
Posts: 73

Re: I am trying to merge two files and don’t want to get rid of the duplicate values in Table_1 and replaced with the values in table_2. What are the codes that will do it for me?

Posted in reply to sasthebest

What happens if you reverse your tables in the merge statement like the following?

Data Table_3;

Merge Table_2 Table_1;

By ID;

Run;

Contributor
Posts: 52

Re: Merging data with duplicates

Posted in reply to sasthebest

proc sql;

    create table output as

        select

            coalesce(a.ID, b.ID) as ID,

            coalescec(a.name, b.name) as name,

            case

                    when missing(b.salesq) then a.salesq

                    else b.salesq

                end as salesq,

            case

                    when missing(b.price) then a.price

                    else b.price

                end as price,

            b.color

        from

            table_1 as a

                full join

            table_2 as b

        on

            a.ID = b.ID

        order by

            ID;

quit;

Respected Advisor
Posts: 3,156

Re: Merging data with duplicates

Posted in reply to sasthebest

It seems to me to be hash() problem, no need to presort:

data want;

  if _n_=1 then do;

    if 0 then set table_2;

    dcl hash h(dataset:"table_2");

    h.definekey("id");

    h.definedata(all:"y");

    h.definedone();

   end;

   set table_1;

   _n_=h.find();

   output;

   call missing(of _all_);

run;

Haikuo

Respected Advisor
Posts: 3,156

Re: Merging data with duplicates

Colleagues, I have invested some thoughts into problem after I realized that it is more than it looks like. This problem can be used as the touchstone to show what classic data step Can do or Can Not do. The signature feature of data step is sequential processing, it is efficient and relying less on the capacity of memory. However, one thing difficult or awkward for data step to do is 'roll back', data step implicit loop or even data step index will only go One-way. This is why although we have data step index, it can only be used on one-to-one merge. The advent of hash() changes everything, with hash(), data step can do almost everything that Proc SQL is capable of, and most of the time with an advantageous margin in performance. So you see, if using data step to do a merge requiring 'roll back', then it has to be 'random access' kind of technique, in fact, we can actually mimic hash() behavior using array() + point= under classic data step setting, of course it is not as slick and robust as hash():

data want;

  array ind(0:9999) _temporary_;

  do _n_=1 by 1 until (l1);

     set table_2 end=l1;

     ind(id)=_n_;

  end;

  do until (l2);

     set table_1 end=l2;

     _n_=ind(id);

     set table_2 point=_n_;

     output;

     call missing(color);

  end;

run;

Any comments will be highly appreciated. And thank OP for raising this interesting question.

Haikuo

Solution
‎07-26-2012 10:05 AM
PROC Star
Posts: 7,467

Re: Merging data with duplicates

Posted in reply to sasthebest

Another way to look at the problem is to control the output based on whether data exists in both files.  e.g.:

Data Table_3 (drop=inSmiley Happy;

  merge Table_2 (in=in2)

        Table_1 (in=in1 rename=(SalesQ=in_S Price=in_P));

  By ID;

  if in1 and not(in2) then do;

    SalesQ=in_S;

    Price=in_P;

  end;

Run;

HTH,

Art

Super Contributor
Posts: 1,636

Re: Merging data with duplicates

Borrowed Art's idea:

Data Table_3b (drop=inSmiley Happy;

  merge Table_1 Table_2 (in=in2 rename=(SalesQ=in_S Price=in_P));

  By ID;

  if in2 then do;

    SalesQ=in_S;

    Price=in_P;

  end;

Run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 384 views
  • 11 likes
  • 6 in conversation