BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasthebest
Calcite | Level 5

I am trying to merge two files and don’t want to get rid of the duplicate values in Table_1 and replaced with the values in table_2. What are the codes that will do it for me?

Data Table_1;

   input ID Name$ SalesQ Price;

   datalines;

101 AMBA   4 15

107 GEORGE 22 15

107 GEORGE 22 15

107 GEORGE 22 15

108 HADEN 9  15

110 JACKY 8  15

109 IDDIE 19 15

104 DAME 6   15

111 KAN  211 15

112 LAN 83 09 15

106 FOYE 26   15

107 GEORGE 22 15

107 GEORGE 22 15

;

data Table_2;

   input ID Name $ SalesQ Price color$;

     datalines;

101 AMBA   6  24 blue

102 BAMBIE 5 32 red

103 CARRIE 1 17  yellow

104 DAME 1 24  green

105 EDDIE 7  2  tan

106 FOYE 3  6 bluegreen

107 GEORGE 100 11 pink

;

Proc sort Data = Table_1;

By ID;

Run;

Proc sort Data = Table_2;

By ID;

Run;

Data Table_3;

Merge Table_1 Table_2;

By ID;

Run;

Current Output

ID

Name

SalesQ

Price

color

101

AMBA

6

24

blue

102

BAMBIE

5

32

red

103

CARRIE

1

17

yellow

104

DAME

1

24

green

105

EDDIE

7

2

tan

106

FOYE

3

6

bluegree

107

GEORGE

100

11

pink

107

GEORGE

22

15

pink

107

GEORGE

22

15

pink

107

GEORGE

22

15

pink

107

GEORGE

22

15

pink

108

HADEN

9

15

109

IDDIE

19

15

110

JACKY

8

15

111

KAN

211

15

112

LAN

83

9

Desired Output

ID

Name

SalesQ

Price

color

101

AMBA

6

24

blue

102

BAMBIE

5

32

red

103

CARRIE

1

17

yellow

104

DAME

1

24

green

105

EDDIE

7

2

tan

106

FOYE

3

6

bluegree

107

GEORGE

100

11

pink

107

GEORGE

100

11

pink

107

GEORGE

100

11

pink

107

GEORGE

100

11

pink

107

GEORGE

100

11

pink

108

HADEN

9

15

109

IDDIE

19

15

110

JACKY

8

15

111

KAN

211

15

112

LAN

83

9

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Another way to look at the problem is to control the output based on whether data exists in both files.  e.g.:

Data Table_3 (drop=in:);

  merge Table_2 (in=in2)

        Table_1 (in=in1 rename=(SalesQ=in_S Price=in_P));

  By ID;

  if in1 and not(in2) then do;

    SalesQ=in_S;

    Price=in_P;

  end;

Run;

HTH,

Art

View solution in original post

8 REPLIES 8
robby_beum
Quartz | Level 8

What happens if you reverse your tables in the merge statement like the following?

Data Table_3;

Merge Table_2 Table_1;

By ID;

Run;

tish
Calcite | Level 5

proc sql;

    create table output as

        select

            coalesce(a.ID, b.ID) as ID,

            coalescec(a.name, b.name) as name,

            case

                    when missing(b.salesq) then a.salesq

                    else b.salesq

                end as salesq,

            case

                    when missing(b.price) then a.price

                    else b.price

                end as price,

            b.color

        from

            table_1 as a

                full join

            table_2 as b

        on

            a.ID = b.ID

        order by

            ID;

quit;

Haikuo
Onyx | Level 15

It seems to me to be hash() problem, no need to presort:

data want;

  if _n_=1 then do;

    if 0 then set table_2;

    dcl hash h(dataset:"table_2");

    h.definekey("id");

    h.definedata(all:"y");

    h.definedone();

   end;

   set table_1;

   _n_=h.find();

   output;

   call missing(of _all_);

run;

Haikuo

Haikuo
Onyx | Level 15

Colleagues, I have invested some thoughts into problem after I realized that it is more than it looks like. This problem can be used as the touchstone to show what classic data step Can do or Can Not do. The signature feature of data step is sequential processing, it is efficient and relying less on the capacity of memory. However, one thing difficult or awkward for data step to do is 'roll back', data step implicit loop or even data step index will only go One-way. This is why although we have data step index, it can only be used on one-to-one merge. The advent of hash() changes everything, with hash(), data step can do almost everything that Proc SQL is capable of, and most of the time with an advantageous margin in performance. So you see, if using data step to do a merge requiring 'roll back', then it has to be 'random access' kind of technique, in fact, we can actually mimic hash() behavior using array() + point= under classic data step setting, of course it is not as slick and robust as hash():

data want;

  array ind(0:9999) _temporary_;

  do _n_=1 by 1 until (l1);

     set table_2 end=l1;

     ind(id)=_n_;

  end;

  do until (l2);

     set table_1 end=l2;

     _n_=ind(id);

     set table_2 point=_n_;

     output;

     call missing(color);

  end;

run;

Any comments will be highly appreciated. And thank OP for raising this interesting question.

Haikuo

art297
Opal | Level 21

Another way to look at the problem is to control the output based on whether data exists in both files.  e.g.:

Data Table_3 (drop=in:);

  merge Table_2 (in=in2)

        Table_1 (in=in1 rename=(SalesQ=in_S Price=in_P));

  By ID;

  if in1 and not(in2) then do;

    SalesQ=in_S;

    Price=in_P;

  end;

Run;

HTH,

Art

Linlin
Lapis Lazuli | Level 10

Borrowed Art's idea:

Data Table_3b (drop=in:);

  merge Table_1 Table_2 (in=in2 rename=(SalesQ=in_S Price=in_P));

  By ID;

  if in2 then do;

    SalesQ=in_S;

    Price=in_P;

  end;

Run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2478 views
  • 13 likes
  • 6 in conversation