BookmarkSubscribeRSS Feed
DavidJ
Calcite | Level 5

I am having issues trying to merge two datasets based on a common variable X.  Dataset A has 10 records.  5 have a value of X=1, the other obs. have X=2.

Dataset B has 3 values where x=1 and say, 3 values where x=6.  I only want records that match on X or are in Dataset A to begin with, so the 3 obs in dataset b

would be discarded.

I want to have 8 observations total, I am getting 10 by using the following code..

Data C;

    merge a (in=ina) b (in=inb);

    by x;

    if ina and inb then do;

       output c;         /* The originial record */

       charge = pmt_amt; /* Carryover the needed vars in the record from B */

       code = 'XX';      /* Mark as carryover */

       output c;

    end;

    else if ina then output c;  /* Always keep Dataset A record */


What seems to be happening is the last matching record (#3) in B gets output multiple times so I get 10 records instead of 8 in this example.  Why is

the flag inb set to 1 mutliple times on the last record in B?  I am stumped and am facing a deadline.  There must be a way to do this!

6 REPLIES 6
Amir
PROC Star

Hi,

I'm not sure I fully understand, because I read "Dataset A has 10 records" and "I only want records that match on X or are in Dataset A to begin with". As there are 10 records "in Dataset A to begin with" this implies to me the output data set would have 10 records, as opposed to 8. Perhaps you could clarify which 2 records you think you should not have.

One thing to check is that you are outputting two records for each condition "if ina and inb" via the output statement, is this part of the problem?

Regards,

Amir.

DavidJ
Calcite | Level 5

My bad.  I want the following:

1) All the records in A (10)

2) The matching records in B (3)

Desired Total 13

I am getting 15 with the last matching record in B being duplicated.  I suspect the second output statement is causing the problem but I don't know how to get around this?

PGStats
Opal | Level 21

Then you could do :

data want(drop=ok);

do until(last.x);

     set a(in=ina) b; by x;

     ok = max(ina,ok);

     if ok then output;

     end;

run;

PG

PG
Tom
Super User Tom
Super User

Sounds like you want a SET instead of MERGE.  Are the values of variables other than the key (X) that you want merged from B onto the records from A?  If not then you definitely want to SET the datasets together.  Then the issue is how to determine if the particular value of X was in dataset A.  Note that code below requires A is specified before B in the SET statement.

Data C;

  set a (in=ina) b (in=inb);

  by x;

   retain keepx ;

   if first.x then keepx=ina;

   if keepx;

run;

DavidJ
Calcite | Level 5

Tom's solution looks like what I want - thanks.

Haikuo
Onyx | Level 15

Tom's data step approach is no doubt the optimal solution, here is just to show SQL's way to stack:

proc sql noprint;

create table c as

(select * from a)

union all

(select * from b where x in (select x from a));

quit;

Haikuo

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 4151 views
  • 0 likes
  • 5 in conversation