BookmarkSubscribeRSS Feed
DavidJ
Calcite | Level 5

I am having issues trying to merge two datasets based on a common variable X.  Dataset A has 10 records.  5 have a value of X=1, the other obs. have X=2.

Dataset B has 3 values where x=1 and say, 3 values where x=6.  I only want records that match on X or are in Dataset A to begin with, so the 3 obs in dataset b

would be discarded.

I want to have 8 observations total, I am getting 10 by using the following code..

Data C;

    merge a (in=ina) b (in=inb);

    by x;

    if ina and inb then do;

       output c;         /* The originial record */

       charge = pmt_amt; /* Carryover the needed vars in the record from B */

       code = 'XX';      /* Mark as carryover */

       output c;

    end;

    else if ina then output c;  /* Always keep Dataset A record */


What seems to be happening is the last matching record (#3) in B gets output multiple times so I get 10 records instead of 8 in this example.  Why is

the flag inb set to 1 mutliple times on the last record in B?  I am stumped and am facing a deadline.  There must be a way to do this!

6 REPLIES 6
Amir
PROC Star

Hi,

I'm not sure I fully understand, because I read "Dataset A has 10 records" and "I only want records that match on X or are in Dataset A to begin with". As there are 10 records "in Dataset A to begin with" this implies to me the output data set would have 10 records, as opposed to 8. Perhaps you could clarify which 2 records you think you should not have.

One thing to check is that you are outputting two records for each condition "if ina and inb" via the output statement, is this part of the problem?

Regards,

Amir.

DavidJ
Calcite | Level 5

My bad.  I want the following:

1) All the records in A (10)

2) The matching records in B (3)

Desired Total 13

I am getting 15 with the last matching record in B being duplicated.  I suspect the second output statement is causing the problem but I don't know how to get around this?

PGStats
Opal | Level 21

Then you could do :

data want(drop=ok);

do until(last.x);

     set a(in=ina) b; by x;

     ok = max(ina,ok);

     if ok then output;

     end;

run;

PG

PG
Tom
Super User Tom
Super User

Sounds like you want a SET instead of MERGE.  Are the values of variables other than the key (X) that you want merged from B onto the records from A?  If not then you definitely want to SET the datasets together.  Then the issue is how to determine if the particular value of X was in dataset A.  Note that code below requires A is specified before B in the SET statement.

Data C;

  set a (in=ina) b (in=inb);

  by x;

   retain keepx ;

   if first.x then keepx=ina;

   if keepx;

run;

DavidJ
Calcite | Level 5

Tom's solution looks like what I want - thanks.

Haikuo
Onyx | Level 15

Tom's data step approach is no doubt the optimal solution, here is just to show SQL's way to stack:

proc sql noprint;

create table c as

(select * from a)

union all

(select * from b where x in (select x from a));

quit;

Haikuo

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2687 views
  • 0 likes
  • 5 in conversation