BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

I am getting confused with IN= dataset option in MERGE statement with two datasets.

 

data _NULL_;
    merge txnsfile(IN=txns) dupsfile(IN=dups);
    by key;
flle extnrpt;

 

 

Follwoing is what I want to achieve.

 

1. If the key is there in both the files then I want the record from txns file.
2. If the key is not thre in txns file but is there in dups file then I want the record from dups file.
3. If the key is there in txns file but not there in dups file then I want it from the txns file.

 

Thansk for your time.

 

Neal.

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

I'd look into the UPDATE statement rather than merge, e.g.:

 

data trans;
  input id x;
cards; 
1 0
2 0
4 0
;
run;

data master;
  input id x;
cards; 
2 1
3 1
4 1
;
run;

data want;
  update master trans;
  by id;

  put (id x)(=);
run;

 

Returns:

165  data want;
166    update master trans;
167    by id;
168
169    put (id x)(=);
170  run;

id=1 x=0
id=2 x=0
id=3 x=1
id=4 x=0
NOTE: There were 3 observations read from the data set WORK.MASTER.
NOTE: There were 3 observations read from the data set WORK.TRANS.
NOTE: The data set WORK.WANT has 4 observations and 2 variables.

 

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

9 REPLIES 9
Quentin
Super User

Merge is typically used to merge two datasets "horizontally", i.e. adding variables.

 

From your description, it sounds like you may have the same variables in both datasets, is that right?  And your goal is to find the KEY values that are in dupsfile but not in txnsfile, and add those records to txnsfile ("vertically"), is that right?

 

It would be easier to help you if you made added code to make a small sample (~5 records) of txnsfile and dupsfile, and show the output you want.

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
LinusH
Tourmaline | Level 20

If I recall right, the values from the first data set in the merge statement overrides any other values.

So by ordering the merge statement will probably be sufficient. I don't think that the IN= applies to the described logic.

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n1tgk0uanvisvon1r26l...

Data never sleeps
Quentin
Super User

@LinusH They way SAS handles variable collisions in a merge are tricky.  I think it's better to think of them as collisions to be avoided, rather than a tool to be used.  In a one-to-one merge with collisions, the variables from the value from the second dataset will overwrite the value for the first dataset.  But in a many(left)-to-one(right) merge with collisions, only the first record for a BY group from the many dataset will have its value overwritten, because when the second record is read it will overwrite the value in the PDV from the right dataset.  That's an ugly sentence.  Also you can have issues caused by colliding variables having different attributes.

 

I tend to turn on msglevel=i and treat collision notes as errors. I wish there was an option to make this an error, like MERGENOBY.

 

Here's an example of the many-to-one collision problem which is a surprise to many folks starting to work with the MERGE statement:

 

data many;
  input id x;
  cards;
1 0
2 0
2 0
3 0
;
run;

data one;
  input id x;
  cards;
2 1
3 1
;
run;

*use a merge to attempt to update the values of x for id=2 and id=3;
*does not produce the desired result;

options msglevel=i;
data want;
  merge many one;
  by id;

  put (id x)(=);
run;

 

Returns:

118  options msglevel=i;
119  data want;
120    merge many one;
121    by id;
122
123    put (id x)(=);
124  run;

INFO: The variable x on data set WORK.MANY will be overwritten by data set WORK.ONE.
id=1 x=0
id=2 x=1
id=2 x=0
id=3 x=1
NOTE: There were 4 observations read from the data set WORK.MANY.
NOTE: There were 2 observations read from the data set WORK.ONE.
NOTE: The data set WORK.WANT has 4 observations and 2 variables.

 

Note that the INFO: line in the log might actually lead you to think it's workig like you want "Yayy, the value was over-written!" but on the second record for id=2 the value of x was not overwritten.

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
saslovethemost
Quartz | Level 8

 

data many;
  input id x;
cards; 
1 0
2 0
4 0; run; data one; input id x; cards;
2 1
3 1
4 1; run;

I want the ouptut like below.

id x
1 0
2 0
3 1
4 0;


Quentin
Super User

I'd look into the UPDATE statement rather than merge, e.g.:

 

data trans;
  input id x;
cards; 
1 0
2 0
4 0
;
run;

data master;
  input id x;
cards; 
2 1
3 1
4 1
;
run;

data want;
  update master trans;
  by id;

  put (id x)(=);
run;

 

Returns:

165  data want;
166    update master trans;
167    by id;
168
169    put (id x)(=);
170  run;

id=1 x=0
id=2 x=0
id=3 x=1
id=4 x=0
NOTE: There were 3 observations read from the data set WORK.MASTER.
NOTE: There were 3 observations read from the data set WORK.TRANS.
NOTE: The data set WORK.WANT has 4 observations and 2 variables.

 

BASUG is hosting free webinars Next up: Jane Eslinger presenting PROC REPORT and the ODS EXCEL destination on Mar 27 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
saslovethemost
Quartz | Level 8

Looks good, can we achive the same using MERGE statements, thanks in advance.

Haikuo
Onyx | Level 15

You could, but be aware that you will need to deal each variables one by one if you have many variables, which is ugly coding. So why not 'update'? is this a homework question?

data trans;
  input id x;
cards; 
1 0
2 0
4 0
;
run;

data master;
  input id x;
cards; 
2 1
3 1
4 1
;
run;


DATA WANT(drop=_x);
merge TRANS (in=a) MASTER(in=b rename=x=_x);
BY ID;
IF not a and b then x=_x;
RUN;
Tom
Super User Tom
Super User

That is not a many to one merge as each ID only occurs once per source dataset.  Also it does not look like you want to do a merge at all.

 

This code wil interleave the rows by ID and select the first row for each ID.

That will produce your desired output.

data first;
  input id x;
cards; 
1 0 
2 0 
4 0
;

data second;
  input id x;
cards; 
2 1 
3 1 
4 1
;

data want ;
  set first second ;
  by id ;
  if first.id;
run;
Haikuo
Onyx | Level 15

I second @Quentin, 'Update' is the way to go. As an unorthodox alternative, you could also 'set' interlace to have the job done:

 

data trans;
  input id x;
cards; 
1 0
2 0
4 0
;
run;

data master;
  input id x;
cards; 
2 1
3 1
4 1
;
run;


DATA WANT;
SET TRANS MASTER;
BY ID;
IF FIRST.ID;
RUN;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1171 views
  • 6 likes
  • 5 in conversation