DATA Step, Macro, Functions and more

IN dataset option in MERGE two datasets.

Accepted Solution Solved
Reply
Contributor
Posts: 39
Accepted Solution

IN dataset option in MERGE two datasets.

I am getting confused with IN= dataset option in MERGE statement with two datasets.

 

data _NULL_;
    merge txnsfile(IN=txns) dupsfile(IN=dups);
    by key;
flle extnrpt;

 

 

Follwoing is what I want to achieve.

 

1. If the key is there in both the files then I want the record from txns file.
2. If the key is not thre in txns file but is there in dups file then I want the record from dups file.
3. If the key is there in txns file but not there in dups file then I want it from the txns file.

 

Thansk for your time.

 

Neal.


Accepted Solutions
Solution
‎09-21-2017 11:40 AM
PROC Star
Posts: 1,471

Re: IN dataset option in MERGE two datasets.

[ Edited ]
Posted in reply to saslovethemost

I'd look into the UPDATE statement rather than merge, e.g.:

 

data trans;
  input id x;
cards; 
1 0
2 0
4 0
;
run;

data master;
  input id x;
cards; 
2 1
3 1
4 1
;
run;

data want;
  update master trans;
  by id;

  put (id x)(=);
run;

 

Returns:

165  data want;
166    update master trans;
167    by id;
168
169    put (id x)(=);
170  run;

id=1 x=0
id=2 x=0
id=3 x=1
id=4 x=0
NOTE: There were 3 observations read from the data set WORK.MASTER.
NOTE: There were 3 observations read from the data set WORK.TRANS.
NOTE: The data set WORK.WANT has 4 observations and 2 variables.

 

View solution in original post


All Replies
PROC Star
Posts: 1,471

Re: IN dataset option in MERGE two datasets.

[ Edited ]
Posted in reply to saslovethemost

Merge is typically used to merge two datasets "horizontally", i.e. adding variables.

 

From your description, it sounds like you may have the same variables in both datasets, is that right?  And your goal is to find the KEY values that are in dupsfile but not in txnsfile, and add those records to txnsfile ("vertically"), is that right?

 

It would be easier to help you if you made added code to make a small sample (~5 records) of txnsfile and dupsfile, and show the output you want.

Super User
Posts: 5,914

Re: IN dataset option in MERGE two datasets.

Posted in reply to saslovethemost

If I recall right, the values from the first data set in the merge statement overrides any other values.

So by ordering the merge statement will probably be sufficient. I don't think that the IN= applies to the described logic.

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n1tgk0uanvisvon1r26l...

Data never sleeps
PROC Star
Posts: 1,471

Re: IN dataset option in MERGE two datasets.

@LinusH They way SAS handles variable collisions in a merge are tricky.  I think it's better to think of them as collisions to be avoided, rather than a tool to be used.  In a one-to-one merge with collisions, the variables from the value from the second dataset will overwrite the value for the first dataset.  But in a many(left)-to-one(right) merge with collisions, only the first record for a BY group from the many dataset will have its value overwritten, because when the second record is read it will overwrite the value in the PDV from the right dataset.  That's an ugly sentence.  Also you can have issues caused by colliding variables having different attributes.

 

I tend to turn on msglevel=i and treat collision notes as errors. I wish there was an option to make this an error, like MERGENOBY.

 

Here's an example of the many-to-one collision problem which is a surprise to many folks starting to work with the MERGE statement:

 

data many;
  input id x;
  cards;
1 0
2 0
2 0
3 0
;
run;

data one;
  input id x;
  cards;
2 1
3 1
;
run;

*use a merge to attempt to update the values of x for id=2 and id=3;
*does not produce the desired result;

options msglevel=i;
data want;
  merge many one;
  by id;

  put (id x)(=);
run;

 

Returns:

118  options msglevel=i;
119  data want;
120    merge many one;
121    by id;
122
123    put (id x)(=);
124  run;

INFO: The variable x on data set WORK.MANY will be overwritten by data set WORK.ONE.
id=1 x=0
id=2 x=1
id=2 x=0
id=3 x=1
NOTE: There were 4 observations read from the data set WORK.MANY.
NOTE: There were 2 observations read from the data set WORK.ONE.
NOTE: The data set WORK.WANT has 4 observations and 2 variables.

 

Note that the INFO: line in the log might actually lead you to think it's workig like you want "Yayy, the value was over-written!" but on the second record for id=2 the value of x was not overwritten.

Contributor
Posts: 39

Re: IN dataset option in MERGE two datasets.

 

data many;
  input id x;
cards; 
1 0
2 0
4 0; run; data one; input id x; cards;
2 1
3 1
4 1; run;

I want the ouptut like below.

id x
1 0
2 0
3 1
4 0;


Solution
‎09-21-2017 11:40 AM
PROC Star
Posts: 1,471

Re: IN dataset option in MERGE two datasets.

[ Edited ]
Posted in reply to saslovethemost

I'd look into the UPDATE statement rather than merge, e.g.:

 

data trans;
  input id x;
cards; 
1 0
2 0
4 0
;
run;

data master;
  input id x;
cards; 
2 1
3 1
4 1
;
run;

data want;
  update master trans;
  by id;

  put (id x)(=);
run;

 

Returns:

165  data want;
166    update master trans;
167    by id;
168
169    put (id x)(=);
170  run;

id=1 x=0
id=2 x=0
id=3 x=1
id=4 x=0
NOTE: There were 3 observations read from the data set WORK.MASTER.
NOTE: There were 3 observations read from the data set WORK.TRANS.
NOTE: The data set WORK.WANT has 4 observations and 2 variables.

 

Contributor
Posts: 39

Re: IN dataset option in MERGE two datasets.

Looks good, can we achive the same using MERGE statements, thanks in advance.

Respected Advisor
Posts: 3,184

Re: IN dataset option in MERGE two datasets.

Posted in reply to saslovethemost

You could, but be aware that you will need to deal each variables one by one if you have many variables, which is ugly coding. So why not 'update'? is this a homework question?

data trans;
  input id x;
cards; 
1 0
2 0
4 0
;
run;

data master;
  input id x;
cards; 
2 1
3 1
4 1
;
run;


DATA WANT(drop=_x);
merge TRANS (in=a) MASTER(in=b rename=x=_x);
BY ID;
IF not a and b then x=_x;
RUN;
Super User
Super User
Posts: 8,267

Re: IN dataset option in MERGE two datasets.

Posted in reply to saslovethemost

That is not a many to one merge as each ID only occurs once per source dataset.  Also it does not look like you want to do a merge at all.

 

This code wil interleave the rows by ID and select the first row for each ID.

That will produce your desired output.

data first;
  input id x;
cards; 
1 0 
2 0 
4 0
;

data second;
  input id x;
cards; 
2 1 
3 1 
4 1
;

data want ;
  set first second ;
  by id ;
  if first.id;
run;
Respected Advisor
Posts: 3,184

Re: IN dataset option in MERGE two datasets.

Posted in reply to saslovethemost

I second @Quentin, 'Update' is the way to go. As an unorthodox alternative, you could also 'set' interlace to have the job done:

 

data trans;
  input id x;
cards; 
1 0
2 0
4 0
;
run;

data master;
  input id x;
cards; 
2 1
3 1
4 1
;
run;


DATA WANT;
SET TRANS MASTER;
BY ID;
IF FIRST.ID;
RUN;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 218 views
  • 6 likes
  • 5 in conversation