Solved: IN dataset option in MERGE two datasets.

saslovethemost · Posted 09-15-2017 08:46 AM

I am getting confused with IN= dataset option in MERGE statement with two datasets.

data _NULL_;
    merge txnsfile(IN=txns) dupsfile(IN=dups);
    by key;
    flle extnrpt;

Follwoing is what I want to achieve.

1. If the key is there in both the files then I want the record from txns file.
2. If the key is not thre in txns file but is there in dups file then I want the record from dups file.
3. If the key is there in txns file but not there in dups file then I want it from the txns file.

Thansk for your time.

Neal.

Quentin · Posted 09-15-2017 11:12 AM

I'd look into the UPDATE statement rather than merge, e.g.:

data trans;
  input id x;
cards; 
1 0
2 0
4 0
;
run;

data master;
  input id x;
cards; 
2 1
3 1
4 1
;
run;

data want;
  update master trans;
  by id;

  put (id x)(=);
run;

Returns:

165  data want;
166    update master trans;
167    by id;
168
169    put (id x)(=);
170  run;

id=1 x=0
id=2 x=0
id=3 x=1
id=4 x=0
NOTE: There were 3 observations read from the data set WORK.MASTER.
NOTE: There were 3 observations read from the data set WORK.TRANS.
NOTE: The data set WORK.WANT has 4 observations and 2 variables.

The Boston Area SAS Users Group is hosting free webinars!

Register now at https://www.basug.org/events.

View solution in original post

Quentin · Posted 09-15-2017 09:08 AM

Merge is typically used to merge two datasets "horizontally", i.e. adding variables.

From your description, it sounds like you may have the same variables in both datasets, is that right? And your goal is to find the KEY values that are in dupsfile but not in txnsfile, and add those records to txnsfile ("vertically"), is that right?

It would be easier to help you if you made added code to make a small sample (~5 records) of txnsfile and dupsfile, and show the output you want.

The Boston Area SAS Users Group is hosting free webinars!

Register now at https://www.basug.org/events.

LinusH · Posted 09-15-2017 09:52 AM

If I recall right, the values from the first data set in the merge statement overrides any other values.

So by ordering the merge statement will probably be sufficient. I don't think that the IN= applies to the described logic.

http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n1tgk0uanvisvon1r26l...

Data never sleeps

Quentin · Posted 09-15-2017 10:17 AM

@LinusH They way SAS handles variable collisions in a merge are tricky. I think it's better to think of them as collisions to be avoided, rather than a tool to be used. In a one-to-one merge with collisions, the variables from the value from the second dataset will overwrite the value for the first dataset. But in a many(left)-to-one(right) merge with collisions, only the first record for a BY group from the many dataset will have its value overwritten, because when the second record is read it will overwrite the value in the PDV from the right dataset. That's an ugly sentence. Also you can have issues caused by colliding variables having different attributes.

I tend to turn on msglevel=i and treat collision notes as errors. I wish there was an option to make this an error, like MERGENOBY.

Here's an example of the many-to-one collision problem which is a surprise to many folks starting to work with the MERGE statement:

data many;
  input id x;
  cards;
1 0
2 0
2 0
3 0
;
run;

data one;
  input id x;
  cards;
2 1
3 1
;
run;

*use a merge to attempt to update the values of x for id=2 and id=3;
*does not produce the desired result;

options msglevel=i;
data want;
  merge many one;
  by id;

  put (id x)(=);
run;

Returns:

118  options msglevel=i;
119  data want;
120    merge many one;
121    by id;
122
123    put (id x)(=);
124  run;

INFO: The variable x on data set WORK.MANY will be overwritten by data set WORK.ONE.
id=1 x=0
id=2 x=1
id=2 x=0
id=3 x=1
NOTE: There were 4 observations read from the data set WORK.MANY.
NOTE: There were 2 observations read from the data set WORK.ONE.
NOTE: The data set WORK.WANT has 4 observations and 2 variables.

Note that the INFO: line in the log might actually lead you to think it's workig like you want "Yayy, the value was over-written!" but on the second record for id=2 the value of x was not overwritten.

The Boston Area SAS Users Group is hosting free webinars!

Register now at https://www.basug.org/events.

saslovethemost · Posted 09-15-2017 11:01 AM

data many;
  input id x;
cards; 
1 0 
2 0 
4 0;
run;

data one;
  input id x;
cards; 
2 1 
3 1 
4 1;
run;

I want the ouptut like below.

id x
1  0
2  0
3  1
4  0;

Quentin · Posted 09-15-2017 11:12 AM

I'd look into the UPDATE statement rather than merge, e.g.:

data trans;
  input id x;
cards; 
1 0
2 0
4 0
;
run;

data master;
  input id x;
cards; 
2 1
3 1
4 1
;
run;

data want;
  update master trans;
  by id;

  put (id x)(=);
run;

Returns:

165  data want;
166    update master trans;
167    by id;
168
169    put (id x)(=);
170  run;

id=1 x=0
id=2 x=0
id=3 x=1
id=4 x=0
NOTE: There were 3 observations read from the data set WORK.MASTER.
NOTE: There were 3 observations read from the data set WORK.TRANS.
NOTE: The data set WORK.WANT has 4 observations and 2 variables.

The Boston Area SAS Users Group is hosting free webinars!

Register now at https://www.basug.org/events.

saslovethemost · Posted 09-15-2017 11:37 AM

Looks good, can we achive the same using MERGE statements, thanks in advance.

Haikuo · Posted 09-15-2017 11:44 AM

You could, but be aware that you will need to deal each variables one by one if you have many variables, which is ugly coding. So why not 'update'? is this a homework question?

data trans;
  input id x;
cards; 
1 0
2 0
4 0
;
run;

data master;
  input id x;
cards; 
2 1
3 1
4 1
;
run;


DATA WANT(drop=_x);
merge TRANS (in=a) MASTER(in=b rename=x=_x);
BY ID;
IF not a and b then x=_x;
RUN;

Tom · Posted 09-15-2017 11:38 AM

That is not a many to one merge as each ID only occurs once per source dataset. Also it does not look like you want to do a merge at all.

This code wil interleave the rows by ID and select the first row for each ID.

That will produce your desired output.

data first;
  input id x;
cards; 
1 0 
2 0 
4 0
;

data second;
  input id x;
cards; 
2 1 
3 1 
4 1
;

data want ;
  set first second ;
  by id ;
  if first.id;
run;

Haikuo · Posted 09-15-2017 11:36 AM

I second @Quentin, 'Update' is the way to go. As an unorthodox alternative, you could also 'set' interlace to have the job done:

data trans;
  input id x;
cards; 
1 0
2 0
4 0
;
run;

data master;
  input id x;
cards; 
2 1
3 1
4 1
;
run;


DATA WANT;
SET TRANS MASTER;
BY ID;
IF FIRST.ID;
RUN;

IN dataset option in MERGE two datasets.

Re: IN dataset option in MERGE two datasets.

Re: IN dataset option in MERGE two datasets.

Re: IN dataset option in MERGE two datasets.

Re: IN dataset option in MERGE two datasets.

Re: IN dataset option in MERGE two datasets.

Re: IN dataset option in MERGE two datasets.

Re: IN dataset option in MERGE two datasets.

Re: IN dataset option in MERGE two datasets.

Re: IN dataset option in MERGE two datasets.

Re: IN dataset option in MERGE two datasets.

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away