I am getting confused with IN= dataset option in MERGE statement with two datasets.
data _NULL_; merge txnsfile(IN=txns) dupsfile(IN=dups); by key;
flle extnrpt;
Follwoing is what I want to achieve.
1. If the key is there in both the files then I want the record from txns file.
2. If the key is not thre in txns file but is there in dups file then I want the record from dups file.
3. If the key is there in txns file but not there in dups file then I want it from the txns file.
Thansk for your time.
Neal.
I'd look into the UPDATE statement rather than merge, e.g.:
data trans;
input id x;
cards;
1 0
2 0
4 0
;
run;
data master;
input id x;
cards;
2 1
3 1
4 1
;
run;
data want;
update master trans;
by id;
put (id x)(=);
run;
Returns:
165 data want; 166 update master trans; 167 by id; 168 169 put (id x)(=); 170 run; id=1 x=0 id=2 x=0 id=3 x=1 id=4 x=0 NOTE: There were 3 observations read from the data set WORK.MASTER. NOTE: There were 3 observations read from the data set WORK.TRANS. NOTE: The data set WORK.WANT has 4 observations and 2 variables.
Merge is typically used to merge two datasets "horizontally", i.e. adding variables.
From your description, it sounds like you may have the same variables in both datasets, is that right? And your goal is to find the KEY values that are in dupsfile but not in txnsfile, and add those records to txnsfile ("vertically"), is that right?
It would be easier to help you if you made added code to make a small sample (~5 records) of txnsfile and dupsfile, and show the output you want.
If I recall right, the values from the first data set in the merge statement overrides any other values.
So by ordering the merge statement will probably be sufficient. I don't think that the IN= applies to the described logic.
@LinusH They way SAS handles variable collisions in a merge are tricky. I think it's better to think of them as collisions to be avoided, rather than a tool to be used. In a one-to-one merge with collisions, the variables from the value from the second dataset will overwrite the value for the first dataset. But in a many(left)-to-one(right) merge with collisions, only the first record for a BY group from the many dataset will have its value overwritten, because when the second record is read it will overwrite the value in the PDV from the right dataset. That's an ugly sentence. Also you can have issues caused by colliding variables having different attributes.
I tend to turn on msglevel=i and treat collision notes as errors. I wish there was an option to make this an error, like MERGENOBY.
Here's an example of the many-to-one collision problem which is a surprise to many folks starting to work with the MERGE statement:
data many;
input id x;
cards;
1 0
2 0
2 0
3 0
;
run;
data one;
input id x;
cards;
2 1
3 1
;
run;
*use a merge to attempt to update the values of x for id=2 and id=3;
*does not produce the desired result;
options msglevel=i;
data want;
merge many one;
by id;
put (id x)(=);
run;
Returns:
118 options msglevel=i;
119 data want;
120 merge many one;
121 by id;
122
123 put (id x)(=);
124 run;
INFO: The variable x on data set WORK.MANY will be overwritten by data set WORK.ONE.
id=1 x=0
id=2 x=1
id=2 x=0
id=3 x=1
NOTE: There were 4 observations read from the data set WORK.MANY.
NOTE: There were 2 observations read from the data set WORK.ONE.
NOTE: The data set WORK.WANT has 4 observations and 2 variables.
Note that the INFO: line in the log might actually lead you to think it's workig like you want "Yayy, the value was over-written!" but on the second record for id=2 the value of x was not overwritten.
data many; input id x; cards;
1 0
2 0
4 0; run; data one; input id x; cards;
2 1
3 1
4 1; run;
I want the ouptut like below.
id x
1 0
2 0
3 1
4 0;
I'd look into the UPDATE statement rather than merge, e.g.:
data trans;
input id x;
cards;
1 0
2 0
4 0
;
run;
data master;
input id x;
cards;
2 1
3 1
4 1
;
run;
data want;
update master trans;
by id;
put (id x)(=);
run;
Returns:
165 data want; 166 update master trans; 167 by id; 168 169 put (id x)(=); 170 run; id=1 x=0 id=2 x=0 id=3 x=1 id=4 x=0 NOTE: There were 3 observations read from the data set WORK.MASTER. NOTE: There were 3 observations read from the data set WORK.TRANS. NOTE: The data set WORK.WANT has 4 observations and 2 variables.
Looks good, can we achive the same using MERGE statements, thanks in advance.
You could, but be aware that you will need to deal each variables one by one if you have many variables, which is ugly coding. So why not 'update'? is this a homework question?
data trans;
input id x;
cards;
1 0
2 0
4 0
;
run;
data master;
input id x;
cards;
2 1
3 1
4 1
;
run;
DATA WANT(drop=_x);
merge TRANS (in=a) MASTER(in=b rename=x=_x);
BY ID;
IF not a and b then x=_x;
RUN;
That is not a many to one merge as each ID only occurs once per source dataset. Also it does not look like you want to do a merge at all.
This code wil interleave the rows by ID and select the first row for each ID.
That will produce your desired output.
data first;
input id x;
cards;
1 0
2 0
4 0
;
data second;
input id x;
cards;
2 1
3 1
4 1
;
data want ;
set first second ;
by id ;
if first.id;
run;
I second @Quentin, 'Update' is the way to go. As an unorthodox alternative, you could also 'set' interlace to have the job done:
data trans;
input id x;
cards;
1 0
2 0
4 0
;
run;
data master;
input id x;
cards;
2 1
3 1
4 1
;
run;
DATA WANT;
SET TRANS MASTER;
BY ID;
IF FIRST.ID;
RUN;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.