DATA Step, Macro, Functions and more

Merging

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 11
Accepted Solution

Merging

data a;
input m p c;
cards;
102 113 114
102 124 134
103 114 115
103 124 135
104 115 116
104 125 136
run;

data b;
input m p c;
cards;
102 103 104
103 104 105
104 105 106
run;

data e1;
merge a b;
by m;
run;
proc print data=e;
run;

OUTPUT:

Obs m p c
1 102 103 104
2 102 124 134
3 103 104 105
4 103 124 135
5 104 105 106
6 104 125 136

 

data a;
input m p c;
cards;
102 103 104
103 104 105
104 105 106
run;

data b;
input m p c ;
cards;
102 113 114
102 103 104
102 124 134
103 114 115
103 104 105
103 124 135
104 115 116
104 105 106
104 125 136
run;

data e2;
merge a b;
by m;
run;
proc print data=e;
run;

OUTPUT:

Obs m p c
1 102 113 114
2 102 103 104
3 102 124 134
4 103 114 115
5 103 104 105
6 103 124 135
7 104 115 116
8 104 105 106
9 104 125 136

Data A;
Input ID Name$ Height;
cards;
1 A 1
3 B 2
5 C 2
7 D 2
9 E 2
;
run;

Data B;
Input ID Name$ Weight;
cards;
2 A 2
4 B 3
5 C 4
7 D 5
;
run;

Data dummy;
Merge A B;
by ID;
run;
proc print data = dummy;
run;

OUTPUT:

Obs ID Name Height Weight
1 1 A 1 .
2 2 A . 2
3 3 B 2 .
4 4 B . 3
5 5 C 2 4
6 7 D 2 5
7 9 E 2 .

 

I have used a simple merge condition in all the 3 codes, but very confused after seeing the ouput.. Can anyone please clarify my confusion?


Accepted Solutions
Solution
‎06-17-2017 12:13 PM
Super User
Super User
Posts: 6,502

Re: Merging

[ Edited ]

If you just walk through the steps involved in merging you can easily see why you are getting the output you posted.

Let's take the case where M=102. 

data a;
  input m p c;
cards;
102 113 114
102 124 134
;
data b;
  input m p c;
cards;
102 103 104
;
data want ;
  merge a b;
  by m;
run;

So on the first iteration (_N_=1) the MERGE statement will find a record from both A and B.  

So it first loads the record from A and sets the variables to these values

m=102 p=113 c=114

It then loads the record from B and doing this will replace the values of P and C (I guess it might also replace M but since it is the same value it doesn't change anything).  

This will result in 

m=102 p=103 c=104

We then reach the end and this is output and the next iteration _n_=2 starts. Now this time there are no more records from B to load so we only load the new recrod from A.  So we get 

m=102 p=124 c=134

which is then output at the end of the data step.

 

In general it is a bad idea to merge datasets that have common variables that are not part of the BY variables.

 

 

View solution in original post


All Replies
Trusted Advisor
Posts: 1,615

Re: Merging

We don't know what is your confusion. You haven't explained what you are confused about. Give us details.

Occasional Contributor
Posts: 11

Re: Merging

when this type of merging is done we need to get all the obs from both the datasets right? But why is the output varying in all the cases?

Super User
Posts: 6,942

Re: Merging

Your third example is impossible. None of your input datasets contains ID 6.

Examples 1 & 2 perfectly document the behaviour of datasteps with a by, no conditions, and identically named variables in the input datasets. With identical variables, the sequence of reading the input datasets has an impact on what will end up in the output.

Having identical variables in a data step merge is therefore (usually) a BAD IDEA.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 11

Re: Merging

but there is no 6 in ID in output dataset too....the first column which has 6 is the obs number 

Solution
‎06-17-2017 12:13 PM
Super User
Super User
Posts: 6,502

Re: Merging

[ Edited ]

If you just walk through the steps involved in merging you can easily see why you are getting the output you posted.

Let's take the case where M=102. 

data a;
  input m p c;
cards;
102 113 114
102 124 134
;
data b;
  input m p c;
cards;
102 103 104
;
data want ;
  merge a b;
  by m;
run;

So on the first iteration (_N_=1) the MERGE statement will find a record from both A and B.  

So it first loads the record from A and sets the variables to these values

m=102 p=113 c=114

It then loads the record from B and doing this will replace the values of P and C (I guess it might also replace M but since it is the same value it doesn't change anything).  

This will result in 

m=102 p=103 c=104

We then reach the end and this is output and the next iteration _n_=2 starts. Now this time there are no more records from B to load so we only load the new recrod from A.  So we get 

m=102 p=124 c=134

which is then output at the end of the data step.

 

In general it is a bad idea to merge datasets that have common variables that are not part of the BY variables.

 

 

Occasional Contributor
Posts: 11

Re: Merging

thank you  ... Now I understood how it works Smiley Happy

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 217 views
  • 2 likes
  • 4 in conversation