BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shynu
Fluorite | Level 6

data a;
input m p c;
cards;
102 113 114
102 124 134
103 114 115
103 124 135
104 115 116
104 125 136
run;

data b;
input m p c;
cards;
102 103 104
103 104 105
104 105 106
run;

data e1;
merge a b;
by m;
run;
proc print data=e;
run;

OUTPUT:

Obs m p c
1 102 103 104
2 102 124 134
3 103 104 105
4 103 124 135
5 104 105 106
6 104 125 136

 

data a;
input m p c;
cards;
102 103 104
103 104 105
104 105 106
run;

data b;
input m p c ;
cards;
102 113 114
102 103 104
102 124 134
103 114 115
103 104 105
103 124 135
104 115 116
104 105 106
104 125 136
run;

data e2;
merge a b;
by m;
run;
proc print data=e;
run;

OUTPUT:

Obs m p c
1 102 113 114
2 102 103 104
3 102 124 134
4 103 114 115
5 103 104 105
6 103 124 135
7 104 115 116
8 104 105 106
9 104 125 136

Data A;
Input ID Name$ Height;
cards;
1 A 1
3 B 2
5 C 2
7 D 2
9 E 2
;
run;

Data B;
Input ID Name$ Weight;
cards;
2 A 2
4 B 3
5 C 4
7 D 5
;
run;

Data dummy;
Merge A B;
by ID;
run;
proc print data = dummy;
run;

OUTPUT:

Obs ID Name Height Weight
1 1 A 1 .
2 2 A . 2
3 3 B 2 .
4 4 B . 3
5 5 C 2 4
6 7 D 2 5
7 9 E 2 .

 

I have used a simple merge condition in all the 3 codes, but very confused after seeing the ouput.. Can anyone please clarify my confusion?

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If you just walk through the steps involved in merging you can easily see why you are getting the output you posted.

Let's take the case where M=102. 

data a;
  input m p c;
cards;
102 113 114
102 124 134
;
data b;
  input m p c;
cards;
102 103 104
;
data want ;
  merge a b;
  by m;
run;

So on the first iteration (_N_=1) the MERGE statement will find a record from both A and B.  

So it first loads the record from A and sets the variables to these values

m=102 p=113 c=114

It then loads the record from B and doing this will replace the values of P and C (I guess it might also replace M but since it is the same value it doesn't change anything).  

This will result in 

m=102 p=103 c=104

We then reach the end and this is output and the next iteration _n_=2 starts. Now this time there are no more records from B to load so we only load the new recrod from A.  So we get 

m=102 p=124 c=134

which is then output at the end of the data step.

 

In general it is a bad idea to merge datasets that have common variables that are not part of the BY variables.

 

 

View solution in original post

6 REPLIES 6
PaigeMiller
Diamond | Level 26

We don't know what is your confusion. You haven't explained what you are confused about. Give us details.

--
Paige Miller
shynu
Fluorite | Level 6

when this type of merging is done we need to get all the obs from both the datasets right? But why is the output varying in all the cases?

Kurt_Bremser
Super User

Your third example is impossible. None of your input datasets contains ID 6.

Examples 1 & 2 perfectly document the behaviour of datasteps with a by, no conditions, and identically named variables in the input datasets. With identical variables, the sequence of reading the input datasets has an impact on what will end up in the output.

Having identical variables in a data step merge is therefore (usually) a BAD IDEA.

shynu
Fluorite | Level 6

but there is no 6 in ID in output dataset too....the first column which has 6 is the obs number 

Tom
Super User Tom
Super User

If you just walk through the steps involved in merging you can easily see why you are getting the output you posted.

Let's take the case where M=102. 

data a;
  input m p c;
cards;
102 113 114
102 124 134
;
data b;
  input m p c;
cards;
102 103 104
;
data want ;
  merge a b;
  by m;
run;

So on the first iteration (_N_=1) the MERGE statement will find a record from both A and B.  

So it first loads the record from A and sets the variables to these values

m=102 p=113 c=114

It then loads the record from B and doing this will replace the values of P and C (I guess it might also replace M but since it is the same value it doesn't change anything).  

This will result in 

m=102 p=103 c=104

We then reach the end and this is output and the next iteration _n_=2 starts. Now this time there are no more records from B to load so we only load the new recrod from A.  So we get 

m=102 p=124 c=134

which is then output at the end of the data step.

 

In general it is a bad idea to merge datasets that have common variables that are not part of the BY variables.

 

 

shynu
Fluorite | Level 6

thank you  ... Now I understood how it works 🙂

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1111 views
  • 2 likes
  • 4 in conversation