SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Use merge to find different values for common variable in two data sets

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Use merge to find different values for common variable in two data sets

Hi!

 

So I have this problem (using sas universtiy):

I am supposed to merge datasets that have two common variables (id and weight) and look for values that are different for one variable (weight). 

Say:

Data sheet1;
input id weight1;
datalines;
1 33
2 44
3 55
4 66
5 77
;
run;

Data sheet2;
input id weight2;
datalines;
1 11
2 44
3 55
4 66
5 88
; 
run; 

To identify observations 1 and 5 I would run;

 

Data differences;
merge sheet1 sheet2;
by id;
run;


Data differences;
set differences;
where weight1 ne weight2;
run; 

Provided  they're propperly sorted.

 

But I am supposed to run this code:

 

 

Data differences;
merge sheet1 sheet2 (in =a);
by id weight;
if a;
run; 

And I just don't understand how that's supposed to work. The way I understand it is that the datasets are merged by id and weight for all observations in a and  overwriting values that are different with those from sheet2. 

Maybe I'm not thinking of the right data structure. My supervisor tried to explain it to me, saying that the merge statement would create missings for values that aren't equal on both datasets. 

 

Can someone fill me in? Thanks's in advance! 

 

 


Accepted Solutions
Solution
2 weeks ago
PROC Star
Posts: 7,543

Re: Use merge to find different values for common variable in two data sets

Given what you now want, your code is far too complex. To get the desired result you only need:

 

data desired;
  merge A B;
  by id ref_var;
run; 

Art, CEO, AnalystFinder.com

View solution in original post


All Replies
PROC Star
Posts: 7,543

Re: Use merge to find different values for common variable in two data sets

Since you don't have a variable called weight, that code wouldn't run. Your supervisor may have been suggesting something like:

data differences;
  merge sheet1 (rename=(weight1=weight) in=a) 
        sheet2 (rename=(weight2=weight) in=b);
  by id weight;
  if a and not b;
run; 

Art, CEO, AnalystFinder.com

 

New Contributor
Posts: 4

Re: Use merge to find different values for common variable in two data sets


Hi Art,

thanks for your reply! In my version the data sets would have weight1 and weight 2, and in the supervisor's version only weight. Sorry for the confusion! I was just concentrating on the merge question and on how a merge can return different values, only.

I was wondering about the if a and not b statement, too, but she uses only "if a", like in the example. It's a direct copy of what she did.
Trusted Advisor
Posts: 1,072

Re: Use merge to find different values for common variable in two data sets

If you have other variables exclusively in sheet1 (say x1 y1), and you performed the merge as you presented, then for cases where the BY-variables ID and WEIGHT appear only in sheet2, you would indeed have missing values assigned to X1 and Y1 in the output.  For other cases, in which ID and WEIGHT appear in both sheets, you would get the original values found in sheet1.  Perhaps that is what your supervisor is referring to.

Super User
Posts: 20,252

Re: Use merge to find different values for common variable in two data sets

Your question and the 'supervisor' answer don't appear to align. So either you're misunderstanding the question or her answer. 

 


Iona wrote:

Hi!

 

So I have this problem (using sas universtiy):

I am supposed to merge datasets that have two common variables (id and weight) and look for values that are different for one variable (weight). 

Say:

Data sheet1;
input id weight1;
datalines;
1 33
2 44
3 55
4 66
5 77
;
run;

Data sheet2;
input id weight2;
datalines;
1 11
2 44
3 55
4 66
5 88
; 
run; 

To identify observations 1 and 5 I would run;

 

Data differences;
merge sheet1 sheet2;
by id;
run;


Data differences;
set differences;
where weight1 ne weight2;
run; 

Provided  they're propperly sorted.

 

But I am supposed to run this code:

 

 

Data differences;
merge sheet1 sheet2 (in =a);
by id weight;
if a;
run; 

And I just don't understand how that's supposed to work. The way I understand it is that the datasets are merged by id and weight for all observations in a and  overwriting values that are different with those from sheet2. 

Maybe I'm not thinking of the right data structure. My supervisor tried to explain it to me, saying that the merge statement would create missings for values that aren't equal on both datasets. 

 

Can someone fill me in? Thanks's in advance! 

 

 


 

New Contributor
Posts: 4

Re: Use merge to find different values for common variable in two data sets

[ Edited ]

Thanks for your replies!

 

I think I'm halfway understanding what she wants me to do. So we have two data sets and three variables. Id has equal values in both datasets, but ref_var and value_A (value_B respectively) are different. What she wants me to do is to merge datasets A and B in a way that creates missings vor value_A and value_B where they are different.

The given datasets:

 

data A;
input id ref_var value_A;
datalines;
1 1 1 
1 2 2
2 3 3
3 1 1 
; 
run;

data B;
input id ref_var value_B;
datalines;
1 1 1
1 3 3
2 1 1
3 1 1
; 
run;

What I should have in the end:

 

data desired;
input id ref_var value_A value_B;
datalines;
1 1 1 1
1 2 2 .
1 3 . 3
2 1 . 1
2 3 3 .
3 1 1 1
; 
run; 

 

So this is the template that she gave me but I must be doing it wrong as I am only getting the missings vor value_B but not for value_A. 

 

data differences;
merge A (in =a) B;
by id ref_var;
if a;
if ref_var eq . then delete;
if value_B eq . then output;
run; 

 

 

result.PNG

 

 

Adding "if value_A eq . then output" doesn't help or even change the output.  Can someone spot my mistake? 

Solution
2 weeks ago
PROC Star
Posts: 7,543

Re: Use merge to find different values for common variable in two data sets

Given what you now want, your code is far too complex. To get the desired result you only need:

 

data desired;
  merge A B;
  by id ref_var;
run; 

Art, CEO, AnalystFinder.com

New Contributor
Posts: 4

Re: Use merge to find different values for common variable in two data sets

That did it, thank you! 

New Contributor
Posts: 3

Re: Use merge to find different values for common variable in two data sets

Data sheet1; input id weight1; datalines; 1 33 2 44 3 55 4 66 5 77 ; run; Data sheet2; input id weight2; datalines; 1 11 2 44 3 55 4 66 5 88 ; run; proc sort data=sheet1;by id; proc sort data=sheet2; by id; data merge23; merge sheet1 sheet2;by id; run;
New Contributor
Posts: 3

Re: Use merge to find different values for common variable in two data sets

Data sheet1;
input id weight1;
datalines;
1 33
2 44
3 55
4 66
5 77
;
run;

Data sheet2;
input id weight2;
datalines;
1 11
2 44
3 55
4 66
5 88
; 
run; 
proc sort data=sheet1;by id;
proc sort data=sheet2;by id;
run;

data merge_12;
merge sheet1 sheet2;by id;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 402 views
  • 1 like
  • 5 in conversation