Solved: Use merge to find different values for common variable in two data set...

Iona · Posted 12-04-2017 08:14 PM

Hi!

So I have this problem (using sas universtiy):

I am supposed to merge datasets that have two common variables (id and weight) and look for values that are different for one variable (weight).

Say:

Data sheet1;
input id weight1;
datalines;
1 33
2 44
3 55
4 66
5 77
;
run;

Data sheet2;
input id weight2;
datalines;
1 11
2 44
3 55
4 66
5 88
; 
run;

To identify observations 1 and 5 I would run;

Data differences;
merge sheet1 sheet2;
by id;
run;


Data differences;
set differences;
where weight1 ne weight2;
run;

Provided they're propperly sorted.

But I am supposed to run this code:

Data differences;
merge sheet1 sheet2 (in =a);
by id weight;
if a;
run;

And I just don't understand how that's supposed to work. The way I understand it is that the datasets are merged by id and weight for all observations in a and overwriting values that are different with those from sheet2.

Maybe I'm not thinking of the right data structure. My supervisor tried to explain it to me, saying that the merge statement would create missings for values that aren't equal on both datasets.

Can someone fill me in? Thanks's in advance!

art297 · Posted 12-05-2017 05:39 PM

Given what you now want, your code is far too complex. To get the desired result you only need:

data desired;
  merge A B;
  by id ref_var;
run;

Art, CEO, AnalystFinder.com

View solution in original post

art297 · Posted 12-04-2017 09:06 PM

Since you don't have a variable called weight, that code wouldn't run. Your supervisor may have been suggesting something like:

data differences;
  merge sheet1 (rename=(weight1=weight) in=a) 
        sheet2 (rename=(weight2=weight) in=b);
  by id weight;
  if a and not b;
run;

Art, CEO, AnalystFinder.com

Iona · Posted 12-04-2017 09:19 PM

Hi Art,

thanks for your reply! In my version the data sets would have weight1 and weight 2, and in the supervisor's version only weight. Sorry for the confusion! I was just concentrating on the merge question and on how a merge can return different values, only.

I was wondering about the if a and not b statement, too, but she uses only "if a", like in the example. It's a direct copy of what she did.

mkeintz · Posted 12-04-2017 09:30 PM

If you have other variables exclusively in sheet1 (say x1 y1), and you performed the merge as you presented, then for cases where the BY-variables ID and WEIGHT appear only in sheet2, you would indeed have missing values assigned to X1 and Y1 in the output. For other cases, in which ID and WEIGHT appear in both sheets, you would get the original values found in sheet1. Perhaps that is what your supervisor is referring to.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Reeza · Posted 12-04-2017 10:45 PM

Your question and the 'supervisor' answer don't appear to align. So either you're misunderstanding the question or her answer.

@Iona wrote:

Hi!

So I have this problem (using sas universtiy):

I am supposed to merge datasets that have two common variables (id and weight) and look for values that are different for one variable (weight).

Say:
Data sheet1;
input id weight1;
datalines;
1 33
2 44
3 55
4 66
5 77
;
run;

Data sheet2;
input id weight2;
datalines;
1 11
2 44
3 55
4 66
5 88
; 
run; 
To identify observations 1 and 5 I would run;
Data differences;
merge sheet1 sheet2;
by id;
run;


Data differences;
set differences;
where weight1 ne weight2;
run; 
Provided they're propperly sorted.

But I am supposed to run this code:
Data differences;
merge sheet1 sheet2 (in =a);
by id weight;
if a;
run; 
And I just don't understand how that's supposed to work. The way I understand it is that the datasets are merged by id and weight for all observations in a and overwriting values that are different with those from sheet2.

Maybe I'm not thinking of the right data structure. My supervisor tried to explain it to me, saying that the merge statement would create missings for values that aren't equal on both datasets.

Can someone fill me in? Thanks's in advance!

Iona · Posted 12-05-2017 05:25 PM

Thanks for your replies!

I think I'm halfway understanding what she wants me to do. So we have two data sets and three variables. Id has equal values in both datasets, but ref_var and value_A (value_B respectively) are different. What she wants me to do is to merge datasets A and B in a way that creates missings vor value_A and value_B where they are different.

The given datasets:

data A;
input id ref_var value_A;
datalines;
1 1 1 
1 2 2
2 3 3
3 1 1 
; 
run;

data B;
input id ref_var value_B;
datalines;
1 1 1
1 3 3
2 1 1
3 1 1
; 
run;

What I should have in the end:

data desired;
input id ref_var value_A value_B;
datalines;
1 1 1 1
1 2 2 .
1 3 . 3
2 1 . 1
2 3 3 .
3 1 1 1
; 
run;

So this is the template that she gave me but I must be doing it wrong as I am only getting the missings vor value_B but not for value_A.

data differences;
merge A (in =a) B;
by id ref_var;
if a;
if ref_var eq . then delete;
if value_B eq . then output;
run;

Adding "if value_A eq . then output" doesn't help or even change the output. Can someone spot my mistake?

art297 · Posted 12-05-2017 05:39 PM

Given what you now want, your code is far too complex. To get the desired result you only need:

data desired;
  merge A B;
  by id ref_var;
run;

Art, CEO, AnalystFinder.com

Iona · Posted 12-06-2017 05:07 PM

That did it, thank you!

rvsidhu035 · Posted 12-10-2017 07:54 AM

Data sheet1; input id weight1; datalines; 1 33 2 44 3 55 4 66 5 77 ; run; Data sheet2; input id weight2; datalines; 1 11 2 44 3 55 4 66 5 88 ; run; proc sort data=sheet1;by id; proc sort data=sheet2; by id; data merge23; merge sheet1 sheet2;by id; run;

rvsidhu035 · Posted 12-14-2017 10:54 AM

Data sheet1;
input id weight1;
datalines;
1 33
2 44
3 55
4 66
5 77
;
run;

Data sheet2;
input id weight2;
datalines;
1 11
2 44
3 55
4 66
5 88
; 
run; 
proc sort data=sheet1;by id;
proc sort data=sheet2;by id;
run;

data merge_12;
merge sheet1 sheet2;by id;
run;

Use merge to find different values for common variable in two data sets

Re: Use merge to find different values for common variable in two data sets

Re: Use merge to find different values for common variable in two data sets

Re: Use merge to find different values for common variable in two data sets

Re: Use merge to find different values for common variable in two data sets

Re: Use merge to find different values for common variable in two data sets

Re: Use merge to find different values for common variable in two data sets

Re: Use merge to find different values for common variable in two data sets

Re: Use merge to find different values for common variable in two data sets

Re: Use merge to find different values for common variable in two data sets

Re: Use merge to find different values for common variable in two data sets

Registration is open