BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Iona
Fluorite | Level 6

Hi!

 

So I have this problem (using sas universtiy):

I am supposed to merge datasets that have two common variables (id and weight) and look for values that are different for one variable (weight). 

Say:

Data sheet1;
input id weight1;
datalines;
1 33
2 44
3 55
4 66
5 77
;
run;

Data sheet2;
input id weight2;
datalines;
1 11
2 44
3 55
4 66
5 88
; 
run; 

To identify observations 1 and 5 I would run;

 

Data differences;
merge sheet1 sheet2;
by id;
run;


Data differences;
set differences;
where weight1 ne weight2;
run; 

Provided  they're propperly sorted.

 

But I am supposed to run this code:

 

 

Data differences;
merge sheet1 sheet2 (in =a);
by id weight;
if a;
run; 

And I just don't understand how that's supposed to work. The way I understand it is that the datasets are merged by id and weight for all observations in a and  overwriting values that are different with those from sheet2. 

Maybe I'm not thinking of the right data structure. My supervisor tried to explain it to me, saying that the merge statement would create missings for values that aren't equal on both datasets. 

 

Can someone fill me in? Thanks's in advance! 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Given what you now want, your code is far too complex. To get the desired result you only need:

 

data desired;
  merge A B;
  by id ref_var;
run; 

Art, CEO, AnalystFinder.com

View solution in original post

9 REPLIES 9
art297
Opal | Level 21

Since you don't have a variable called weight, that code wouldn't run. Your supervisor may have been suggesting something like:

data differences;
  merge sheet1 (rename=(weight1=weight) in=a) 
        sheet2 (rename=(weight2=weight) in=b);
  by id weight;
  if a and not b;
run; 

Art, CEO, AnalystFinder.com

 

Iona
Fluorite | Level 6

Hi Art,

thanks for your reply! In my version the data sets would have weight1 and weight 2, and in the supervisor's version only weight. Sorry for the confusion! I was just concentrating on the merge question and on how a merge can return different values, only.

I was wondering about the if a and not b statement, too, but she uses only "if a", like in the example. It's a direct copy of what she did.
mkeintz
PROC Star

If you have other variables exclusively in sheet1 (say x1 y1), and you performed the merge as you presented, then for cases where the BY-variables ID and WEIGHT appear only in sheet2, you would indeed have missing values assigned to X1 and Y1 in the output.  For other cases, in which ID and WEIGHT appear in both sheets, you would get the original values found in sheet1.  Perhaps that is what your supervisor is referring to.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Reeza
Super User

Your question and the 'supervisor' answer don't appear to align. So either you're misunderstanding the question or her answer. 

 


@Iona wrote:

Hi!

 

So I have this problem (using sas universtiy):

I am supposed to merge datasets that have two common variables (id and weight) and look for values that are different for one variable (weight). 

Say:

Data sheet1;
input id weight1;
datalines;
1 33
2 44
3 55
4 66
5 77
;
run;

Data sheet2;
input id weight2;
datalines;
1 11
2 44
3 55
4 66
5 88
; 
run; 

To identify observations 1 and 5 I would run;

 

Data differences;
merge sheet1 sheet2;
by id;
run;


Data differences;
set differences;
where weight1 ne weight2;
run; 

Provided  they're propperly sorted.

 

But I am supposed to run this code:

 

 

Data differences;
merge sheet1 sheet2 (in =a);
by id weight;
if a;
run; 

And I just don't understand how that's supposed to work. The way I understand it is that the datasets are merged by id and weight for all observations in a and  overwriting values that are different with those from sheet2. 

Maybe I'm not thinking of the right data structure. My supervisor tried to explain it to me, saying that the merge statement would create missings for values that aren't equal on both datasets. 

 

Can someone fill me in? Thanks's in advance! 

 

 


 

Iona
Fluorite | Level 6

Thanks for your replies!

 

I think I'm halfway understanding what she wants me to do. So we have two data sets and three variables. Id has equal values in both datasets, but ref_var and value_A (value_B respectively) are different. What she wants me to do is to merge datasets A and B in a way that creates missings vor value_A and value_B where they are different.

The given datasets:

 

data A;
input id ref_var value_A;
datalines;
1 1 1 
1 2 2
2 3 3
3 1 1 
; 
run;

data B;
input id ref_var value_B;
datalines;
1 1 1
1 3 3
2 1 1
3 1 1
; 
run;

What I should have in the end:

 

data desired;
input id ref_var value_A value_B;
datalines;
1 1 1 1
1 2 2 .
1 3 . 3
2 1 . 1
2 3 3 .
3 1 1 1
; 
run; 

 

So this is the template that she gave me but I must be doing it wrong as I am only getting the missings vor value_B but not for value_A. 

 

data differences;
merge A (in =a) B;
by id ref_var;
if a;
if ref_var eq . then delete;
if value_B eq . then output;
run; 

 

 

result.PNG

 

 

Adding "if value_A eq . then output" doesn't help or even change the output.  Can someone spot my mistake? 

art297
Opal | Level 21

Given what you now want, your code is far too complex. To get the desired result you only need:

 

data desired;
  merge A B;
  by id ref_var;
run; 

Art, CEO, AnalystFinder.com

Iona
Fluorite | Level 6

That did it, thank you! 

rvsidhu035
Quartz | Level 8
Data sheet1; input id weight1; datalines; 1 33 2 44 3 55 4 66 5 77 ; run; Data sheet2; input id weight2; datalines; 1 11 2 44 3 55 4 66 5 88 ; run; proc sort data=sheet1;by id; proc sort data=sheet2; by id; data merge23; merge sheet1 sheet2;by id; run;
rvsidhu035
Quartz | Level 8
Data sheet1;
input id weight1;
datalines;
1 33
2 44
3 55
4 66
5 77
;
run;

Data sheet2;
input id weight2;
datalines;
1 11
2 44
3 55
4 66
5 88
; 
run; 
proc sort data=sheet1;by id;
proc sort data=sheet2;by id;
run;

data merge_12;
merge sheet1 sheet2;by id;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 2592 views
  • 1 like
  • 5 in conversation