BookmarkSubscribeRSS Feed
CathyVI
Pyrite | Level 9

 I will like to merge a two data set. Data19 contain everyone i wanted. Data18 contains some people in data19 and some who are not in data19. I want to merge the data so I can get all variables in data19 plus only the people in data18 who are not in data19.

For example:

data a19;
input bene_id$ A B C D;
datalines;
A1 . . 09 10
A2 . . . 12
A3 14 . . .
;run;

data a18;
input bene_id$ A B C D;
datalines;
A1 . . 09 .
A4 09 . . 12
A2 . 5 . .
A5 . . . 13
;run;

proc sort data=a19; by bene_id;run;
proc sort data=a18; by bene_id;run;

 

Data dummy;
Merge a19 (IN = X) a18 (IN=Y);
by bene_id;
If (X=1 or y=1);
run;

When I used the merge code above observation A1 did not have D=10, and A2 did not have B=12. What I want is to capture the entire a19 + a18 (with only observations not in a19).

 

This is what I want:

bene_id$ A B C D

A1 . . 09 10
A2 . . . 12
A3 14 . . .

A4 09 . . 12

A5 . . . 13

 

12 REPLIES 12
Reeza
Super User

You're almost there, this line needs to be modified to only include records from one data set. Modify the IF statement and you should get what you need.

 

 

If (X=1 or y=1);

@CathyVI wrote:

 I will like to merge a two data set. Data19 contain everyone i wanted. Data18 contains some people in data19 and some who are not in data19. I want to merge the data so I can get all variables in data19 plus only the people in data18 who are not in data19.

For example:

data a19;
input bene_id$ A B C D;
datalines;
A1 . . 09 10
A2 . . . 12
A3 14 . . .
;run;

data a18;
input bene_id$ A B C D;
datalines;
A1 . . 09 .
A4 09 . . 12
A2 . 5 . .
A5 . . . 13
;run;

proc sort data=a19; by bene_id;run;
proc sort data=a18; by bene_id;run;

 

Data dummy;
Merge a19 (IN = X) a18 (IN=Y);
by bene_id;
If (X=1 or y=1);
run;

When I used the merge code above observation A1 did not have D=10, and A2 did not have B=12. What I want is to capture the entire a19 + a18 (with only observations not in a19).

 

This is what I want:

bene_id$ A B C D

A1 . . 09 10
A2 . . . 12
A3 14 . . .

A4 09 . . 12

A5 . . . 13

 


EDIT: If you only want ABCD from one data set, drop them from the other data set before joining using a data set option DROP.

 

 

CathyVI
Pyrite | Level 9

@Reeza  I have the  if (x=1 or y=1); in the code i presented initially but I am not getting my expected results.

Reeza
Super User

I didn't show you a solution, I told you which line is problematic and needs fixing. 

 

IF X=1 or Y=1 includes data from both data sets. You don't want both so you should remove one of those terms from the condition and add the DROP dataset option to one data set to not bring in the variables from that data set. 

 

 

CathyVI
Pyrite | Level 9

@Reeza  Could you show me  a code as an example. I still did not understand. I want all the data from a19 but only those that are not in a19 from a18 should only join to a19, to make a new dataset. 

Do you mean: 

Data dummy;

drop a18;
Merge a19(in=x) a18 (IN=Y);
by bene_id;
If (x=1 or y=1);
run;

 

Reeza
Super User
Data dummy;
Merge a19(in=x) a18 (IN=Y drop = a b c d);
by bene_id;
If (x=1);
run;

 

CathyVI
Pyrite | Level 9

@Reeza  Thank you but this is not providing my expected output.

This is what I want:

bene_id$  A  B  C  D

A1 . . 09 10
A2 . . . 12
A3 14 . . .

A4 09 . . 12

A5 . . . 13

Kurt_Bremser
Super User

Then the MERGE should look like this:

data want;
merge
  a19 (in=x)
  a18 (
    in=y
    rename=(a=_a b=_b c=_c d=_d)
  )
;
by bene_id;
if not x
then do;
  a = _a;
  b = _b;
  c = _c;
  d = _d;
end;
drop _:;
run;

Tested, creates your expected result.

Reeza
Super User

My mistake in understanding your requirements but the principle is still the same. Use two steps makes it a bit easier to understand though.

 

Data p18_not19;
Merge a19 (IN = X) a18 (IN=Y);
by bene_id;
If Y and not X;
run;

data want;
set a19 p18_not19;
run;

@CathyVI wrote:

@Reeza  Thank you but this is not providing my expected output.

This is what I want:

bene_id$  A  B  C  D

A1 . . 09 10
A2 . . . 12
A3 14 . . .

A4 09 . . 12

A5 . . . 13



 

Tom
Super User Tom
Super User

A normal MERGE will create the output you want.

data want;
  merge a18 a19;
  by bene_id;
run;

But since you are not really merging in any new variables that don't already exist then perhaps you just want to interleave the values and keep only one observation per BENE_ID.

data want;
  set a18 a19;
  by bene_id;
  if last.bene_id;
run;

Full example:

data a19;
  input bene_id $ A B C D;
datalines;
A1 . . 09 10
A2 . . . 12
A3 14 . . .
;

data a18;
  input bene_id $ A B C D;
datalines;
A1 . . 09 .
A2 . 5 . .
A4 09 . . 12
A5 . . . 13
;

data expect;
  input bene_id $ A B C D;
datalines;
A1 . . 09 10
A2 . . . 12
A3 14 . . .
A4 09 . . 12
A5 . . . 13
;

data want;
  set a18 a19;
  by bene_id;
  if last.bene_id;
run;

proc compare data=want compare=expect;
  id bene_id;
run;

The other possibility is you want to treat one of the dataset as transactions to be applied to the other.  But then that will treat the missing values differently.

data want2 ;
  update a18 a19;
  by bene_id;
run;

proc compare data=want2 compare=expect;
  id bene_id;
run;
The COMPARE Procedure                                                                                                               
Comparison of WORK.WANT2 with WORK.EXPECT                                                                                           
(Method=EXACT)                                                                                                                      
                                                                                                                                    
Value Comparison Results for Variables                                                                                              
                                                                                                                                    
__________________________________________________________                                                                          
           ||       Base    Compare                                                                                                 
 bene_id   ||          B          B      Diff.     % Diff                                                                           
 ________  ||  _________  _________  _________  _________                                                                           
           ||                                                                                                                       
 A2        ||     5.0000          .          .          .                                                                           
__________________________________________________________                                                                          

 

CathyVI
Pyrite | Level 9

@Tom  Thank you, your code works. So if I now want to add the third dataset a17:

data a17;
input bene_id$ A B C D;
datalines;
A1 . . 09 .
A2 . 5 . .
A8 09 . . 12
A9 . . . 13
;run;

I DID

data total;
set want a17;
by bene_id;
if last.bene_id;
run;

How will l do it. I tried creating a new data set with data WANT and a17 using the code you provided and follow the same logic but D10 was missing. That means SAS was reading the last ID for A1 from a17 and ignore WANT because of the first.bene_id. 

 

I want my expected results to be 

data expect;
input bene_id $ A B C D;
datalines;
A1 . . 09 10
A2 . . . 12
A3 14 . . .
A4 09 . . 12
A5 . . . 13

A8 09 . . 12
A9 . . . 13
; run;

 

 

Tom
Super User Tom
Super User

If you MERGE two (or more) datasets which have common variables the values of those common variables will be the last values loaded.

So if you want the values from WANT to "win" over the values from A17 then list WANT second in the MERGE statement.

data new;
  merge a17 want;
  by bene_id;
run;

Or you want values from A19 to override values from A18 which override values from A17 then use:

data want;
  merge a17 a18 a19;
  by bene_id;
run;

 

Reeza
Super User

Why not append all the data sets and then keep only the record from the latest data set?

 

data combined;
set a19 a18 a17 indsname= _source;
source = _source;
run;

proc sort data=combined;
by bene_id descending source;
run;

proc sort data=combined out=want uodupkey;
by bene_id;
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1111 views
  • 2 likes
  • 4 in conversation