BookmarkSubscribeRSS Feed
geneshackman
Pyrite | Level 9

Hi all. I hope folks are healthy!

Should be a simple question. How to join two data sets that have mostly the same, but a few different, IDs?

data set 1

ID var1  var2

A   1      2

B   2     1

C  4     2

 

data set 2

ID var1  var2

A   1      2

C  4     2

D   2     1

 

Or create 3 data sets, one with common IDs, one with IDs only in data set 1, and one with IDs only in data set 2. I want to identify the IDs that are only in one data set.

 

Thanks

Stay healthy, please.

 

7 REPLIES 7
PaigeMiller
Diamond | Level 26

@geneshackman wrote:

Hi all. I hope folks are healthy!

Should be a simple question. How to join two data sets that have mostly the same, but a few different, IDs?

data set 1

ID var1  var2

A   1      2

B   2     1

C  4     2

 

data set 2

ID var1  var2

A   1      2

C  4     2

D   2     1

 


What is the desired output? All records from both data sets including mismatches, or only those records in both, or something else? Do you want the values of var1 and var2 in one data set to overwrite the values on the matching record in the other data set?

--
Paige Miller
geneshackman
Pyrite | Level 9

Good question, thanks for asking. I would like the output to be something like this

 

ID data1var1  data1var2   data2var1 data2var2

A      1                   2              1               2

B      2                   1

C      4                   2              4               2

D                                           2               1

 

I'm not sure how the above will show up, so let me see if this makes sense. I guess the columns would be ID, then the variables in data set 1, then the variables in data set 2. I could rename the variables in data set 1 or 2 or both to have "dataset1" at the beginning of the var name, like dataset1var1, or shorter, d1var1, d1var2, d2var1, d2var2.

 

Make sense?

 

The output would have all records from both data sets, including IDs are only in one of them. 

 

PaigeMiller
Diamond | Level 26
/* UNTESTED CODE */
data want;
    merge dataset1(rename=(var1=ds1_var1 var2=ds1_var2))
        dataset2(rename(var1=ds2_var1 var2=ds2_var2));
    by id;
run;

This assumes both datasets are sorted by the variable named ID.

--
Paige Miller
geneshackman
Pyrite | Level 9
Thanks, I'll try that.
sbxkoenk
SAS Super FREQ

Hello,

 

The below program may be beneficial to you as well :

data set_1;
input ID $ var1 var2;
cards;
A   1    2
B   2    1
C   4    2
;
run;

data set_2;
input ID $ var1 var2;
cards;
A   1    2
C   4    2
D   2    1
;
run;

proc sort data=set_1 out=s_set_1;
 by ID;
run;
proc sort data=set_2 out=s_set_2;
 by ID;
run;

data want1 want2 want3;
 merge s_set_1(in=int1) s_set_2(in=int2);
 by ID;
 if     int1 and     int2 then output want1;
 if     int1 and not int2 then output want2;
 if not int1 and     int2 then output want3;
run;
/* end of program */

Koen

geneshackman
Pyrite | Level 9

Thanks, this looks great too.

Ksharp
Super User
data set_1;
input ID $ var1 var2;
cards;
A   1    2
B   2    1
C   4    2
;
run;

data set_2;
input ID $ var1 var2;
cards;
A   1    2
C   4    2
D   2    1
;
run;

proc sql;
create table commonIDs as
select id from set_1
intersect 
select id from set_2;

create table  IDsonlyindataset1 as
select id from set_1
except 
select id from set_2;

create table  IDsonlyindataset2 as
select id from set_2
except 
select id from set_1;
quit;



hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2020 views
  • 4 likes
  • 4 in conversation