Hi all,
I am very new to SAS, and I am trying to merge two datasets. The first has pairwise comparisons between the proportion shared between 2 IDs. The second has additional identifying information about each of these IDs. I want the merge the two datasets to produce an output similar to what is shown below, but I am unsure where to start. I have provided samples of the data tables below. I am working in SAS 9.4. I appreciate any help you can provide.
Input 1 | |||
ID1 | ID2 | Pair_Num | Proportion_Shared |
1 | 2 | 1 | 0.97 |
1 | 3 | 2 | 81 |
1 | 4 | 3 | 0.73 |
1 | 5 | 4 | 0.81 |
2 | 3 | 5 | 0.56 |
2 | 4 | 6 | 0.71 |
2 | 5 | 7 | 0.31 |
3 | 4 | 8 | 0.81 |
3 | 5 | 9 | 0.66 |
4 | 5 | 10 | 0.59 |
Input 2 | ||
ID | Color | Age |
1 | Blue | 15 |
2 | Green | 21 |
3 | Blue | 78 |
4 | Green | 23 |
5 | Blue | 56 |
Desired Output | |||||
ID1 | ID2 | Pair_Num | Proportion_Shared | Colors | Age_Difference |
1 | 2 | 1 | 0.97 | Blue, Green | 6 |
1 | 3 | 2 | 81 | Blue, Blue | 63 |
1 | 4 | 3 | 0.73 | Blue, Green | 8 |
1 | 5 | 4 | 0.81 | Blue, Blue | 41 |
2 | 3 | 5 | 0.56 | Green, Blue | 57 |
2 | 4 | 6 | 0.71 | Green, Green | 2 |
2 | 5 | 7 | 0.31 | Green, Blue | 35 |
3 | 4 | 8 | 0.81 | Blue, Green | 55 |
3 | 5 | 9 | 0.66 | Blue, Blue | 22 |
4 | 5 | 10 | 0.59 | Green, Blue | 33 |
Another way:
data want(keep= ID1 ID2 Pair_Num Proportion_Shared Colors Age_Difference);
set have1;
length Colors $ 50;
if _n_ = 1 then do;
if 0 then set have2;
declare hash h(dataset: 'have2');
h.defineKey('ID');
h.defineData('Color', 'Age');
h.defineDone();
call missing(Color, Age);
end;
rc = h.find(key: ID1);
Colors = Color;
_Age = Age;
rc = h.find(key: ID2);
Colors = catx(', ', Colors, Color);
Age_Difference = abs(_Age - Age);
run;
So you have a PAIRS dataset with these variables:
ID1 | ID2 | Pair_Num | Proportion_Shared |
And and IDS dataset with these variables:
ID | Color | Age |
And you want to make new dataset with these variables.
ID1 | ID2 | Pair_Num | Proportion_Shared | Colors | Age_Difference |
So a simple SQL join should do the job.
proc sql;
create table WANT as
select a.*
, catx(',',b.color,c.color) as Colors
, b.age - c.age as Age_Difference
from PAIRS A
left join IDS B on a.id1=b.id
left join IDS C on a.id2=c.id
;
quit;
Another way:
data want(keep= ID1 ID2 Pair_Num Proportion_Shared Colors Age_Difference);
set have1;
length Colors $ 50;
if _n_ = 1 then do;
if 0 then set have2;
declare hash h(dataset: 'have2');
h.defineKey('ID');
h.defineData('Color', 'Age');
h.defineDone();
call missing(Color, Age);
end;
rc = h.find(key: ID1);
Colors = Color;
_Age = Age;
rc = h.find(key: ID2);
Colors = catx(', ', Colors, Color);
Age_Difference = abs(_Age - Age);
run;
And this is how we would have done it before the availability of hash objects:
data input1;
infile datalines dlm="09"x;
input ID1 ID2 Pair_Num Proportion_Shared;
datalines;
1 2 1 0.97
1 3 2 81
1 4 3 0.73
1 5 4 0.81
2 3 5 0.56
2 4 6 0.71
2 5 7 0.31
3 4 8 0.81
3 5 9 0.66
4 5 10 0.59
;
data input2;
infile datalines dlm="09"x;
input ID Color $ Age;
datalines;
1 Blue 15
2 Green 21
3 Blue 78
4 Green 23
5 Blue 56
;
data cntlin;
fmtname = "colorfmt";
type = "N";
do until (done1);
set input2 end=done1;
start = ID;
label = color;
output;
end;
fmtname = "agefmt";
type = "I";
do until (done2);
set input2 end=done2;
start = ID;
label = left(put(age,best.));
output;
end;
keep fmtname type start label;
run;
proc format cntlin=cntlin;
run;
data want;
set input1;
colors = catx(",",put(id1,colorfmt.),put(id2,colorfmt.));
age_difference = abs(input(id1,agefmt.) - input(id2,agefmt.));
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.