Hi all,
I am very new to SAS, and I am trying to merge two datasets. The first has pairwise comparisons between the proportion shared between 2 IDs. The second has additional identifying information about each of these IDs. I want the merge the two datasets to produce an output similar to what is shown below, but I am unsure where to start. I have provided samples of the data tables below. I am working in SAS 9.4. I appreciate any help you can provide.
Input 1 | |||
ID1 | ID2 | Pair_Num | Proportion_Shared |
1 | 2 | 1 | 0.97 |
1 | 3 | 2 | 81 |
1 | 4 | 3 | 0.73 |
1 | 5 | 4 | 0.81 |
2 | 3 | 5 | 0.56 |
2 | 4 | 6 | 0.71 |
2 | 5 | 7 | 0.31 |
3 | 4 | 8 | 0.81 |
3 | 5 | 9 | 0.66 |
4 | 5 | 10 | 0.59 |
Input 2 | ||
ID | Color | Age |
1 | Blue | 15 |
2 | Green | 21 |
3 | Blue | 78 |
4 | Green | 23 |
5 | Blue | 56 |
Desired Output | |||||
ID1 | ID2 | Pair_Num | Proportion_Shared | Colors | Age_Difference |
1 | 2 | 1 | 0.97 | Blue, Green | 6 |
1 | 3 | 2 | 81 | Blue, Blue | 63 |
1 | 4 | 3 | 0.73 | Blue, Green | 8 |
1 | 5 | 4 | 0.81 | Blue, Blue | 41 |
2 | 3 | 5 | 0.56 | Green, Blue | 57 |
2 | 4 | 6 | 0.71 | Green, Green | 2 |
2 | 5 | 7 | 0.31 | Green, Blue | 35 |
3 | 4 | 8 | 0.81 | Blue, Green | 55 |
3 | 5 | 9 | 0.66 | Blue, Blue | 22 |
4 | 5 | 10 | 0.59 | Green, Blue | 33 |
Another way:
data want(keep= ID1 ID2 Pair_Num Proportion_Shared Colors Age_Difference);
set have1;
length Colors $ 50;
if _n_ = 1 then do;
if 0 then set have2;
declare hash h(dataset: 'have2');
h.defineKey('ID');
h.defineData('Color', 'Age');
h.defineDone();
call missing(Color, Age);
end;
rc = h.find(key: ID1);
Colors = Color;
_Age = Age;
rc = h.find(key: ID2);
Colors = catx(', ', Colors, Color);
Age_Difference = abs(_Age - Age);
run;
So you have a PAIRS dataset with these variables:
ID1 | ID2 | Pair_Num | Proportion_Shared |
And and IDS dataset with these variables:
ID | Color | Age |
And you want to make new dataset with these variables.
ID1 | ID2 | Pair_Num | Proportion_Shared | Colors | Age_Difference |
So a simple SQL join should do the job.
proc sql;
create table WANT as
select a.*
, catx(',',b.color,c.color) as Colors
, b.age - c.age as Age_Difference
from PAIRS A
left join IDS B on a.id1=b.id
left join IDS C on a.id2=c.id
;
quit;
Another way:
data want(keep= ID1 ID2 Pair_Num Proportion_Shared Colors Age_Difference);
set have1;
length Colors $ 50;
if _n_ = 1 then do;
if 0 then set have2;
declare hash h(dataset: 'have2');
h.defineKey('ID');
h.defineData('Color', 'Age');
h.defineDone();
call missing(Color, Age);
end;
rc = h.find(key: ID1);
Colors = Color;
_Age = Age;
rc = h.find(key: ID2);
Colors = catx(', ', Colors, Color);
Age_Difference = abs(_Age - Age);
run;
And this is how we would have done it before the availability of hash objects:
data input1;
infile datalines dlm="09"x;
input ID1 ID2 Pair_Num Proportion_Shared;
datalines;
1 2 1 0.97
1 3 2 81
1 4 3 0.73
1 5 4 0.81
2 3 5 0.56
2 4 6 0.71
2 5 7 0.31
3 4 8 0.81
3 5 9 0.66
4 5 10 0.59
;
data input2;
infile datalines dlm="09"x;
input ID Color $ Age;
datalines;
1 Blue 15
2 Green 21
3 Blue 78
4 Green 23
5 Blue 56
;
data cntlin;
fmtname = "colorfmt";
type = "N";
do until (done1);
set input2 end=done1;
start = ID;
label = color;
output;
end;
fmtname = "agefmt";
type = "I";
do until (done2);
set input2 end=done2;
start = ID;
label = left(put(age,best.));
output;
end;
keep fmtname type start label;
run;
proc format cntlin=cntlin;
run;
data want;
set input1;
colors = catx(",",put(id1,colorfmt.),put(id2,colorfmt.));
age_difference = abs(input(id1,agefmt.) - input(id2,agefmt.));
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.