Hi all,
I am very new to SAS, and I am trying to merge two datasets. The first has pairwise comparisons between the proportion shared between 2 IDs. The second has additional identifying information about each of these IDs. I want the merge the two datasets to produce an output similar to what is shown below, but I am unsure where to start. I have provided samples of the data tables below. I am working in SAS 9.4. I appreciate any help you can provide.
Input 1 | |||
ID1 | ID2 | Pair_Num | Proportion_Shared |
1 | 2 | 1 | 0.97 |
1 | 3 | 2 | 81 |
1 | 4 | 3 | 0.73 |
1 | 5 | 4 | 0.81 |
2 | 3 | 5 | 0.56 |
2 | 4 | 6 | 0.71 |
2 | 5 | 7 | 0.31 |
3 | 4 | 8 | 0.81 |
3 | 5 | 9 | 0.66 |
4 | 5 | 10 | 0.59 |
Input 2 | ||
ID | Color | Age |
1 | Blue | 15 |
2 | Green | 21 |
3 | Blue | 78 |
4 | Green | 23 |
5 | Blue | 56 |
Desired Output | |||||
ID1 | ID2 | Pair_Num | Proportion_Shared | Colors | Age_Difference |
1 | 2 | 1 | 0.97 | Blue, Green | 6 |
1 | 3 | 2 | 81 | Blue, Blue | 63 |
1 | 4 | 3 | 0.73 | Blue, Green | 8 |
1 | 5 | 4 | 0.81 | Blue, Blue | 41 |
2 | 3 | 5 | 0.56 | Green, Blue | 57 |
2 | 4 | 6 | 0.71 | Green, Green | 2 |
2 | 5 | 7 | 0.31 | Green, Blue | 35 |
3 | 4 | 8 | 0.81 | Blue, Green | 55 |
3 | 5 | 9 | 0.66 | Blue, Blue | 22 |
4 | 5 | 10 | 0.59 | Green, Blue | 33 |
Another way:
data want(keep= ID1 ID2 Pair_Num Proportion_Shared Colors Age_Difference);
set have1;
length Colors $ 50;
if _n_ = 1 then do;
if 0 then set have2;
declare hash h(dataset: 'have2');
h.defineKey('ID');
h.defineData('Color', 'Age');
h.defineDone();
call missing(Color, Age);
end;
rc = h.find(key: ID1);
Colors = Color;
_Age = Age;
rc = h.find(key: ID2);
Colors = catx(', ', Colors, Color);
Age_Difference = abs(_Age - Age);
run;
So you have a PAIRS dataset with these variables:
ID1 | ID2 | Pair_Num | Proportion_Shared |
And and IDS dataset with these variables:
ID | Color | Age |
And you want to make new dataset with these variables.
ID1 | ID2 | Pair_Num | Proportion_Shared | Colors | Age_Difference |
So a simple SQL join should do the job.
proc sql;
create table WANT as
select a.*
, catx(',',b.color,c.color) as Colors
, b.age - c.age as Age_Difference
from PAIRS A
left join IDS B on a.id1=b.id
left join IDS C on a.id2=c.id
;
quit;
Another way:
data want(keep= ID1 ID2 Pair_Num Proportion_Shared Colors Age_Difference);
set have1;
length Colors $ 50;
if _n_ = 1 then do;
if 0 then set have2;
declare hash h(dataset: 'have2');
h.defineKey('ID');
h.defineData('Color', 'Age');
h.defineDone();
call missing(Color, Age);
end;
rc = h.find(key: ID1);
Colors = Color;
_Age = Age;
rc = h.find(key: ID2);
Colors = catx(', ', Colors, Color);
Age_Difference = abs(_Age - Age);
run;
And this is how we would have done it before the availability of hash objects:
data input1;
infile datalines dlm="09"x;
input ID1 ID2 Pair_Num Proportion_Shared;
datalines;
1 2 1 0.97
1 3 2 81
1 4 3 0.73
1 5 4 0.81
2 3 5 0.56
2 4 6 0.71
2 5 7 0.31
3 4 8 0.81
3 5 9 0.66
4 5 10 0.59
;
data input2;
infile datalines dlm="09"x;
input ID Color $ Age;
datalines;
1 Blue 15
2 Green 21
3 Blue 78
4 Green 23
5 Blue 56
;
data cntlin;
fmtname = "colorfmt";
type = "N";
do until (done1);
set input2 end=done1;
start = ID;
label = color;
output;
end;
fmtname = "agefmt";
type = "I";
do until (done2);
set input2 end=done2;
start = ID;
label = left(put(age,best.));
output;
end;
keep fmtname type start label;
run;
proc format cntlin=cntlin;
run;
data want;
set input1;
colors = catx(",",put(id1,colorfmt.),put(id2,colorfmt.));
age_difference = abs(input(id1,agefmt.) - input(id2,agefmt.));
run;
Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.
Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.