Hi all,
I have two databases - one of which is has 25 individual member observations with 4 variables (vars1-vars4). Each observation is identified by a unique ID number (id).
Obs | id | var1 | var2 | var3 | var4 |
1 | 860880007 | 2 | . | . | . |
2 | 860880013 | . | 1 | 1 | 2 |
3 | 860880015 | . | 1 | 1 | 2 |
4 | 860880023 | 2 | . | . | . |
5 | 861560006 | . | 1 | 1 | 1 |
6 | 861560009 | . | 1 | 1 | 2 |
7 | 861560012 | . | 1 | 1 | 2 |
8 | 861560013 | . | 1 | 1 | 2 |
9 | 861560017 | 1 | 1 | 1 | 2 |
10 | 861560030 | 1 | 1 | 1 | 2 |
11 | 861560031 | 2 | . | . | . |
12 | 861560040 | . | 1 | 1 | 2 |
13 | 861560041 | . | 1 | 1 | 2 |
14 | 861780002 | 2 | . | . | . |
15 | 861780012 | 2 | . | . | . |
16 | 861780013 | 2 | . | . | . |
17 | 861780024 | . | 1 | 1 | 2 |
18 | 861780027 | 2 | . | . | . |
19 | 861780036 | 1 | 1 | 1 | 2 |
20 | 861780050 | . | 1 | 1 | 2 |
21 | 861780053 | 1 | 1 | 1 | 2 |
22 | 862440006 | 1 | 2 | . | . |
23 | 862440007 | 1 | 1 | 1 | 2 |
24 | 862440016 | . | 1 | 1 | 2 |
25 | 862440025 | . | 1 | 1 | 2 |
I have a second database - a paired database (12 observations) - in which each observation represents a pair of members that may have a corresponding member data in the database above. Each member of the pair is identified by a unique ID (one called left_id, the other right_id):
Obs | left_id | right_id |
1 | 861560006 | 861560005 |
2 | 861560013 | 861560012 |
3 | 860900004 | 860900003 |
4 | 861560031 | 861560030 |
5 | 860880024 | 860880023 |
6 | 861560041 | 861560040 |
7 | 861780013 | 861780012 |
8 | 861780002 | 861780001 |
9 | 861780050 | 861780049 |
10 | 862440007 | 862440006 |
11 | 862440016 | 862440015 |
12 | 862988001 | 862988000 |
I would like merge the data from the individual member database (if present) to the paired database with the structure below. I would like each left/right variable list to be populated with corresponding data from the individual database (filled in the first two rows as an example). The output database should have 12 observations like the paired database. If data are not available for one or both members of the pair then the var should be missing (see observation 12 on the paired database which does not have a corresponding individual data for either member).
Obs | left_id | left_var1 | left_var2 | left_var3 | left_var4 | right_id | right_var1 | right_var2 | right_var3 | right_var4 |
1 | 861560006 | . | 1 | 1 | 1 | 861560005 | . | . | . | . |
2 | 861560013 | . | 1 | 1 | 2 | 861560012 | . | 1 | 1 | 2 |
3 | 860900004 | 860900003 | ||||||||
4 | 861560031 | 861560030 | ||||||||
5 | 860880024 | 860880023 | ||||||||
6 | 861560041 | 861560040 | ||||||||
7 | 861780013 | 861780012 | ||||||||
8 | 861780002 | 861780001 | ||||||||
9 | 861780050 | 861780049 | ||||||||
10 | 862440007 | 862440006 | ||||||||
11 | 862440016 | 862440015 | ||||||||
12 | 862988001 | 862988000 |
Any solution using data step and Proc SQL would be much appreciated.
Thanks community.
--Tim
Use a different type of join
select ......
from table2 match
left join table1 left
on left.id = match.left_id
left join table1 right
on right.id = match.right_id
;
SQL is easy.
create table want as
select left.id as left_id
, left.var1 as left_var1
....
from table1 left
, table1 right
, table2 match
where left.id = match.left_id
and right.id = match.right_id
;
Thanks Tom,
The SQL produced what I wanted except that the output table 'want' have only rows where the left and right id match. Is it possible to include in the output table all rows from table 2 even if there was no match id's?
Thanks again.
--Tim
Use a different type of join
select ......
from table2 match
left join table1 left
on left.id = match.left_id
left join table1 right
on right.id = match.right_id
;
Thanks.
This works well.
--Tim
Hi,
For a step by step approach using data steps and proc sort:
data bypair;
pair_id=_n_;
set have2;
run;
proc sort data=bypair(drop=right_id)
out =byleft;
by left_id;
run;
proc sort data=bypair(drop=left_id)
out =byright;
by right_id;
run;
data full_left(rename=(var1=left_var1 var2=left_var2 var3=left_var3 var4=left_var4));
merge have1 (rename=(id=left_id))
byleft(in=inl)
;
by left_id;
if inl;
run;
data full_right(rename=(var1=right_var1 var2=right_var2 var3=right_var3 var4=right_var4));
merge have1 (in=inh rename=(id=right_id))
byright(in=inr)
;
by right_id;
if inr;
run;
proc sort data=full_left;
by pair_id;
run;
proc sort data=full_right;
by pair_id;
run;
data want(drop=pair_id);
merge full_left
full_right;
by pair_id;
run;
Regards,
Amir.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.