Think of two arbitrary panel data sets as follows.
Panel A Panel B
i j t x i j t y
1 10 1 0.54 1 10 1 0.32
1 10 2 0.55 1 10 2 0.50
1 10 3 0.90 1 10 3 0.71
2 20 1 0.03 2 . 1 0.22
2 20 2 0.52 2 . 2 0.69
2 20 3 0.22 2 . 3 0.81
3 30 1 0.18 3 30 1 0.83
3 30 2 0.14 3 30 2 0.81
3 30 3 0.46 3 30 3 0.51
. 40 1 0.21 4 40 1 0.14
. 40 2 0.19 4 40 2 0.42
. 40 3 0.74 4 40 3 0.95
5 50 1 0.10 5 50 1 0.38
5 50 2 0.14 5 50 2 0.45
5 50 3 0.59 5 50 3 0.39
As shown, two sets have three identifiers each—i and j for individuals, and t for time periods. Though both i and j capture the individuals, some are randomly missing. Fortunately, the second individual could be identified with i=2, and the fourth one could be with j=40, but what is the best way in SAS? For example, (1) DATA MERGE BY i will miss the fourth individual, (2) DATA MERGE BY j will miss the second one, and (3) DATA MERGE BY i j will miss both. As the real data sets are unbalanced and have more intermingled index structures, I cannot simply merge just with MERGE (without BY). Thanks for any help in advance.
I would recommend splitting your data first:
data panel_a_both
panel_a_missing_i (drop=i)
panel_a_missing_j (drop=j);
set panel_a;
if i=. then output panel_a_missing_i;
else if j=. then output panel_a_missing_j;
else output panel_a_both;
run;
If you do this for both PANEL_A and PANEL_B, then you can merge as needed. For example:
You will need a few more merges to come up with all the necessary matches.
As long as one missing piece is available in the other data set, you should be able to retrieve it in this way. There is a danger, however, that the matches will be wrong. For example, is it possible that J=20 occurs not only for I=2, but also for some other value of I? How do you know what the right match is in that case?
From your sample data, it looks like you can do it with a set instead of a merge.
data Panel_AB;
set PanelA;
set PanelB;
run;
It seems that i and j are synonyms.
Try next code:
data temp;
merge A (where=(i ^= .))
B;
by i t;
run;
data want;
merge b (where =( j ^= .))
temp;
by j t;
run;
I would recommend splitting your data first:
data panel_a_both
panel_a_missing_i (drop=i)
panel_a_missing_j (drop=j);
set panel_a;
if i=. then output panel_a_missing_i;
else if j=. then output panel_a_missing_j;
else output panel_a_both;
run;
If you do this for both PANEL_A and PANEL_B, then you can merge as needed. For example:
You will need a few more merges to come up with all the necessary matches.
As long as one missing piece is available in the other data set, you should be able to retrieve it in this way. There is a danger, however, that the matches will be wrong. For example, is it possible that J=20 occurs not only for I=2, but also for some other value of I? How do you know what the right match is in that case?
You could do this:
data pa;
input i j t x;
datalines;
1 10 1 0.54
1 10 2 0.55
1 10 3 0.90
2 20 1 0.03
2 20 2 0.52
2 20 3 0.22
3 30 1 0.18
3 30 2 0.14
3 30 3 0.46
. 40 1 0.21
. 40 2 0.19
. 40 3 0.74
5 50 1 0.10
5 50 2 0.14
5 50 3 0.59
;
data pb;
input i j t y;
datalines;
1 10 1 0.32
1 10 2 0.50
1 10 3 0.71
2 . 1 0.22
2 . 2 0.69
2 . 3 0.81
3 30 1 0.83
3 30 2 0.81
3 30 3 0.51
4 40 1 0.14
4 40 2 0.42
4 40 3 0.95
5 50 1 0.38
5 50 2 0.45
5 50 3 0.39
;
proc sql;
select *
from
(select pa.*, y
from pa inner join
pb on pa.i=pb.i and pa.j=pb.j and pa.t=pb.t)
union all corr
(select pa.i, pa.t, coalesce(pa.j, pb.j) as j, x, y
from pa inner join
pb on pa.i=pb.i and pa.t=pb.t
where cmiss(pa.j, pb.j) = 1)
union all corr
(select coalesce(pa.i, pb.i) as i, pa.j, pa.t, x, y
from pa inner join
pb on pa.j=pb.j and pa.t=pb.t
where cmiss(pa.i, pb.i) = 1)
order by i, j, t;
quit;
this solution assumes that at least one of the two datasets contains both i and j, as provided in your test data.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.