Hello,
My data roughly looks something like the below, where all variables are shared in each of two datasets, and there can be multiple rows for each given ID.
I'd like to make a dataset that stacks the two datasets together, but only retains IDs that appear in both Dataset1 and Dataset2. It's expected that there will be multiple rows for a single ID in both the source and final datasets, I just do not want to retain IDs that appeared in only one of Dataset1 or Dataset2.
Dataset1:
| ID | Var1 |
| 1 | 0 |
| 2 | 0 |
| 2 | 1 |
| 3 | 1 |
| 3 | 0 |
| 4 | 1 |
Dataset2
| ID | Var1 |
| 3 | 2 |
| 4 | 2 |
| 5 | 0 |
| 6 | 0 |
| 7 | 0 |
Desired end state:
| ID | Var1 |
| 3 | 1 |
| 3 | 0 |
| 3 | 2 |
| 4 | 1 |
| 4 | 2 |
So, as you can see, my desired end state retains only IDs 3 and 4, since those are the shared IDs between the two source datasets. But it still is fundamentally just stacking: I still want separate rows for each individual observation (as in reality, there are many more variables and ID is just identifying unique individuals, not unique observations).
One approach is shown below. The code is untested. If you want tested code, please provide data as working SAS data step code (examples and instructions).
/* UNTESTED CODE */
/* Combine all rows */
data intermediate;
set dataset1(in=in1) dataset2(in=in2);
by id;
ds1=in1;
ds2=in2;
run;
/* Determine for each ID if there is one or more rows from dataset1 and one or more rows from dataset 2 */
proc summary data=intermediate;
class id;
var ds1 ds2;
output out=sums sum=;
run;
/* Remove unwanted rows */
data want;
merge intermediate sums;
by id;
if ds1>0 and ds2>0;
run;
One approach is shown below. The code is untested. If you want tested code, please provide data as working SAS data step code (examples and instructions).
/* UNTESTED CODE */
/* Combine all rows */
data intermediate;
set dataset1(in=in1) dataset2(in=in2);
by id;
ds1=in1;
ds2=in2;
run;
/* Determine for each ID if there is one or more rows from dataset1 and one or more rows from dataset 2 */
proc summary data=intermediate;
class id;
var ds1 ds2;
output out=sums sum=;
run;
/* Remove unwanted rows */
data want;
merge intermediate sums;
by id;
if ds1>0 and ds2>0;
run;
If the data is sorted by ID then it is pretty simple.
First let's convert your listings into actual datasets so we have something to program with.
data one;
input id var1 ;
cards;
1 0
2 0
2 1
3 1
3 0
4 1
;
data two;
input id var1 ;
cards;
3 2
4 2
5 0
6 0
7 0
;
data expect;
input id var1;
cards;
3 1
3 0
3 2
4 1
4 2
;
Now just use a couple of DO loops. The first to check whether both datasets are contributing. And the second to actually process the data and output the desired observations.
data want;
do until(last.id);
merge one(in=in1) two(in=in2);
by id;
end;
do until (last.id);
set one two;
by id;
if in1 and in2 then output;
end;
run;
Results
data datasets1;
input id var1;
cards;
1 0
2 0
2 1
3 1
3 0
4 1
;
data datasets2;
input id var1;
cards;
3 2
4 2
5 0
6 0
7 0
;
proc sql;
create table want as
select * from datasets1 where id in (select id from datasets2)
union all
select * from datasets2 where id in (select id from datasets1)
;
quit;
Dive into keynotes, announcements and breakthroughs on demand.
Explore Now →Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.