topic Re: Stacking datasets while retaining only shared IDs in SAS Programming

Stacking datasets while retaining only shared IDs

des123 — Mon, 23 Feb 2026 15:58:06 GMT

Hello,

My data roughly looks something like the below, where all variables are shared in each of two datasets, and there can be multiple rows for each given ID.

I'd like to make a dataset that stacks the two datasets together, but only retains IDs that appear in both Dataset1 and Dataset2. It's expected that there will be multiple rows for a single ID in both the source and final datasets, I just do not want to retain IDs that appeared in only one of Dataset1 or Dataset2.

Dataset1:

ID	Var1
1	0
2	0
2	1
3	1
3	0
4	1

Dataset2

ID	Var1
3	2
4	2
5	0
6	0
7	0

Desired end state:

ID	Var1
3	1
3	0
3	2
4	1
4	2

So, as you can see, my desired end state retains only IDs 3 and 4, since those are the shared IDs between the two source datasets. But it still is fundamentally just stacking: I still want separate rows for each individual observation (as in reality, there are many more variables and ID is just identifying unique individuals, not unique observations).

Re: Stacking datasets while retaining only shared IDs

PaigeMiller — Mon, 23 Feb 2026 16:25:12 GMT

One approach is shown below. The code is untested. If you want tested code, please provide data as working SAS data step code (examples and instructions).

/* UNTESTED CODE */

/* Combine all rows */
data intermediate;
    set dataset1(in=in1) dataset2(in=in2);
    by id;
    ds1=in1;
    ds2=in2;
run;

/* Determine for each ID if there is one or more rows from dataset1 and one or more rows from dataset 2 */
proc summary data=intermediate;
    class id;
    var ds1 ds2;
    output out=sums sum=;
run;

/* Remove unwanted rows */
data want;
    merge intermediate sums;
    by id;
    if ds1>0 and ds2>0;
run;

Re: Stacking datasets while retaining only shared IDs

Tom — Mon, 23 Feb 2026 20:20:48 GMT

If the data is sorted by ID then it is pretty simple.

First let's convert your listings into actual datasets so we have something to program with.

data one;
  input id var1 ;
cards;
1 0
2 0
2 1
3 1
3 0
4 1
; 

data two;
  input id var1 ;
cards;
3 2
4 2
5 0
6 0
7 0
; 

data expect;
  input id var1;
cards;
3 1
3 0
3 2
4 1
4 2
;

Now just use a couple of DO loops. The first to check whether both datasets are contributing. And the second to actually process the data and output the desired observations.

data want;
  do until(last.id);
     merge one(in=in1) two(in=in2);
     by id;
  end;
  do until (last.id);
     set one two;
     by id;
     if in1 and in2 then output;
  end;
run;

Results

Re: Stacking datasets while retaining only shared IDs

Ksharp — Tue, 24 Feb 2026 05:54:25 GMT

data datasets1;
input id var1;
cards;
1 0
2 0
2 1
3 1
3 0
4 1
;
data datasets2;
input id var1;
cards;
3 2
4 2
5 0
6 0
7 0
;


proc sql;
create table want as
select * from datasets1 where id in (select id from datasets2)
union all
select * from datasets2 where id in (select id from datasets1)
;
quit;

Re: Stacking datasets while retaining only shared IDs

des123 — Tue, 24 Feb 2026 16:21:20 GMT

Thanks so much! This method was the first one I tried from this thread, and it worked like a charm!