Hi,
The data set I have is like the example below. As you can see, there can be one or more subsets (group=1) within each ID. For example, there are 3 groups for ID=A , and there is 1 group for ID=B. I would like to keep only the first group (OBS=1-4 for ID=A; OBS=4-6 for ID=B) for each ID, but I cannot figure out how. Also, if possible, I would like to assign subgroup ID within ID. For example, subgroup ID=1 for OBS=1-4 and subgroup ID=2 for OBS=8-11, etc., for ID=A. Please help!
OBS | ID | group |
1 | A | 1 |
2 | A | 1 |
3 | A | 1 |
4 | A | 1 |
5 | A | |
6 | A | |
7 | A | |
8 | A | 1 |
9 | A | 1 |
10 | A | 1 |
11 | A | 1 |
12 | A | |
13 | A | 1 |
14 | A | 1 |
15 | A | 1 |
16 | A | 1 |
17 | A | 1 |
1 | B | |
2 | B | |
3 | B | |
4 | B | 1 |
5 | B | 1 |
6 | B | 1 |
7 | B |
You say
I would like to keep only the first group (OBS=1-4 for ID=A; OBS=4-6 for ID=B) for each ID,
but then you also say
I would like to assign subgroup ID within ID. For example, subgroup ID=1 for OBS=1-4 and subgroup ID=2 for OBS=8-11, etc., for ID=A.
What's the point of the second objective if you only want to produce output under the first?
This program does the second. It deletes all instances where group^=1, and assigns ascending subgroup values for each consecutive group of 1's, within each id.
data have;
infile cards truncover;
input OBS ID $ group ;
cards;
1 A 1
2 A 1
3 A 1
4 A 1
5 A
6 A
7 A
8 A 1
9 A 1
10 A 1
11 A 1
12 A
13 A 1
14 A 1
15 A 1
16 A 1
17 A 1
1 B
2 B
3 B
4 B 1
5 B 1
6 B 1
7 B
;
data want;
set have;
by id group notsorted;
obs=_n_;
if first.id then subgroup=0;
if first.group=1 and group=1 then subgroup+1;
if group ^=1 then delete; /*outputs ALL subgroups.*/
*if group=1 and subgroup=1; /*outputs only subgroup 1*/
run;
If you want only the first subgroup for each ID, then decomment the last IF statement (and you could drop the next-to-last IF should you want).
data have;
infile cards truncover;
input OBS ID $ group;
cards;
1 A 1
2 A 1
3 A 1
4 A 1
5 A
6 A
7 A
8 A 1
9 A 1
10 A 1
11 A 1
12 A
13 A 1
14 A 1
15 A 1
16 A 1
17 A 1
1 B
2 B
3 B
4 B 1
5 B 1
6 B 1
7 B
;
data want;
set have;
by id;
retain flag;
if first.id then flag=0;
if group=1 and flag=0 then do;
subgroup+1;
flag=1;
end;
if flag=1 and missing(group) then flag=2;
if flag=1 then output;
drop flag;
run;
Hello, @brainupgraded , from now on please provide data as working SAS data step code, as shown above (more detailed instructions here). Do not provide data as screen captures or copy and paste from Excel or any other method.
You say
I would like to keep only the first group (OBS=1-4 for ID=A; OBS=4-6 for ID=B) for each ID,
but then you also say
I would like to assign subgroup ID within ID. For example, subgroup ID=1 for OBS=1-4 and subgroup ID=2 for OBS=8-11, etc., for ID=A.
What's the point of the second objective if you only want to produce output under the first?
This program does the second. It deletes all instances where group^=1, and assigns ascending subgroup values for each consecutive group of 1's, within each id.
data have;
infile cards truncover;
input OBS ID $ group ;
cards;
1 A 1
2 A 1
3 A 1
4 A 1
5 A
6 A
7 A
8 A 1
9 A 1
10 A 1
11 A 1
12 A
13 A 1
14 A 1
15 A 1
16 A 1
17 A 1
1 B
2 B
3 B
4 B 1
5 B 1
6 B 1
7 B
;
data want;
set have;
by id group notsorted;
obs=_n_;
if first.id then subgroup=0;
if first.group=1 and group=1 then subgroup+1;
if group ^=1 then delete; /*outputs ALL subgroups.*/
*if group=1 and subgroup=1; /*outputs only subgroup 1*/
run;
If you want only the first subgroup for each ID, then decomment the last IF statement (and you could drop the next-to-last IF should you want).
This works perfectly. Thank you so much.
data have;
infile cards truncover;
input OBS ID $ group;
cards;
1 A 1
2 A 1
3 A 1
4 A 1
5 A
6 A
7 A
8 A 1
9 A 1
10 A 1
11 A 1
12 A
13 A 1
14 A 1
15 A 1
16 A 1
17 A 1
1 B
2 B
3 B
4 B 1
5 B 1
6 B 1
7 B
;
data temp;
set have;
by id group notsorted;
n+first.group;
run;
proc sql;
create table want as
select *
from temp
where group is not missing
group by id
having n=min(n);
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.