Hi,
I have 8 observations per subject ID. All subjects were asked to complete 8 tasks (i.e., 8 observations) in the same session, i.e., each ID should have an identical session time. However, in my dataset, for ID #5, 2 observations had the same session time, and 6 observations had the same session time. Therefore, I would like to exclude ID#5 from this example dataset. I have multiple IDs in my extended dataset. I would appreciate if someone could provide an example code to remove IDs with different session times.
Example dataset:
ID | SessionTime | v1 | v2 | v3 |
1 | 18.2 | 2 | 3 | 6 |
1 | 18.2 | 3 | 7 | 8 |
1 | 18.2 | 4 | 6 | 12 |
1 | 18.2 | 6 | 8 | 10 |
1 | 18.2 | 8 | 9 | 22 |
1 | 18.2 | 9 | 10 | 25 |
1 | 18.2 | 10 | 12 | 30 |
1 | 16.7 | 12 | 17 | 4 |
2 | 16.7 | 3 | 7 | 6 |
2 | 16.7 | 4 | 11 | 10 |
2 | 16.7 | 5 | 10 | 8 |
2 | 16.7 | 7 | 12 | 20 |
2 | 16.7 | 9 | 13 | 23 |
2 | 16.7 | 10 | 14 | 28 |
2 | 16.7 | 11 | 16 | 2 |
2 | 16.7 | 13 | 21 | 4 |
3 | 18.2 | 4 | 11 | 8 |
3 | 18.2 | 5 | 15 | 6 |
3 | 18.2 | 6 | 14 | 18 |
3 | 18.2 | 8 | 16 | 21 |
3 | 18.2 | 10 | 17 | 26 |
3 | 18.2 | 11 | 18 | 0 |
3 | 18.2 | 12 | 20 | 2 |
3 | 18.2 | 14 | 25 | 6 |
4 | 19.3 | 5 | 15 | 4 |
4 | 19.3 | 6 | 19 | 16 |
4 | 19.3 | 7 | 18 | 19 |
4 | 19.3 | 9 | 20 | 24 |
4 | 19.3 | 11 | 21 | 7 |
4 | 19.3 | 12 | 22 | 0 |
4 | 19.3 | 13 | 24 | 4 |
4 | 19.3 | 15 | 29 | 2 |
5 | 20.3 | 6 | 19 | 14 |
5 | 20.3 | 7 | 23 | 17 |
5 | 28.4 | 8 | 22 | 22 |
5 | 28.4 | 10 | 24 | 5 |
5 | 28.4 | 12 | 25 | 5 |
5 | 28.4 | 13 | 26 | 2 |
5 | 28.4 | 14 | 28 | 0 |
5 | 28.4 | 16 | 33 | 12 |
Thank you
data Session;
input ID SessionTime v1 v2 v3 ;
cards;
1 18.2 2 3 6
1 18.2 3 7 8
1 18.2 4 6 12
1 18.2 6 8 10
1 18.2 8 9 22
1 18.2 9 10 25
1 18.2 10 12 30
1 16.7 12 17 4
2 16.7 3 7 6
2 16.7 4 11 10
2 16.7 5 10 8
2 16.7 7 12 20
2 16.7 9 13 23
2 16.7 10 14 28
2 16.7 11 16 2
2 16.7 13 21 4
3 18.2 4 11 8
3 18.2 5 15 6
3 18.2 6 14 18
3 18.2 8 16 21
3 18.2 10 17 26
3 18.2 11 18 0
3 18.2 12 20 2
3 18.2 14 25 6
4 19.3 5 15 4
4 19.3 6 19 16
4 19.3 7 18 19
4 19.3 9 20 24
4 19.3 11 21 7
4 19.3 12 22 0
4 19.3 13 24 4
4 19.3 15 29 2
5 20.3 6 19 14
5 20.3 7 23 17
5 28.4 8 22 22
5 28.4 10 24 5
5 28.4 12 25 5
5 28.4 13 26 2
5 28.4 14 28 0
5 28.4 16 33 12
;;; run;
proc sql ;
create table chk as
select id, sessiontime, count(*) as frq
from Session
group by id, sessiontime
having frq=8
order by id, sessiontime
; quit;
proc sql;
create table session_ok as
select s.*
from session s
where id in (select id from chk);
quit;
Also ID=1 have 18.2 and 16.7 different SessionTime.
proc summary data=have nway;
class id;
var sessiontime;
output out=_stats_ range=sessiontime_range;
run;
data want;
merge have _stats_;
by id;
if sessiontime_range=0;
drop _type_ _freq_;
run;
In this code, we determine the mathematical range of the sessiontime values for each ID, then keep only those IDs for which the sessiontime range is zero.
Hi @drchiragvyas and welcome to the SAS Support Communities!
So you want to select entire "BY groups" (here: observations with a common ID) based on a condition involving more than just the first observation per group. You can do this in two steps (first, determine the IDs to keep or delete and then merge them back with the original data; examples have already been provided by AndreaVianello and PaigeMiller). But there are also several techniques which combine the two parts of the task in one step:
PROC SQL (unlike the two DATA step techniques) tends to change the order of observations (within a group), unless the order is uniquely determined by an ORDER BY clause. In your example data, the combination if ID and v1 could serve as a sort key:
proc sql;
create table want as
select * from have
group by id
having count(distinct sessiontime)=1
order by id, v1;
quit;
The two DATA step techniques read each BY group twice. In the first pass, the first value of SessionTime is stored in a temporary variable (named _st in the examples below) and a numeric flag variable (_del) is set to a positive value if a different value of SessionTime is encountered in the same BY group. In the second pass, only those BY groups are written to the output dataset for which the _del flag has not been set.
Double DOW loop:
data want(drop=_:);
do until(last.id);
set have;
by id;
if first.id then _st=sessiontime;
else if sessiontime ne _st then _del=1;
end;
do until(last.id);
set have;
by id;
if not _del then output;
end;
run;
Self-interleaving:
data want(drop=_:);
set have have(in=a);
by id;
if first.id then do;
_st=sessiontime;
_del=0;
end;
else _del+(sessiontime ne _st);
if a and not _del;
retain _:;
run;
Note that the PROC SQL step would create a different result than the two DATA steps if there were missing values of SessionTime: This is because count(distinct sessiontime) counts only non-missing values, whereas the DATA steps handle missing values like any other values. So, an ID with only (identical) missing SessionTime values would be selected by the DATA steps, but not by the PROC SQL step. However, an ID with one or more missing SessionTime values together with a constant non-missing SessionTime value would be selected by the PROC SQL step, but not by the DATA steps. Of course, all three techniques could be adapted to any requirements regarding missing values.
data have;
infile cards truncover expandtabs;
input ID SessionTime v1 v2 v3;
cards;
1 18.2 2 3 6
1 18.2 3 7 8
1 18.2 4 6 12
1 18.2 6 8 10
1 18.2 8 9 22
1 18.2 9 10 25
1 18.2 10 12 30
2 16.7 12 17 4
2 16.7 3 7 6
2 16.7 4 11 10
2 16.7 5 10 8
2 16.7 7 12 20
2 16.7 9 13 23
2 16.7 10 14 28
2 16.7 11 16 2
2 16.7 13 21 4
3 18.2 4 11 8
3 18.2 5 15 6
3 18.2 6 14 18
3 18.2 8 16 21
3 18.2 10 17 26
3 18.2 11 18 0
3 18.2 12 20 2
3 18.2 14 25 6
4 19.3 5 15 4
4 19.3 6 19 16
4 19.3 7 18 19
4 19.3 9 20 24
4 19.3 11 21 7
4 19.3 12 22 0
4 19.3 13 24 4
4 19.3 15 29 2
5 20.3 6 19 14
5 20.3 7 23 17
5 28.4 8 22 22
5 28.4 10 24 5
5 28.4 12 25 5
5 28.4 13 26 2
5 28.4 14 28 0
5 28.4 16 33 12
;
proc sql;
create table want as
select *
from have
group by id
having range(SessionTime)=0;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.