How to remove specific ID based on a specific condition?

drchiragvyas · Posted 02-01-2023 07:24 PM

Hi,

I have 8 observations per subject ID. All subjects were asked to complete 8 tasks (i.e., 8 observations) in the same session, i.e., each ID should have an identical session time. However, in my dataset, for ID #5, 2 observations had the same session time, and 6 observations had the same session time. Therefore, I would like to exclude ID#5 from this example dataset. I have multiple IDs in my extended dataset. I would appreciate if someone could provide an example code to remove IDs with different session times.

Example dataset:

ID	SessionTime	v1	v2	v3
1	18.2	2	3	6
1	18.2	3	7	8
1	18.2	4	6	12
1	18.2	6	8	10
1	18.2	8	9	22
1	18.2	9	10	25
1	18.2	10	12	30
1	16.7	12	17	4
2	16.7	3	7	6
2	16.7	4	11	10
2	16.7	5	10	8
2	16.7	7	12	20
2	16.7	9	13	23
2	16.7	10	14	28
2	16.7	11	16	2
2	16.7	13	21	4
3	18.2	4	11	8
3	18.2	5	15	6
3	18.2	6	14	18
3	18.2	8	16	21
3	18.2	10	17	26
3	18.2	11	18	0
3	18.2	12	20	2
3	18.2	14	25	6
4	19.3	5	15	4
4	19.3	6	19	16
4	19.3	7	18	19
4	19.3	9	20	24
4	19.3	11	21	7
4	19.3	12	22	0
4	19.3	13	24	4
4	19.3	15	29	2
5	20.3	6	19	14
5	20.3	7	23	17
5	28.4	8	22	22
5	28.4	10	24	5
5	28.4	12	25	5
5	28.4	13	26	2
5	28.4	14	28	0
5	28.4	16	33	12

Thank you

AndreaVianello · Posted 02-02-2023 04:55 AM


data Session;
 input ID SessionTime  v1 v2 v3 ;
cards;
1 18.2 2 3 6
1 18.2 3 7 8
1 18.2 4 6 12
1 18.2 6 8 10
1 18.2 8 9 22
1 18.2 9 10 25
1 18.2 10 12 30
1 16.7 12 17 4
2 16.7 3 7 6
2 16.7 4 11 10
2 16.7 5 10 8
2 16.7 7 12 20
2 16.7 9 13 23
2 16.7 10 14 28
2 16.7 11 16 2
2 16.7 13 21 4
3 18.2 4 11 8
3 18.2 5 15 6
3 18.2 6 14 18
3 18.2 8 16 21
3 18.2 10 17 26
3 18.2 11 18 0
3 18.2 12 20 2
3 18.2 14 25 6
4 19.3 5 15 4
4 19.3 6 19 16
4 19.3 7 18 19
4 19.3 9 20 24
4 19.3 11 21 7
4 19.3 12 22 0
4 19.3 13 24 4
4 19.3 15 29 2
5 20.3 6 19 14
5 20.3 7 23 17
5 28.4 8 22 22
5 28.4 10 24 5
5 28.4 12 25 5
5 28.4 13 26 2
5 28.4 14 28 0
5 28.4 16 33 12
;;; run;


proc sql ;
 create table chk as 
 select id, sessiontime, count(*) as frq
 from Session
 group by id, sessiontime
  having frq=8
 order by id, sessiontime
; quit;

proc sql; 
 create table session_ok as 
  select s.* 
   from session s 
    where id in (select id from  chk);
quit;

Also ID=1 have 18.2 and 16.7 different SessionTime.

PaigeMiller · Posted 02-02-2023 06:12 AM

proc summary data=have nway;
    class id;
    var sessiontime;
    output out=_stats_ range=sessiontime_range;
run;
data want;
    merge have _stats_;
    by id;
    if sessiontime_range=0;
    drop _type_ _freq_;
run;

In this code, we determine the mathematical range of the sessiontime values for each ID, then keep only those IDs for which the sessiontime range is zero.

--
Paige Miller

FreelanceReinh · Posted 02-02-2023 06:47 AM

Hi @drchiragvyas and welcome to the SAS Support Communities!

So you want to select entire "BY groups" (here: observations with a common ID) based on a condition involving more than just the first observation per group. You can do this in two steps (first, determine the IDs to keep or delete and then merge them back with the original data; examples have already been provided by AndreaVianello and PaigeMiller). But there are also several techniques which combine the two parts of the task in one step:

PROC SQL with a GROUP BY and a HAVING clause (using automatic remerging)
double DOW loop (see, e.g., Practical Uses of the DOW Loop in Pharmaceutical Programming for background information and more examples)
self-interleaving (see, e.g., many posts by mkeintz)

PROC SQL (unlike the two DATA step techniques) tends to change the order of observations (within a group), unless the order is uniquely determined by an ORDER BY clause. In your example data, the combination if ID and v1 could serve as a sort key:

proc sql;
create table want as
select * from have
group by id
having count(distinct sessiontime)=1
order by id, v1;
quit;

The two DATA step techniques read each BY group twice. In the first pass, the first value of SessionTime is stored in a temporary variable (named _st in the examples below) and a numeric flag variable (_del) is set to a positive value if a different value of SessionTime is encountered in the same BY group. In the second pass, only those BY groups are written to the output dataset for which the _del flag has not been set.

Double DOW loop:

data want(drop=_:);
do until(last.id);
  set have;
  by id;
  if first.id then _st=sessiontime;
  else if sessiontime ne _st then _del=1;
end;
do until(last.id);
  set have;
  by id;
  if not _del then output;
end;
run;

Self-interleaving:

data want(drop=_:);
set have have(in=a);
by id;
if first.id then do;
  _st=sessiontime;
  _del=0;
end;
else _del+(sessiontime ne _st);
if a and not _del;
retain _:;
run;

Note that the PROC SQL step would create a different result than the two DATA steps if there were missing values of SessionTime: This is because count(distinct sessiontime) counts only non-missing values, whereas the DATA steps handle missing values like any other values. So, an ID with only (identical) missing SessionTime values would be selected by the DATA steps, but not by the PROC SQL step. However, an ID with one or more missing SessionTime values together with a constant non-missing SessionTime value would be selected by the PROC SQL step, but not by the DATA steps. Of course, all three techniques could be adapted to any requirements regarding missing values.

Ksharp · Posted 02-02-2023 07:19 AM

data have;
infile cards truncover expandtabs;
input ID SessionTime v1 v2 v3;
cards;
1 18.2 2 3 6
1 18.2 3 7 8
1 18.2 4 6 12
1 18.2 6 8 10
1 18.2 8 9 22
1 18.2 9 10 25
1 18.2 10 12 30
2 16.7 12 17 4
2 16.7 3 7 6
2 16.7 4 11 10
2 16.7 5 10 8
2 16.7 7 12 20
2 16.7 9 13 23
2 16.7 10 14 28
2 16.7 11 16 2
2 16.7 13 21 4
3 18.2 4 11 8
3 18.2 5 15 6
3 18.2 6 14 18
3 18.2 8 16 21
3 18.2 10 17 26
3 18.2 11 18 0
3 18.2 12 20 2
3 18.2 14 25 6
4 19.3 5 15 4
4 19.3 6 19 16
4 19.3 7 18 19
4 19.3 9 20 24
4 19.3 11 21 7
4 19.3 12 22 0
4 19.3 13 24 4
4 19.3 15 29 2
5 20.3 6 19 14
5 20.3 7 23 17
5 28.4 8 22 22
5 28.4 10 24 5
5 28.4 12 25 5
5 28.4 13 26 2
5 28.4 14 28 0
5 28.4 16 33 12
;

proc sql;
create table want as
select *
 from have
  group by id
   having range(SessionTime)=0;
quit;

How to remove specific ID based on a specific condition?

Re: How to remove specific ID based on a specific condition?

Re: How to remove specific ID based on a specific condition?

Re: How to remove specific ID based on a specific condition?

Re: How to remove specific ID based on a specific condition?

How to remove specific ID based on a specific condition?

Re: How to remove specific ID based on a specific condition?

Re: How to remove specific ID based on a specific condition?

Re: How to remove specific ID based on a specific condition?

Re: How to remove specific ID based on a specific condition?

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!