Solved: How to select specific records in big data

hjjijkkl · Posted 02-19-2021 01:51 PM

I have a big data and I want to select specific records where the identified ID have a test result <=60 . I want to exclude if the subject "ID" has test result >60 during any time period from the analysis

ID	date	test
1	5/21/2019	43
1	8/15/2019	12
1	10/16/2020	20
2	8/4/2019	10
2	12/2/1019	70
3	5/21/2019	7
3	8/15/2019	60
4	1/6/1019	21
4	5/23/2019	33
4	8/5/2019	57

I want an output with ID that only have <=60

ID

1

3

PGStats · Posted 02-19-2021 02:29 PM

But if your data is already sorted by ID, a datastep should be faster:

data want;
do until(last.ID);
	set have; by ID;
	t = max(t, test);
	end;
if t <= 60;
keep ID;
run;

PG

View solution in original post

PGStats · Posted 02-19-2021 02:24 PM

The simplest way might be to use SQL:

proc sql;
create table want as
select ID
from have
group by ID
having max(test) <= 60;
quit;

PG

PGStats · Posted 02-19-2021 02:29 PM

But if your data is already sorted by ID, a datastep should be faster:

data want;
do until(last.ID);
	set have; by ID;
	t = max(t, test);
	end;
if t <= 60;
keep ID;
run;

PG

hjjijkkl · Posted 02-19-2021 06:15 PM

I have a big data and I want to select specific records where the identified ID have a test result <=60 . I want to exclude if the subject "ID" has test result >60 during any time period from the analysis

ID	date	test
1	5/21/2019	43
1	8/15/2019	12
1	10/16/2020	20
2	8/4/2019	10
2	12/2/1019	70
3	5/21/2019	7
3	8/15/2019	60
4	1/6/1019	21
4	5/23/2019	33
4	8/5/2019	57

I want an output with ID that only have <=60

ID	ID1
1	1
2
3	3
4

Ksharp · Posted 02-20-2021 06:55 AM

data have;
input ID
date : $10.
test ;
cards;
1
5/21/2019
43
1
8/15/2019
12
1
10/16/2020
20
2
8/4/2019
10
2
12/2/1019
70
3
5/21/2019
7
3
8/15/2019
60
4
1/6/1019
21
4
5/23/2019
33
4
8/5/2019
57
;

proc sql;
create table want as
select distinct id
from have
group by id
having sum(test>60)=0;
quit;

hjjijkkl · Posted 02-21-2021 05:11 PM

How do I still keep the ID and create a new ID variable for those who have test <=60?

As in the table below, want to keep the list of ID and have "ID1" as a new variable to represent the ones that have test <=60

I want an output to look like this

ID	ID1
1	1
2
3	3
4

Ksharp · Posted 02-22-2021 06:02 AM

data have;
input ID
date : $10.
test ;
cards;
1
5/21/2019
43
1
8/15/2019
12
1
10/16/2020
20
2
8/4/2019
10
2
12/2/1019
70
3
5/21/2019
7
3
8/15/2019
60
4
1/6/1019
21
4
5/23/2019
33
4
8/5/2019
57
;

proc sql;
create table want as
select a.id as id,b.id as id1 from
(select distinct id from have) as a natural left join
(select distinct id from have group by id having sum(test>60)=0) as b
;
quit;

How to select specific records in big data

Re: How to select specific records in big data

Re: How to select specific records in big data

Re: How to select specific records in big data

SLECTING SPECIFC OBS

Re: How to select specific records in big data

Re: How to select specific records in big data

Re: How to select specific records in big data

Catch up on SAS Innovate 2026