Quartz | Level 8

## Conditional edition

Hello dear friends;
I have a database of the form:

ID T AN OBS;
(ID = Individual ID, T = Team, AN = Year of Control and OBS = Observation);
I want to make an edition so as to keep only the teams with at least 3 individuals checked each year of control. Not necessarily the same individuals must be present every year.
NB * the number of individuals in my base is more than 800 miles,
The number of teams exceeds 3500
The number of years of checks is equal to 15.
I used "ODS" and the program runs well but I think there is another method less burdensome.
Thank you for helping me.

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Conditional edition

First, create a table for each team and year:

``````proc sql;
create table intermediate as
select t, an, count(distinct id) as individuals
from have
group by t, an
;
quit;``````

Next, extract only those teams that always exceed the count of 2:

``````proc sql;
create table want as
select distinct t
from intermediate
where t not in (select distinct t from intermediate where individuals < 3)
;
quit;``````

Given the size of your dataset, a data step method might be better:

``````proc sort
data=have (keep=id t an)
out=int1
nodupkey
;
by t an id;
run;

data int2 (keep=t);
set int1;
by t an;
if first.an then count = 0;
count + 1;
if last.an and count < 3 then output;
run;

data want;
merge
int1 (in=a)
int2 (in=b)
;
by t;
if not b;
run;

proc sort data=want nodupkey;
by t;
run;
``````
4 REPLIES 4
Diamond | Level 26

## Re: Conditional edition

Well, not sure how you used "ODS" to get a result - that is for creating output elements e.g PDF files?

Assuming your data is sorted (and you haven't provided any test data):

```data want;
set have;
retain cnt;
by id t an;
if first.t then cnt=1;
else cnt=cnt+1;
if cnt=3 then output;
run;```

What does this do.  Well on the first time year in id is encounterd the counter variable gets set to 1, then every row after that the counter is incremented by 1.  Only when counter is 3 will therecord output, so you will get one record per id/year when counter hits 3 - i.e. it has 3 records.

Super User

## Re: Conditional edition

First, create a table for each team and year:

``````proc sql;
create table intermediate as
select t, an, count(distinct id) as individuals
from have
group by t, an
;
quit;``````

Next, extract only those teams that always exceed the count of 2:

``````proc sql;
create table want as
select distinct t
from intermediate
where t not in (select distinct t from intermediate where individuals < 3)
;
quit;``````

Given the size of your dataset, a data step method might be better:

``````proc sort
data=have (keep=id t an)
out=int1
nodupkey
;
by t an id;
run;

data int2 (keep=t);
set int1;
by t an;
if first.an then count = 0;
count + 1;
if last.an and count < 3 then output;
run;

data want;
merge
int1 (in=a)
int2 (in=b)
;
by t;
if not b;
run;

proc sort data=want nodupkey;
by t;
run;
``````
Quartz | Level 8

## Re: Conditional edition

Yes, both proposals work well with some adaptations to my case. Thank you.

Quartz | Level 8

## Re: Conditional edition

Thanks for the solution.
The basic type I have is of the longitudinal type, so there are measures that repeat over time for the same ID during the same NA. If I understand correctly, with this code, we will have the Ts which also repeat as a function of time for the same ID and therefore it will return the number of T and not the number of IDs by T by AN (which I look for)

Discussion stats
• 4 replies
• 928 views
• 1 like
• 3 in conversation