Contributor
Posts: 25

# Deleting rows that have less than ten observations by two vars, after collapsing across a third var

I have about 250k rows, each row represents a unique combination of agegroup (18 agegroups), sex, and a county + census tract variable, the count variable is how many people exist in that stratum. I want to delete county/census tracts where the total count is less than 10, by sex, by collapsing over the agegrp variable. Ie. I want to count the total people in that county/tract by sex, but preserve the agegroups in the dataset for future analysis.

Example of the data:

 Agegrp Sex Countytract Count 1 M A 1 2 M A 0 3 M A 2 4 M A 1 5 M A 0 6 M A 0 7 M A 0 8 M A 2 9 M A 1 10 M A 1
Super User
Posts: 6,642

## Re: Deleting rows that have less than ten observations by two vars, after collapsing across a third

You'll have to show what the result should be in this case.  It's too difficult to figure out what you mean when you want to collapse yet preserve at the same time.

Contributor
Posts: 25

## Re: Deleting rows that have less than ten observations by two vars, after collapsing across a third

I want to collapse just to count and delete, but then have the age groups preserved for the rows that are not deleted.

Another way to say it: I want to count everyone in the county / census tract by sex and then delete where there are <10, ignoring age, but preserve the age group variable for future analyses.

I was doing this:

proc summary data=CRC;
by sex county1 ctract;
var count;
output out=delete sum=;
run;

Which gives me the counts that I want, but does not keep the agegrp variable from the original dataset.

Super User
Posts: 23,342

Super User
Posts: 6,642

## Re: Deleting rows that have less than ten observations by two vars, after collapsing across a third

I'm still guessing, but see if this comes close:

data want;

totcount=0;

do until (last.sex);

set have;

by Countrytract Sex notsorted;

totcount + count;

end;

do until (last.sex);

set have;

by Countrytract Sex notsorted;

if totcount >= 10 then output;

end;

drop totcount;

run;

The top DO loop sums up COUNT for a set of SEX/COUNTYTRACT.  The bottom DO loop reads the same observations, and outputs them if TOTCOUNT is 10 or more.

Discussion stats
• 4 replies
• 158 views
• 1 like
• 3 in conversation