SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Deleting rows that have less than ten observations by two vars, after collapsing across a third var

Reply
Occasional Contributor
Posts: 10

Deleting rows that have less than ten observations by two vars, after collapsing across a third var

I have about 250k rows, each row represents a unique combination of agegroup (18 agegroups), sex, and a county + census tract variable, the count variable is how many people exist in that stratum. I want to delete county/census tracts where the total count is less than 10, by sex, by collapsing over the agegrp variable. Ie. I want to count the total people in that county/tract by sex, but preserve the agegroups in the dataset for future analysis.

 

Example of the data:

AgegrpSexCountytractCount
1MA1
2MA0
3MA2
4MA1
5MA0
6MA0
7MA0
8MA2
9MA1
10MA1
Super User
Posts: 5,084

Re: Deleting rows that have less than ten observations by two vars, after collapsing across a third

You'll have to show what the result should be in this case.  It's too difficult to figure out what you mean when you want to collapse yet preserve at the same time.

Occasional Contributor
Posts: 10

Re: Deleting rows that have less than ten observations by two vars, after collapsing across a third

I want to collapse just to count and delete, but then have the age groups preserved for the rows that are not deleted. 

 

Another way to say it: I want to count everyone in the county / census tract by sex and then delete where there are <10, ignoring age, but preserve the age group variable for future analyses.

 

I was doing this:

 

proc summary data=CRC;
by sex county1 ctract;
var count;
output out=delete sum=;
run;

 

Which gives me the counts that I want, but does not keep the agegrp variable from the original dataset.

Super User
Posts: 17,840

Re: Deleting rows that have less than ten observations by two vars, after collapsing across a third

Please post sample output.

Super User
Posts: 5,084

Re: Deleting rows that have less than ten observations by two vars, after collapsing across a third

I'm still guessing, but see if this comes close:

 

data want;

totcount=0;

do until (last.sex);

   set have;

   by Countrytract Sex notsorted;

   totcount + count;

end;

do until (last.sex);

   set have;

   by Countrytract Sex notsorted;

   if totcount >= 10 then output;

end;

drop totcount;

run;

 

The top DO loop sums up COUNT for a set of SEX/COUNTYTRACT.  The bottom DO loop reads the same observations, and outputs them if TOTCOUNT is 10 or more.

Ask a Question
Discussion stats
  • 4 replies
  • 141 views
  • 1 like
  • 3 in conversation