DATA Step, Macro, Functions and more

tricky subsetting problem

Reply
Contributor
Posts: 33

tricky subsetting problem

Howdy

I have a file of employment data corresponding to 9 counties that I wish to aggregate across counties by NAICS.  Within the file are employment observations for many different NAICS for each county.  I wish to aggregate the employment obs. over all the counties by NAICS.  I can do this with proc means or proc summary. 

There is a problem though. 

Some of the counties have observations that are missing.  If I use proc means to aggregate the data, it treats the missing values as zeros.  I would like to output an aggregated dataset for only those NAICS that have employment values for all nine counties.  If a single county has a missing value I do not want to aggregate the employment observations.  My thought is to subset the original data so that it contains data on NAICS with observed values for all nine counties.  I know how to drop observations if a observation is missing but I do not know how to drop an entire class (NAICS) if one or more county observations in that class are missing.

Is there some exceptional SAS-pert that can help me?

PROC Star
Posts: 7,468

tricky subsetting problem

The easiest way I can think of is to use proc sql to either pre-process your file and add a field that is a count of the NAICS for the particular county and then use that field in a where statement as part of your proc summary, or simply do the entire analysis in sql using the count in a having statement.

If you need an example, post some code for at least one field for a couple of counties, one that has data for all nine and one that doesn't.

Contributor
Posts: 33

tricky subsetting problem

Hi art297

An example would help me out greatly.  Here is a snipet of data:

Here I have one missine observation for employment in county 1 for naics 442.  I am trying to generate a dataset that strips all observations for naics 442 from the original set.  In the original data I have many naics with and without missing observations across the nine counties.

countynaics3Emp
1442
144743
4144734
6744212
6744780
6944217
69447208
73442290
73447531
8544280
85447122
97442109
97447294
12544225
125447144
14144276
141447407
Frequent Contributor
Posts: 104

Re: tricky subsetting problem

Why don't you use proc means to first create a dataset with the number of records with non-missing values, # with missing values, and the total, i.e., output N, NMISS and SUM, then you can select only those with N=9 or NMISS=0.  Something like:

proc summary data=a missing nway;
     class naics;
     var employ;

     output out=summary(drop=_Smiley Happy
          n=n    nmiss=nmiss    sum=total
     ;
run;

You can then check if nmiss is 0 or not.

Valued Guide
Posts: 2,177

tricky subsetting problem

Dling has the top solution, on which it is hard to imporove, except perhaps applying the "check" as the table is written, like:

output out=summary(drop=_: where=( not miss ) )

PROC Star
Posts: 7,468

tricky subsetting problem

Here is an example of a proc sql solution:

data have;

  input county naics3 Emp;

  cards;

1          442          .

1          447          43

41          447          34

67          442          12

67          447          80

69          442          17

69          447          208

73          442          290

73          447          531

85          442          80

85          447          122

97          442          109

97          447          294

125          442          25

125          447          144

141          442          76

141          447          407

;

proc sql noprint;

  create table want as

    select naics3,count(emp) as count,sum(emp) as total_emp

      from have

        group by naics3

          having count(emp) eq 9

  ;

quit;

Contributor
Posts: 70

tricky subsetting problem

proc sql;

  create table want as

  select * from have

  where naics3 not in (select distinct naics3 from have where emp is missing);

quit;

Ask a Question
Discussion stats
  • 6 replies
  • 190 views
  • 0 likes
  • 5 in conversation