BookmarkSubscribeRSS Feed
jdub
Calcite | Level 5

Howdy

I have a file of employment data corresponding to 9 counties that I wish to aggregate across counties by NAICS.  Within the file are employment observations for many different NAICS for each county.  I wish to aggregate the employment obs. over all the counties by NAICS.  I can do this with proc means or proc summary. 

There is a problem though. 

Some of the counties have observations that are missing.  If I use proc means to aggregate the data, it treats the missing values as zeros.  I would like to output an aggregated dataset for only those NAICS that have employment values for all nine counties.  If a single county has a missing value I do not want to aggregate the employment observations.  My thought is to subset the original data so that it contains data on NAICS with observed values for all nine counties.  I know how to drop observations if a observation is missing but I do not know how to drop an entire class (NAICS) if one or more county observations in that class are missing.

Is there some exceptional SAS-pert that can help me?

6 REPLIES 6
art297
Opal | Level 21

The easiest way I can think of is to use proc sql to either pre-process your file and add a field that is a count of the NAICS for the particular county and then use that field in a where statement as part of your proc summary, or simply do the entire analysis in sql using the count in a having statement.

If you need an example, post some code for at least one field for a couple of counties, one that has data for all nine and one that doesn't.

jdub
Calcite | Level 5

Hi art297

An example would help me out greatly.  Here is a snipet of data:

Here I have one missine observation for employment in county 1 for naics 442.  I am trying to generate a dataset that strips all observations for naics 442 from the original set.  In the original data I have many naics with and without missing observations across the nine counties.

countynaics3Emp
1442
144743
4144734
6744212
6744780
6944217
69447208
73442290
73447531
8544280
85447122
97442109
97447294
12544225
125447144
14144276
141447407
DLing
Obsidian | Level 7

Why don't you use proc means to first create a dataset with the number of records with non-missing values, # with missing values, and the total, i.e., output N, NMISS and SUM, then you can select only those with N=9 or NMISS=0.  Something like:

proc summary data=a missing nway;
     class naics;
     var employ;

     output out=summary(drop=_:)
          n=n    nmiss=nmiss    sum=total
     ;
run;

You can then check if nmiss is 0 or not.

Peter_C
Rhodochrosite | Level 12

Dling has the top solution, on which it is hard to imporove, except perhaps applying the "check" as the table is written, like:

output out=summary(drop=_: where=( not miss ) )

art297
Opal | Level 21

Here is an example of a proc sql solution:

data have;

  input county naics3 Emp;

  cards;

1          442          .

1          447          43

41          447          34

67          442          12

67          447          80

69          442          17

69          447          208

73          442          290

73          447          531

85          442          80

85          447          122

97          442          109

97          447          294

125          442          25

125          447          144

141          442          76

141          447          407

;

proc sql noprint;

  create table want as

    select naics3,count(emp) as count,sum(emp) as total_emp

      from have

        group by naics3

          having count(emp) eq 9

  ;

quit;

Fisher
Quartz | Level 8

proc sql;

  create table want as

  select * from have

  where naics3 not in (select distinct naics3 from have where emp is missing);

quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 924 views
  • 0 likes
  • 5 in conversation