Howdy
I have a file of employment data corresponding to 9 counties that I wish to aggregate across counties by NAICS. Within the file are employment observations for many different NAICS for each county. I wish to aggregate the employment obs. over all the counties by NAICS. I can do this with proc means or proc summary.
There is a problem though.
Some of the counties have observations that are missing. If I use proc means to aggregate the data, it treats the missing values as zeros. I would like to output an aggregated dataset for only those NAICS that have employment values for all nine counties. If a single county has a missing value I do not want to aggregate the employment observations. My thought is to subset the original data so that it contains data on NAICS with observed values for all nine counties. I know how to drop observations if a observation is missing but I do not know how to drop an entire class (NAICS) if one or more county observations in that class are missing.
Is there some exceptional SAS-pert that can help me?
The easiest way I can think of is to use proc sql to either pre-process your file and add a field that is a count of the NAICS for the particular county and then use that field in a where statement as part of your proc summary, or simply do the entire analysis in sql using the count in a having statement.
If you need an example, post some code for at least one field for a couple of counties, one that has data for all nine and one that doesn't.
Hi art297
An example would help me out greatly. Here is a snipet of data:
Here I have one missine observation for employment in county 1 for naics 442. I am trying to generate a dataset that strips all observations for naics 442 from the original set. In the original data I have many naics with and without missing observations across the nine counties.
county | naics3 | Emp |
1 | 442 | |
1 | 447 | 43 |
41 | 447 | 34 |
67 | 442 | 12 |
67 | 447 | 80 |
69 | 442 | 17 |
69 | 447 | 208 |
73 | 442 | 290 |
73 | 447 | 531 |
85 | 442 | 80 |
85 | 447 | 122 |
97 | 442 | 109 |
97 | 447 | 294 |
125 | 442 | 25 |
125 | 447 | 144 |
141 | 442 | 76 |
141 | 447 | 407 |
Why don't you use proc means to first create a dataset with the number of records with non-missing values, # with missing values, and the total, i.e., output N, NMISS and SUM, then you can select only those with N=9 or NMISS=0. Something like:
proc summary data=a missing nway;
class naics;
var employ;
output out=summary(drop=_:)
n=n nmiss=nmiss sum=total
;
run;
You can then check if nmiss is 0 or not.
Dling has the top solution, on which it is hard to imporove, except perhaps applying the "check" as the table is written, like:
output out=summary(drop=_: where=( not miss ) )
Here is an example of a proc sql solution:
data have;
input county naics3 Emp;
cards;
1 442 .
1 447 43
41 447 34
67 442 12
67 447 80
69 442 17
69 447 208
73 442 290
73 447 531
85 442 80
85 447 122
97 442 109
97 447 294
125 442 25
125 447 144
141 442 76
141 447 407
;
proc sql noprint;
create table want as
select naics3,count(emp) as count,sum(emp) as total_emp
from have
group by naics3
having count(emp) eq 9
;
quit;
proc sql;
create table want as
select * from have
where naics3 not in (select distinct naics3 from have where emp is missing);
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.