*New dataset, limited to 2019 and "all age groups";
Data est.estimates19;
Set est.estimates
(drop=state county stname);
Where year=12 and AGEGRP=0;
Total=sum(tot_male, tot_female); *Total population;
white=sum(wa_male, wa_female); *White;
black=sum(ba_male, ba_female); *Black;
natam=sum(IA_MALE, IA_FEMALE); *Native American/Alaskan Native;
asian=sum(AA_MALE, AA_FEMALE); *Asian;
island=sum(NA_MALE, NA_FEMALE); *Native Hawaiian/Pacific Islander;
multi=sum(TOM_MALE, TOM_FEMALE); *Multi/other;
hispanic=sum(H_male, H_female); *Hispanic Ethnicity = 391,382;
Non_hispanic=sum(NH_MALE, NH_FEMALE); *Non-Hispanic Ethnicity = 6,437,792;
Run;
I'm getting population estimates for county from a US Census datasets; I've had to sum the data together (male+female) to get totals for race/ethnicity.
I need to group these individual summed variables to get something that looks like this (numbers just for example):
Ultimately, my goal is to be able to output this an Excel file and be able to filter by Category (race/ethnicity) or by CAT_detail as needed.
County | Category | CAT_Detail | Population |
Sunnyville | Race | White | 5,602,223 |
Sunnyville | Ethnicity | Non-Hispanic | 7,56,010 |
Sunnyville | Race | Black | 1,501,011 |
Sunnyville | Race | Native American/Alaskan Native | 21,011 |
Sunnyville | Ethnicty | Hispanic | 1,002,001 |
Proc sql;
create table want as
select a.country, a.category, a.cat_detail, sum (desired_avariable) as Poutaion_sum
from have as a
group by a.country, a.category, a.cat_detail;
quit;
It might help to actually go back to your "raw" data, i.e. before you added the "tot_male" "tot_female" and such variables.
Numeric codes can be turned in "groups" very easily and no additional variables would need to be added.
For example if your race code is such that 1 and 2 are White (possibly separate codes for male/female), 3 & 4 for Black and so a a format such as
Proc format; value racesex 1,2 = 'White' 3,4 = 'Black' 5,6 = 'American Indian' 7,8 = 'Asian/Pacific Islander' ; value hisp 1='Non-Hispanic' 2='Hispanic' ;
run;
Can create groups.
And then use a report procedure like Proc tabulate to get the counts.
Proc tabulate data=yourrawpopestimateset; class racevar ethvar; /*<= these would be the names of a race variable coded as above for race and ethnicity*/ class county; /* note that a format could be turning raw codes of county to legible text as well*/ format racevar race. ethvar hisp; var popest; /* a single variable that has the population estimate instead of all those extraneous variables*/ table county *(all='Total Population' racevar='Race' ethvar='Ethnicity') popest='Population'*sum=' '*f=comma12. ; run;
The same Where statement could be used to filter data, maybe, depending on what else has been done to the raw data.
Similar formats can be made to use singe age values to create groups such as 5-year or 10-year groups when using agegrp other than 0.
@SAS93 wrote:
The variables like tot_male and ba_female are numeric variables. They provide the count of the population in a given county.
Doesn't invalidate what I posted.
I suspect you are using one of the US Census single-age by race and county data sets. The code I posted is intended to work pretty much with an "as read" version of the data where no one has mucked about creating superfluous race male/female variables.
If that is the file you are using I have been using them since 2007. And create race/ age/ ethicity /county (and multi county regions of my state) just by using formats applied to the variables. (And always thought including summaries like the agegroup=0 to be poor design)
The Proc tabulate keyword ALL would sum all of the race values to get that overall.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.