Hi all. This should be simple, but I can't quite get it to work, and what I found on web searching hasn't helped exactly, perhaps because i don't understand sql well enough. Here is part of the data. Each county has population by race/ethnicity. I want to get state population by race/ethnicity, summing across all counties.
Shouldn't this be fairly simple?
CountyCode | Race | Population | County |
36000 | State Total | 58344005 | New York State |
36001 | County Total | 915097 | Albany |
36001 | Hispanic or Latino | 58094 | Albany |
36001 | NH American Indian or Alaska Native | 1370 | Albany |
36001 | NH Asian and Pacific Islander | 63194 | Albany |
36001 | NH Black or African American | 116750 | Albany |
36001 | NH More than one race | 22848 | Albany |
36001 | NH White | 652841 | Albany |
36003 | County Total | 137788 | Allegany |
36003 | Hispanic or Latino | 2401 | Allegany |
36003 | NH American Indian or Alaska Native | 295 | Allegany |
36003 | NH Asian and Pacific Islander | 1739 | Allegany |
36003 | NH Black or African American | 1709 | Allegany |
36003 | NH More than one race | 1632 | Allegany |
36003 | NH White | 130012 | Allegany |
36005 | County Total | 4251645 | Bronx |
36005 | Hispanic or Latino | 2390496 | Bronx |
36005 | NH American Indian or Alaska Native | 11814 | Bronx |
36005 | NH Asian and Pacific Islander | 176974 | Bronx |
36005 | NH Black or African American | 1242803 | Bronx |
36005 | NH More than one race | 42322 | Bronx |
36005 | NH White | 387236 | Bronx |
Thanks
Gene
This is the reason:
select *
As you include all columns, SQL does an automatic remerge (see the NOTE in the log). To avoid this, you must only have results of summary functions and variables included in the GROUP BY in your SELECT.
select race, sum(population)
from tempa
group by race
Just to mention, i tried this so far
proc sql;
create table tempb as
select *, sum(population)
from tempa
group by race
order by race;
quit;
but it returns all the counties, by race, so it's returning the same data. What am i doing wrong?
This is the reason:
select *
As you include all columns, SQL does an automatic remerge (see the NOTE in the log). To avoid this, you must only have results of summary functions and variables included in the GROUP BY in your SELECT.
select race, sum(population)
from tempa
group by race
Thanks very much! That was it.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.