Hi all,
I know I did this before, but can't seem to get this to work right now. I have a dataset that looks something like this:
Date State County CensusTract Value StdErr
01Jan2011 01 1001 1001020200 50.40 4.23
01Jan2011 01 1001 1001020300 29.47 1.39
01Jan2011 01 1001 1001020400 68.51 2.77
01Jan2011 01 1003 1003010200 5.38 8.47
01Jan2011 01 1003 1003010300 18.78 6.24
So I have daily values for each census tract within each county within each state. What I want is something that summarizes the maximum value (as well as the mean and the median) for each county in a day. So I want to end up with something like this, which results in a dataset that just has one observation per county per day:
Date State County Max_DailyValue Mean_DailyValue Med_DailyValue
01Jan2011 01 1001 100.84 80.11 64.58
01Jan2011 01 1003 68.53 47.22 33.46
01Jan2011 01 1005 85.66 55.81 45.28
01Jan2011 01 1006 74.82 53.27 47.63
I also want to calculate the population-weighted average value for the county, but haven't figured that out yet.
I've been trying with variations on this code, but it's not quite right. Any advice is appreciated. Thanks!
proc sql;
create table county_2011 as
select year, date, statefips, countyfips, max(value) as max_value, median(value) as med_value, mean(value) as mean_value
from tractfinal_2011
order by date, countyfips;
quit;
GROUP BY instead of Order by. GROUP BY will use the group combinations for the summary statistics. Order only controls the output order and has nothing to do with calculations.
I did try that first and just tried it again, but for some reason, that still gives me more than one row per county (looks like it's still displaying by tract even though that variable isn't there anymore). It's not collapsing it down to one row per county per day. I also tried that using different orders of the variables in the group by statement
Replace the order by clause with
group by year, date, statefips, countyfips
Thanks! That's what I tried after @Reeza posted and didn't get to come back to say that worked. That did what I needed it to do.
Now I'm trying to find out how to calculate a population-weighted mean based on all census tracts within a given county for each day. I tried doing that in proc sql, but it made SAS quit (I'm dealing with ~26 million records). Not sure if I can incorporate that into my code posted here for the other purposes.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.