BookmarkSubscribeRSS Feed
wernie
Quartz | Level 8

Hi all,

 

I know I did this before, but can't seem to get this to work right now. I have a dataset that looks something like this:

 

Date                   State          County          CensusTract              Value       StdErr

01Jan2011          01                1001            1001020200              50.40       4.23

01Jan2011          01                1001            1001020300              29.47       1.39

01Jan2011          01                1001            1001020400              68.51       2.77

01Jan2011          01                1003            1003010200              5.38         8.47

01Jan2011          01                1003            1003010300              18.78       6.24

 

So I have daily values for each census tract within each county within each state. What I want is something that summarizes the maximum value (as well as the mean and the median) for each county in a day. So I want to end up with something like this, which results in a dataset that just has one observation per county per day:

 

Date                   State          County      Max_DailyValue     Mean_DailyValue   Med_DailyValue

01Jan2011          01                1001            100.84                 80.11                        64.58

01Jan2011          01                1003              68.53                 47.22                        33.46

01Jan2011          01                1005              85.66                 55.81                        45.28

01Jan2011          01                1006              74.82                 53.27                        47.63

 

I also want to calculate the population-weighted average value for the county, but haven't figured that out yet.

 

I've been trying with variations on this code, but it's not quite right. Any advice is appreciated. Thanks!

 

proc sql;
         create table county_2011 as
         select year, date, statefips, countyfips, max(value) as max_value, median(value) as med_value, mean(value) as mean_value
         from tractfinal_2011
         order by date, countyfips;
quit;

 

 

 

7 REPLIES 7
Reeza
Super User
Add a GROUP BY or use PROC MEANs instead, PROC MEANS is closer to what you want, easier IMO.

proc means data=have stackods max mean median NWAY;
class date state county;
format date yymon5.;
var value;
ods output summary=want;
run;

proc print data=want;
run;
ballardw
Super User

GROUP BY instead of Order by. GROUP BY will use the group combinations for the summary statistics. Order only controls the output order and has nothing to do with calculations.

 

 

wernie
Quartz | Level 8

I did try that first and just tried it again, but for some reason, that still gives me more than one row per county (looks like it's still displaying by tract even though that variable isn't there anymore). It's not collapsing it down to one row per county per day. I also tried that using different orders of the variables in the group by statement

Reeza
Super User
If you include variables in the select statement that are not in the GROUP BY Statement or not aggregated that will happen. post your code.
wernie
Quartz | Level 8

Thanks! That's what I tried after @Reeza posted and didn't get to come back to say that worked. That did what I needed it to do.

 

Now I'm trying to find out how to calculate a population-weighted mean based on all census tracts within a given county for each day. I tried doing that in proc sql, but it made SAS quit (I'm dealing with ~26 million records). Not sure if I can incorporate that into my code posted here for the other purposes.

Reeza
Super User
PROC MEANS has a weight option, SQL would make you do the calculation manually. 26 million rows shouldn't be an issue unless you're using SAS UE.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1191 views
  • 4 likes
  • 4 in conversation