BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
brianathrift
Fluorite | Level 6

Hello,

 

I am hoping to get help with calculating annual changes in pedestrian fatality rates, across counties/states. Counties names are not unique to each state, as some states have the same county names. Therefore, I first sorted by year, state, and county.

 

I have a dataset with year 2001-2017. Annual Fatality rates are recorded, per county/state. I am hoping to build a model where the dependent variable is the annual changes in rates. 

 

To make things easier to understand, a simple example of my data looks like the following:

 

DATA Ped_fatal;
LENGTH State $50 County $50;
INPUT Year State$ County$ Population Ped_deaths Rates_millions ;
DATALINES;
2001 AL Augusta 285000000 5633 19.8 
2002 AL Balwin 287600000 5516 19.2 
2003 AL Coffee 290100000 5403 18.6 
2004 AL Clark 292800000 5402 18.4
2005 AL Dallas 292500000 5678 19.4
2006 AL Delkab 298400000 5567 18.7
2007 CA Los Angeles 301200000 5400 17.9 
2008 CA Riverside 304100000 5132 16.9
2009 CA San Diego 307006550 4737 15.4 
2002 AL Augusta 308745538 4925 16.0
2002 AL Balwin 311591917 5139 18.9 
2003 AL Balwin 313914040 5552 8.0 
2004 AL Coffee 316128839 5528 16.7
2002 AL Dallas 318857056 5639 17.7 
2004 AL Dallas 312418820 6323 20.2
;
RUN;

Counties/State repeat per year, and I want to measure the annual change in rates, by county/state.

 

My current code to try to calculate annual changes in fatality rates are the following:

 

 

Proc Sort Data=Ped_fatals;
By Year State County;
Run;

Data want;
Set Ped_fatals;
By Year State County;
if year=2001 then lyr1 = lag1(Rates_millions);
if year=2002 then lyr2 = lag2(Rates_millions);
if year=2003 then lyr3 = lag3(Rates_millions);
if year=2004 then lyr4 = lag4(Rates_millions);

if year=2001 then count=0;
else if year=2002 then count=1;
else if year=2003 then count=2;
else if year=2004 then count=3;
else if year=2005 then count=4;
if count=1 then growth1yr = (Rates_millions - lyr1)/lyr1;
if count=2 then growth2yr = (Rates_millions - lyr2)/lyr2;
if count=3 then growth3yr = (Rates_millions - lyr3)/lyr3;
if count=4 then growth4yr = (Rates_millions - lyr4)/lyr4;
run;



proc print data=want (obs=10);
var year state county Rates_millions lyr1 growth1yr growth2yr;
where year=2002;
run;

 

 

My current output is the following:

 

Year State County Rate_millions growth1yr growth2yr
2002ALABAMAAutauga1.5129..
2002ALABAMABaldwin41.6898..
2002ALABAMACherokee2.6072..
2002ALABAMAChilton7.3703..
2002ALABAMADallas0.9538..
2002ALABAMADekalb3.1252..
2002ALABAMAGreene0.2280..
2002ALABAMAHale3.7803..
2002ALABAMAJackson11.3559..
2002ALABAMAJefferson13.7646..

 

I would appreciate any help/suggestions!

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

It would be very helpful if you gave us an example of data that had multiple years of the same county, and maybe even multiple years of two counties, so we could see exactly how the output relates to the input; and also perhaps then we could come up with some code that might work or see why your code doesn't work. As it stands, your output cannot be related to the input, there are counties shown in the output that don't exist in the input.

 

I suspect, but can't be sure from your description, that the proper sorting of the variables should be

 

BY STATE COUNTY YEAR;

 

and not the way you did it.

 

Also, by putting the LAG() functions inside an IF statement, you are getting wrong results and you probably want to do something like this:

lyr1 = lag1(Rates_millions);
lyr2 = lag2(Rates_millions);
lyr3 = lag3(Rates_millions);
lyr4 = lag4(Rates_millions);

But exactly how to proceed isn't clear, as I said, I don't know how you get the outputs from the inputs.

 

--
Paige Miller

View solution in original post

4 REPLIES 4
PaigeMiller
Diamond | Level 26

It would be very helpful if you gave us an example of data that had multiple years of the same county, and maybe even multiple years of two counties, so we could see exactly how the output relates to the input; and also perhaps then we could come up with some code that might work or see why your code doesn't work. As it stands, your output cannot be related to the input, there are counties shown in the output that don't exist in the input.

 

I suspect, but can't be sure from your description, that the proper sorting of the variables should be

 

BY STATE COUNTY YEAR;

 

and not the way you did it.

 

Also, by putting the LAG() functions inside an IF statement, you are getting wrong results and you probably want to do something like this:

lyr1 = lag1(Rates_millions);
lyr2 = lag2(Rates_millions);
lyr3 = lag3(Rates_millions);
lyr4 = lag4(Rates_millions);

But exactly how to proceed isn't clear, as I said, I don't know how you get the outputs from the inputs.

 

--
Paige Miller
brianathrift
Fluorite | Level 6
DATA Ped_fatal;
INPUT State County Year Rates_millions ;
DATALINES;

ALABAMA Baldwin 2001 14.5
ALABAMA Bullock 2001 16
ALABAMA Calhoun 2001 22
ALABAMA Clarke 2001 23
ALABAMA Coffee 2001 27
ALABAMA Conecuh 2001 18
ALABAMA Crenshaw 2001 16
ALABAMA Dallas 2001 34
ALABAMA Dekalb 2001 33
ALABAMA Elmore 2001 25
ALABAMA Autauga 2002 98
ALABAMA Baldwin 2002 56
ALABAMA Cherokee 2002 34
ALABAMA Chilton 2002 22
ALABAMA Dallas 2002 21
ALABAMA Dekalb 2002 78
ALABAMA Greene 2002 76
ALABAMA Hale 2002 56
ALABAMA Jackson 2002 46
ALABAMA Jefferson 2002 49
ALABAMA Autauga 2003 34
ALABAMA Baldwin 2003 14
ALABAMA Barbour 2003 27
ALABAMA Blount 2003 28
ALABAMA Butler 2003 24
ALABAMA Calhoun 2003 67
ALABAMA Chambers 2003 53
ALABAMA Chilton 2003 86
ALABAMA Clarke 2003 44
ALABAMA Cleburne 2003 37
;
RUN;

PROC SORT DATA=Ped_fatal;
BY State County Year;
Run;

DATA want;
   SET final;
   BY state county year;
  lyr1 = lag1(Rates_millions);
  lyr2 = lag2(Rates_millions);
  lyr3 = lag3(Rates_millions);
  lyr4 = lag4(Rates_millions);

   if year=2001 then count=0;
   else if year=2002 then count=1;
   else if year=2003 then count=2;
   else if year=2004 then count=3;
   else if year=2005 then count=4;
   if count=1 then growth1yr = (Rates_millions - lyr1)/lyr1;
   if count=2 then growth2yr = (Rates_millions - lyr2)/lyr2;
   if count=3 then growth3yr = (Rates_millions - lyr3)/lyr3;
   if count=4 then growth4yr = (Rates_millions - lyr4)/lyr4;
run;

proc print data=want (obs=10);
var year state county Rates_millions lyr1 growth1yr growth2yr;
where year=2002;
run;



Thank you for your feedback! The following example provides different counties/states per year. The example shows all of Alabama, however my dataset has millions of points across the US, including all 50 states. 

 

Quick side question: can you explain why the sorting order needs to be state, county, and then year? How does this sorting order change the results?

 

My output for the following code is:

 

Obs YEAR STATE COUNTY Rate_Millions lyr1 growth1yr growth2yr
2002ALABAMAAutauga1.5129...
2002ALABAMABaldwin41.68984.45548.35707.
2002ALABAMACherokee2.6072...
2002ALABAMAChilton7.370311.2873-0.34703.
2002ALABAMADallas0.95381.7975-0.46939.
2002ALABAMADekalb3.12522.75680.13363.
2002ALABAMAGreene0.2280...
2002ALABAMAHale3.7803...
2002ALABAMAJackson11.355912.7575-0.10987.
2002ALABAMAJefferson13.764643.4405-0.68314.

 

This output seems correct! So it looks like that change in code did the trick. If you could explain why the sorting order matters, and how to know how to sort of merging and such, I would greatly appreciate it. Throughout my analysis I have been merging 10 different datasets, all aggregated by county/state and year. I have been sorting by: Year, State, County...and now I am wondering if I need to go back and change the sorting order. I believe my datasets have merged correctly, but I just want to be sure.

 

Best,

Briana Thrift

unison
Lapis Lazuli | Level 10

Maybe something like this would make things a little more easy if you need to change the years in the future?

 

DATA Ped_fatal;
	LENGTH State $50 County $50;
	INPUT Year State$ County$ Population Ped_deaths Rates_millions;
	DATALINES;
2001 AL Augusta 285000000 5633 19.8 
2003 AL Augusta 290100000 5403 18.6 
2004 AL Augusta 292800000 5402 18.4
2005 AL Augusta 292500000 5678 19.4
2006 AL Delkab 298400000 5567 18.7
2007 CA Los Angeles 301200000 5400 17.9 
2001 AL Dallas 318857056 5639 16.7 
2002 AL Dallas 318857056 5639 17.7 
2004 AL Dallas 312418820 6323 20.2
;
RUN;

Proc Sort Data=Ped_fatal;
	By State County year;
Run;

proc transpose data=ped_fatal prefix=rate out=trx(drop=_name_);
	by state county;
	id year;
	var rates_millions;
run;

proc sql noprint outobs=1;
	select name into :first_yr from dictionary.columns where libname='WORK' and 
		memname='TRX' and name like "rate%" order by name;
quit;

proc sql noprint;
	select name into :all_yrs separated by " " from dictionary.columns where 
		libname='WORK' and memname='TRX' and name like "rate%" order by name;
quit;

data trx;
	retain state county &all_yrs.;
	set trx;
run;

data want;
	set trx;
	array rate rate:;

	do over rate;

		if rate=. then
			rate=0;
	end;

	do over rate;

		if vname(rate) ne "&first_yr." then
			rate=rate/&first_yr.-1;
	end;
	&first_yr=0;
	format rate: PERCENT.2;
run;

if you have any questions, just let me know!

-unison

-unison
PaigeMiller
Diamond | Level 26
DATA want;
   SET ped_fatal;
   BY state county year;
  lyr1 = lag1(Rates_millions);
  lyr2 = lag2(Rates_millions);
  lyr3 = lag3(Rates_millions);
  lyr4 = lag4(Rates_millions);
   if first.county then count=1;
   else count+1;
   if count=2 then growthyr=(rates_millions-lyr1)/lyr1;
   if count=3 then growthyr2=(rates_millions-lyr2)/lyr2;
   if count=4 then growthyr3=(rates_millions-lyr3)/lyr3;
run;

That ought to work. 

 

The reason you want BY STATE COUNTY YEAR is so that the lag function obtains the previous rates_millions from that exact STATE COUNTY in the previous year. If you sort it some other way, you get the previous rates_millions from some other county.

 

May I make a suggestion? Please proofread and test your code before you post it here; do not post code with obvious syntax errors or other code problems that you can fix. The code you posted does not read the state or county at all, and it refers to a data set named FINAL which doesn't exist. Clean up as much as you can before asking us for help. Thanks!

--
Paige Miller

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1085 views
  • 1 like
  • 3 in conversation