BookmarkSubscribeRSS Feed
brianathrift
Fluorite | Level 6

Hi!

 

I am needing to calculate annual rate changes, over a span of 17 years (2001-2017), and across over 3000 counties in US states. As of now, my code calculates the change of rate, however continues to use the first year (2001) as the "original rate" to derive the change of rate from. I would like to change this so then the change of rate can recalculate based on the year prior. For instance, the change of rate from 2001-2002, then 2002-2003, then 2003-2004...and so on. As of now, my code outputs change of rates from 2001-2002, 2001-2003, 2001-2004...and so one. Generally, I would like to change the baseline rate to change per year, to the year prior. 

 

I am on SAS version 9.4

 

Here is an example of my data:

Data Ped_fatal;
Length State $54 County $54;
Input State$ County$ Year Ped_rates ;
Datalines;

ALABAMA	Autauga	2002	1.51
ALABAMA	Autauga	2003	.
ALABAMA	Autauga	2005	8.28
ALABAMA	Autauga	2007	.
ALABAMA	Autauga	2008	14.27
ALABAMA	Autauga	2009	54.05
ALABAMA	Autauga	2010	.
ALABAMA	Autauga	2011	.
ALABAMA	Autauga	2012	19.41
ALABAMA	Autauga	2013	6.65
ALABAMA	Autauga	2014	29.13
ALABAMA	Autauga	2015	1.84
ALABAMA	Autauga	2016	.
ALABAMA	Autauga	2017	107.93
ALABAMA	Baldwin	2001	4.46
ALABAMA	Baldwin	2002	41.69
ALABAMA	Baldwin	2003	22.15
ALABAMA	Baldwin	2004	102.13
ALABAMA	Baldwin	2005	0.33
ALABAMA	Baldwin	2006	0.16
ALABAMA	Baldwin	2007	9.59
ALABAMA	Baldwin	2008	16.75

;
Run;

 

Here is my current code:

data want;
   set Ped_fatal;
   by state county year;
  lyr1 = lag1(Ped_rate);
  lyr2 = lag2(Ped_rate);
  lyr3 = lag3(Ped_rate);
  lyr4 = lag4(Ped_rate);
  lyr5 = lag5(Ped_rate);
  lyr6 = lag6(Ped_rate);
  lyr7 = lag7(Ped_rate);
  lyr8 = lag8(Ped_rate);
  lyr9 = lag9(Ped_rate);
  lyr10 = lag10(Ped_rate);
  lyr11 = lag11(Ped_rate);
  lyr12 = lag12(Ped_rate);
  lyr13 = lag13(Ped_rate);
  lyr14 = lag14(Ped_rate);
  lyr15 = lag15(Ped_rate);
  lyr16 = lag16(Ped_rate);
	if first.county then count=1;
   	else count+1;

   if count ge 1 then growth1yr = (Ped_rate - lyr1)/lyr1;
   if count ge 2 then growth2yr = (Ped_rate - lyr2)/lyr2;
   if count ge 3 then growth3yr = (Ped_rate - lyr3)/lyr3;
   if count ge 4 then growth4yr = (Ped_rate - lyr4)/lyr4;
   if count ge 5 then growth5yr = (Ped_rate - lyr5)/lyr5;
   if count ge 6 then growth6yr = (Ped_rate - lyr6)/lyr6;
   if count ge 7 then growth7yr = (Ped_rate - lyr7)/lyr7;
   if count ge 8 then growth8yr = (Ped_rate - lyr8)/lyr8;
   if count ge 9 then growth9yr = (Ped_rate - lyr9)/lyr9;
   if count ge 10 then growth10yr = (Ped_rate - lyr10)/lyr10;
   if count ge 11 then growth11yr = (Ped_rate - lyr11)/lyr11;
   if count ge 12 then growth12yr = (Ped_rate - lyr12)/lyr12;
   if count ge 13 then growth13yr = (Ped_rate - lyr13)/lyr13;
   if count ge 14 then growth14yr = (Ped_rate - lyr14)/lyr14;
   if count ge 15 then growth15yr = (Ped_rate - lyr15)/lyr15;
   if count ge 16 then growth16yr = (Ped_rate - lyr16)/lyr16;
run;

proc print data=want;
var year state county Ped_rate lyr1 lyr2 lyr3 lyr4 growth1yr growth2yr growth3yr growth4yr;
run;

My current output from the above code is the following:

 

YearStateCountyPed_ratelyr1lyr2lyr3lyr4growthyr1growthyr2growthyr3growthyr4
2002ALABAMAAutauga1.51........
2003ALABAMAAutauga.1.51.......
2005ALABAMAAutauga8.28.1.51...4.47..
2007ALABAMAAutauga.8.28.1.51.....
2008ALABAMAAutauga14.27.8.28.1.51.0.72.8.43
2009ALABAMAAutauga54.0514.27.8.28.2.79.5.53.
2010ALABAMAAutauga.54.0514.27.8.28....
2011ALABAMAAutauga..54.0514.27.....
2012ALABAMAAutauga19.41..54.0514.27..-0.640.36
2013ALABAMAAutauga6.6519.41..54.05-0.66..-0.88
2014ALABAMAAutauga29.136.6519.41..3.380.5..
2015ALABAMAAutauga1.8429.136.6519.41.-0.94-0.72-0.91.
2016ALABAMAAutauga.1.8429.136.6519.41....
2017ALABAMAAutauga107.93.1.8429.136.65.57.82.715.24
2001ALABAMABaldwin4.46107.93.1.8429.13-0.96...
2002ALABAMABaldwin41.694.46107.93.1.848.36-0.61..
2003ALABAMABaldwin22.1541.694.46107.93.-0.473.97-0.79.
2004ALABAMABaldwin102.1322.1541.694.46107.933.611.4521.92-0.05
2005ALABAMABaldwin0.33102.1322.1541.694.46-1-0.99-0.99-0.93
2006ALABAMABaldwin0.160.33102.1322.1541.69-0.52-1-0.99-1
2007ALABAMABaldwin9.590.160.33102.1322.1559.5528.2-0.91-0.57
2008ALABAMABaldwin16.759.590.160.33102.130.75104.7650.01-0.84

 

Additionally, I would like to transpose my data in order to have a single variable named "Annual Change of Rate" to be one column, changing per year increase (from 2001-2017).

 

Here is my current transpose code:

Proc Transpose Data=Want out=Long prefix=Annual_rate_change ;
   	By State County;
	Bar  growth1yr growth2yr growth3yr growth4yr growth5yr growth6yr growth7yr growth8yr
	growth9yr growth10yr growth11yr growth12yr growth13yr growth14yr growth15yr growth16yr;
run;

Data Revised_Rate;
	Set Long;
	year=input(substr(_name_, 7), 7.);
   	drop _name_;
Run;

Proc Print Data= Revised_Rate (obs=10);
Run;

 

Here is my current output of the transpose code:

STATECOUNTYANNUAL_RATE_CHANGE1ANNUAL_RATE_CHANGE2ANNUAL_RATE_CHANGE3ANNUAL_RATE_CHANGE4ANNUAL_RATE_CHANGE5ANNUAL_RATE_CHANGE6ANNUAL_RATE_CHANGE7ANNUAL_RATE_CHANGE8ANNUAL_RATE_CHANGE9ANNUAL_RATE_CHANGE10ANNUAL_RATE_CHANGE11ANNUAL_RATE_CHANGE12ANNUAL_RATE_CHANGE13ANNUAL_RATE_CHANGE14ANNUAL_RATE_CHANGE15ANNUAL_RATE_CHANGE16ANNUAL_RATE_CHANGE17Year
ALABAMAAutauga.....2.7881...-0.657623.3834-0.93699......
ALABAMAAutauga..4.47282.0.72339.....0.5008-0.72382.57.7998....
ALABAMAAutauga.....5.5284..-0.6409..-0.90544.2.7048....
ALABAMAAutauga....8.43179...0.3604-0.87704...15.2396....
ALABAMAAutauga.....34.7286...-0.53423-0.461..4.5602....
ALABAMAAutauga........1.3445.1.0417-0.96604......
ALABAMAAutauga.........-0.19729.-0.87136......
ALABAMAAutauga........11.8308.2.5186..0.9968....
ALABAMAAutauga.........3.39306.-0.77831.6.564....
ALABAMAAutauga..........18.2565.......

 

 

Thank you for your help and I look forward to hearing from you! Please let me know if there is more I can provide to make my question more clear.

 

12 REPLIES 12
PaigeMiller
Diamond | Level 26

Additionally, I would like to transpose my data in order to have a single variable named "Annual Change of Rate" to be one column, changing per year increase (from 2001-2017).

 

For each year, as it stands, you have up to 16 different growth rates. (Most years will have fewer than 16, but more than 1). So, please show us the desired one column for rate of change that you would like. Where do the 16 different rates go to make this a single column?

 

Also, you may want to consider how to handle missing values such that when 2004 is not present for a county, it is still assumed to be there; your current code just assumes 2005 is one year after 2003, because there is no record for 2004.

--
Paige Miller
brianathrift
Fluorite | Level 6

Thank you Paige. I will need to consider the missing data, per county and per year.

 

And I am planning to have changes per year window (i.e.: 2001-2002, 2002-2003, 2003-2004). Is there a way to make this one column variable? Or will I have to have multiple columns for "Changes 2001-2002", "Changes 2002-2003", "Changes 2003-2004"...and so forth? 

 

 

unison
Lapis Lazuli | Level 10

One quick note is that you have "bar" instead of "var".

 

Proc Transpose Data=Want out=Long prefix=Annual_rate_change ;
   	By State County;
	Var  growth1yr growth2yr growth3yr growth4yr growth5yr growth6yr growth7yr growth8yr
	growth9yr growth10yr growth11yr growth12yr growth13yr growth14yr growth15yr growth16yr;
run;
-unison
unison
Lapis Lazuli | Level 10

Making adjustments to my response on your last post!

 

Data Ped_fatal;
Input State $ County $ Year Ped_rates ;
Datalines;
ALABAMA Autauga 2002 1.51
ALABAMA Autauga 2003 .
ALABAMA Autauga 2005 8.28
ALABAMA Autauga 2007 .
ALABAMA Autauga 2008 14.27
ALABAMA Autauga 2009 54.05
ALABAMA Autauga 2010 .
ALABAMA Autauga 2011 .
ALABAMA Autauga 2012 19.41
ALABAMA Autauga 2013 6.65
ALABAMA Autauga 2014 29.13
ALABAMA Autauga 2015 1.84
ALABAMA Autauga 2016 .
ALABAMA Autauga 2017 107.93
ALABAMA Baldwin 2001 4.46
ALABAMA Baldwin 2002 41.69
ALABAMA Baldwin 2003 22.15
ALABAMA Baldwin 2004 102.13
ALABAMA Baldwin 2005 0.33
ALABAMA Baldwin 2006 0.16
ALABAMA Baldwin 2007 9.59
ALABAMA Baldwin 2008 16.75
;
Run;

Proc Sort Data=Ped_fatal;
	By State County year;
Run;

proc transpose data=ped_fatal prefix=rate out=trx(drop=_name_);
	by state county;
	id year;
	var Ped_rates;
run;

proc sql noprint;
	select name into :all_yrs separated by " " from dictionary.columns where 
		libname='WORK' and memname='TRX' and name like "rate%" order by name;
	select count(distinct(name)) into :cnt_yrs from dictionary.columns where 
		libname='WORK' and memname='TRX' and name like "rate%" order by name;
quit;

data trx;
	retain state county &all_yrs.;
	set trx;
run;

data want(drop=i);
	set trx;
	array rate[&cnt_yrs.] rate:;

	array temp[&cnt_yrs.] _temporary_;
	do i=1 to dim(rate);
		if rate[i]=. then
			rate[i]=0;
		temp[i]=rate[i];
	end;
	do i=2 to dim(rate);
		rate[i]=temp[i]/temp[i-1]-1;
	end;
	rate[1]=0;
	format rate: COMMA.2;
run;

Finally you would merge these "rate" results with your original dataset if you wanted to show the Ped_rates in your output.

-unison
PaigeMiller
Diamond | Level 26

Yes, @unison, this is very good, the idea of transposing the original data and then you can use arrays rather than lag functions is something I thought of too. I'm still concerned about missing values here, and I'm not 100% sure I know what "single column" output is desired from the original question.

--
Paige Miller
brianathrift
Fluorite | Level 6

Hello,

 

Thank you all for your responses! I was hoping to have one single column of "annual rate changes", where the year prior would be the rate that the change was derived from. I was thinking this would allow me to see if the annual fatality rate increased, or decreased, per year, given the respective county of the fatalities. 

PaigeMiller
Diamond | Level 26

@brianathrift wrote:

Hello,

 

Thank you all for your responses! I was hoping to have one single column of "annual rate changes", where the year prior would be the rate that the change was derived from. I was thinking this would allow me to see if the annual fatality rate increased, or decreased, per year, given the respective county of the fatalities. 


SHOW US, please.

--
Paige Miller
brianathrift
Fluorite | Level 6

Sorry, this is an example of the output I am looking for. Where 'Changes in Rates' would be missing for the first year, since this is the first year and the baseline. Else, the 'changes in rates' would be missing if the county has not been mentioned yet, or if the county was not accounted for the year before. I would like the changes in rates to be from year-to-year, where the prior year is baseline. If the ped_rate the year prior is not available, then I will count the changes in rates as missing. I do not want to calculate changes in rates across various years, unless there is too many missing variables. Then I will keep what I have, and calculate 2001 as the baseline year for changes in rates.

 

YearStateCountyPed_rateChange in Rates
2001ALABAMABaldwin4.4554.
2001ALABAMABullock0.6903.
2001ALABAMACalhoun10.3939.
2001ALABAMAClarke4.7556.
2001ALABAMACoffee1.9288.
2001ALABAMAConecuh8.8044.
2001ALABAMACrenshaw9.4612.
2001ALABAMADallas1.7975.
2001ALABAMADekalb2.7568.
2001ALABAMAElmore8.2706.
2001ALABAMAHenry2.4877.
2001ALABAMAJackson12.7575.
2001ALABAMAJefferson43.4405.
2001ALABAMALauderdale13.988.
2001ALABAMALee35.1568.
2001ALABAMALowndes2.2875.
2001ALABAMAMacon1.8251.
2001ALABAMAMadison50.2152.
2001ALABAMAMarshall8.4402.
2001ALABAMAMobile26.9767.
2002ALABAMAAutauga1.5129.
2002ALABAMABaldwin41.68980.023986683
2002ALABAMACherokee2.6072.
2002ALABAMAChilton7.3703.
2002ALABAMADallas0.95381.048437828
2002ALABAMADekalb3.12520.319979521
2002ALABAMAGreene0.228.
2002ALABAMAHale3.7803.
2002ALABAMAJackson11.35590.088059951
2002ALABAMAJefferson13.76460.072650131
2002ALABAMALimestone2.3518.
2002ALABAMALowndes2.1397.
2002ALABAMAMadison14.00520.071402051
2002ALABAMAMarshall2.79640.357602632
2002ALABAMAMobile4.26890.234252384
2002ALABAMAMontgomery6.4434.
2002ALABAMAPike3.6759.
2002ALABAMARussell31.0583.
2002ALABAMAShelby1.5569.
2002ALABAMASt. Clair0.6259.
2003ALABAMAAutauga..
2003ALABAMABaldwin22.14840.045149988
2003ALABAMABarbour..
2003ALABAMABlount8.9222.
2003ALABAMAButler..
2003ALABAMACalhoun6.6503.
2003ALABAMAChambers5.6468.
2003ALABAMAChilton..
2003ALABAMAClarke0.2592.
2003ALABAMACleburne12.486.
2003ALABAMACoffee3.6551.
2003ALABAMAColbert2.5004.
2003ALABAMAConecuh..
2003ALABAMACoosa8.7085.
2003ALABAMACovington3.1731.
2003ALABAMACrenshaw..
2003ALABAMACullman2.4942.
2003ALABAMADale9.60150.104150393
2003ALABAMADallas2.15090.464921661
2003ALABAMADeKalb..
PaigeMiller
Diamond | Level 26

Now that we can see the input data and the output desired, we can write code to get from one to the other (but I'm still not sure you have provided the desired output).

 

Data Ped_fatal;
Length State $54 County $54;
Input State$ County$ Year Ped_rates ;
Datalines;
ALABAMA	Autauga	2002	1.51
ALABAMA	Autauga	2003	.
ALABAMA	Autauga	2005	8.28
ALABAMA	Autauga	2007	.
ALABAMA	Autauga	2008	14.27
ALABAMA	Autauga	2009	54.05
ALABAMA	Autauga	2010	.
ALABAMA	Autauga	2011	.
ALABAMA	Autauga	2012	19.41
ALABAMA	Autauga	2013	6.65
ALABAMA	Autauga	2014	29.13
ALABAMA	Autauga	2015	1.84
ALABAMA	Autauga	2016	.
ALABAMA	Autauga	2017	107.93
ALABAMA	Baldwin	2001	4.46
ALABAMA	Baldwin	2002	41.69
ALABAMA	Baldwin	2003	22.15
ALABAMA	Baldwin	2004	102.13
ALABAMA	Baldwin	2005	0.33
ALABAMA	Baldwin	2006	0.16
ALABAMA	Baldwin	2007	9.59
ALABAMA	Baldwin	2008	16.75

;
Run;

data all_years;
    do year=2001 to 2017;
        output;
	end;
run;
proc sql;
	create table all_counties as select distinct state,county from ped_fatal;
	create table all_counties_years as select * from all_counties,all_years
              order by state,county,year;
quit;
data ped_fatal2;
	merge ped_fatal(in=in1) all_counties_years(in=in2);
	by state county year;
	if in2 and not in1 then ped_rates=.;
	prev_rate=lag(ped_rates);
	if not first.county then change = (ped_rates-prev_rate)/prev_rate;
	drop prev_rate;
run;

Although, this part is still not clear to me

I do not want to calculate changes in rates across various years, unless there is too many missing variables. Then I will keep what I have, and calculate 2001 as the baseline year for changes in rates.

 

and it seems like the changes you are computing are still unrelated to the input data, and this continues to be a problem. Can't you take the Autauga and Baldwin raw data only, and provide the exact calculated change rate (do the calculations yourself) for Autauga and Baldwin only and show us those exact calculated change rates? 

--
Paige Miller
mkeintz
PROC Star

Your example data not only has missing ped_rates for some years, but in some cases also has missing annual records entirely (i.e. Autuga 2004 is not present at all).  This means that your lag values will not be consistent.  Sometimes LR4 will refer to 4 years ago, and sometimes it will refer to more than 4 years ago.  Is this intended?  Or do you want to make these variables consistent over your entire time range?

 

 

 

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
brianathrift
Fluorite | Level 6

Hello!

 

Thank you for your response. 

 

This dataset is difficult, as I integrated several datasets across the counties and years. Therefore, there is data available for some years, and not others. I am considering running multiple imputations, or running a likewise deletion in order to have a more complete dataset where the dependent variable (pedestrian fatality rates) is available.

mkeintz
PROC Star

The issue, however, is whether you want LR4, etc. to have consistent meanings.

 

Let's assume you do, but let's assume also you don't want to put in synthetic records for the missing years.

 

Then I also agree that arrays are the way to go, but I think this can all be done in one data step, as long as the data are already sorted by state/county/year:

 

data Ped_fatal;
Length State $54 County $54;
Input State$ County$ Year Ped_rates ;
Datalines;
ALABAMA	Autauga	2002	1.51
ALABAMA	Autauga	2003	.
ALABAMA	Autauga	2005	8.28
ALABAMA	Autauga	2007	.
ALABAMA	Autauga	2008	14.27
ALABAMA	Autauga	2009	54.05
ALABAMA	Autauga	2010	.
ALABAMA	Autauga	2011	.
ALABAMA	Autauga	2012	19.41
ALABAMA	Autauga	2013	6.65
ALABAMA	Autauga	2014	29.13
ALABAMA	Autauga	2015	1.84
ALABAMA	Autauga	2016	.
ALABAMA	Autauga	2017	107.93
ALABAMA	Baldwin	2001	4.46
ALABAMA	Baldwin	2002	41.69
ALABAMA	Baldwin	2003	22.15
ALABAMA	Baldwin	2004	102.13
ALABAMA	Baldwin	2005	0.33
ALABAMA	Baldwin	2006	0.16
ALABAMA	Baldwin	2007	9.59
ALABAMA	Baldwin	2008	16.75
Run;

data want (drop=_:);
  set ped_fatal;
  by state county;
  retain _begyear;   /* The earliest year encountered for current county*/

  array history {2001:2017} _temporary_;  /*Current county history*/

  array LR {16};    /* Assuming you want up to 16 lagged PED_RATES */
  array GRWTH {16}; /* And also 16 growth rates */

  if first.county then do;
    _begyear=year;
    call missing(of history{*});
  end;

  history{year}=ped_rates;

  do _Lag=1 to year-_begyear;
    LR{_lag}   = history{year-_LAG};
    if nmiss(ped_rates,LR{_lag})=0 then GRWTH{_lag}= (ped_rates-LR{_lag})/LR{_lag};
  end;
run;

 

  1. Make sure the HISTORY array has array bounds that cover the entire date range of your data.
  2.  Make sure that the LR and GRWTH arrays have enough elements to corresponding with your date range (i.e. 1 less than the number of years in your data set).

edited addition: I'm not sure this generates your "one column" request, but I would need example desired output to understand correctly.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 3622 views
  • 3 likes
  • 4 in conversation