Hi, I feel like I should know how to do this but am struggling to get my head around it. I have created a dataset that contains the cpi for the last six years (from 2016 - 2021). What I want to do is for each year, multiply the starting point by the figure for the subsequent years and ending up with what the total cpi for each starting year would be in 2022. So for 2016, I would want 2016 * 2017 * 2018 * 2019 * 2020 * 2021, whist for 2020, I would only want 2020 * 2021.
I have been able to write code that does this but I just feel like there has to be a cleverer way of doing it, which is why I'm asking all you clever people!
This is my existing code: -
data annual_cpi_01;
input year annual_cpi;
datalines;
2016 1.007
2017 1.027
2018 1.025
2019 1.018
2020 1.009
2021 1.026
;
run;
data annual_cpi_02;
set annual_cpi_01;
retain _2016 _2017 _2018 _2019 _2020;
if _n_ = 1 then _2016 = annual_cpi;
else _2016 = _2016 * annual_cpi;
if _n_ = 2 then _2017 = annual_cpi;
else _2017 = _2017 * annual_cpi;
if _n_ = 3 then _2018 = annual_cpi;
else _2018 = _2018 * annual_cpi;
if _n_ = 4 then _2019 = annual_cpi;
else _2019 = _2019 * annual_cpi;
if _n_ = 5 then _2020 = annual_cpi;
else _2020 = _2020 * annual_cpi;
run;
If anyone could suggest an alternative, I'd really appreciate it.
Many thanks,
Rob
I'd just use retain combined with a reversed sorting:
PROC SORT DATA=annual_cpi_01; by descending year; RUN;
data want;
set annual_cpi_01;
by descending year;
retain total_cpi .;
if total_cpi=. then total_cpi=annual_cpi;
else if annual_cpi^=. then total_cpi=total_cpi*annual_cpi;
run;
- Cheers -
I'd just use retain combined with a reversed sorting:
PROC SORT DATA=annual_cpi_01; by descending year; RUN;
data want;
set annual_cpi_01;
by descending year;
retain total_cpi .;
if total_cpi=. then total_cpi=annual_cpi;
else if annual_cpi^=. then total_cpi=total_cpi*annual_cpi;
run;
- Cheers -
Good approach. I would suggest a little tweak and a little simplification:
data want;
set annual_cpi_01;
by descending year;
retain total_cpi 1;
if annual_cpi ne . then total_cpi = total_cpi * annual_cpi;
run;
You don't really need two arrays. You can use the same array twice, once to record the annual cpi's, the second time to update those cpi's prior to output.
The program does require two passes through the data, but avoids a sort step:
data annual_cpi_01;
input year annual_cpi;
datalines;
2016 1.007
2017 1.027
2018 1.025
2019 1.018
2020 1.009
2021 1.026
run;
data want (drop=_:);
array cpi_array {2016:2021} _temporary_;
set annual_cpi_01 (in=firstpass) annual_cpi_01 (in=secondpass);
if firstpass then cpi_array{year}=annual_cpi ;
if secondpass;
if year=2016 then do _y_=2020 to 2016 by -1;
cpi_array{_Y_}= cpi_array{_Y_}* cpi_array{_Y_+1};
end;
total=cpi_array{year};
run;
The point of the CPI_ARRAY array is that (1) it is directly indexed by YEAR, and (2) it is temporary. The latter point means that the values in the array are retained while all observations are read (and reread), and that those values are not stored in a variable that will be automatically output. i.e. no extra variables.
The first pass of the data just populates the CPI_ARRAY. Then, at the beginning of the second pass (i.e. the subsetting IF SECONDPASS statement has been satisfied, and the year=2016) the CPI_ARRAY is updated according to your specification. Finally, as the second pass of data is processed total_cpi is retrieved from the updated array.
This program assumes data are sorted by ascending YEAR. Actually, the data don't have to be sorted by year, but it does require that the first observation is the earliest year. Other years can be in any order.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.