Solved: Help with using two arrays please

robulon · Posted 02-09-2022 11:22 AM

Hi, I feel like I should know how to do this but am struggling to get my head around it. I have created a dataset that contains the cpi for the last six years (from 2016 - 2021). What I want to do is for each year, multiply the starting point by the figure for the subsequent years and ending up with what the total cpi for each starting year would be in 2022. So for 2016, I would want 2016 * 2017 * 2018 * 2019 * 2020 * 2021, whist for 2020, I would only want 2020 * 2021.

I have been able to write code that does this but I just feel like there has to be a cleverer way of doing it, which is why I'm asking all you clever people!

This is my existing code: -

data annual_cpi_01;
input year annual_cpi;
datalines;
2016 1.007
2017 1.027
2018 1.025
2019 1.018
2020 1.009
2021 1.026
;
run;

data annual_cpi_02;
set annual_cpi_01;
retain _2016 _2017 _2018 _2019 _2020;

if _n_ = 1 then _2016 = annual_cpi;
else _2016 = _2016 * annual_cpi;

if _n_ = 2 then _2017 = annual_cpi;
else _2017 = _2017 * annual_cpi;

if _n_ = 3 then _2018 = annual_cpi;
else _2018 = _2018 * annual_cpi;

if _n_ = 4 then _2019 = annual_cpi;
else _2019 = _2019 * annual_cpi;

if _n_ = 5 then _2020 = annual_cpi;
else _2020 = _2020 * annual_cpi;
run;

If anyone could suggest an alternative, I'd really appreciate it.

Many thanks,

Rob

Oligolas · Posted 02-09-2022 11:35 AM

I'd just use retain combined with a reversed sorting:

PROC SORT DATA=annual_cpi_01; by descending year; RUN;

data want;
   set annual_cpi_01;
   by descending year;
   retain total_cpi .;
   if total_cpi=. then total_cpi=annual_cpi;
   else if annual_cpi^=. then total_cpi=total_cpi*annual_cpi;
run;

________________________
- Cheers -

View solution in original post

robulon · Posted 02-09-2022 11:23 AM

Sorry, the reason for the title mentioning arrays is because I thought arrays might be the way forward but wasn't sure how I could get it to cycle through the different options when one would be going from 1 to 5 and the other would be going from 2016 - 2020. The other reason I thought arrays would work better is because I thought I could future proof it a bit more and not have to add lines of code for each new years worth of data.

Thanks again

Oligolas · Posted 02-09-2022 11:35 AM

I'd just use retain combined with a reversed sorting:

PROC SORT DATA=annual_cpi_01; by descending year; RUN;

data want;
   set annual_cpi_01;
   by descending year;
   retain total_cpi .;
   if total_cpi=. then total_cpi=annual_cpi;
   else if annual_cpi^=. then total_cpi=total_cpi*annual_cpi;
run;

________________________
- Cheers -

Astounding · Posted 02-09-2022 12:20 PM

Good approach. I would suggest a little tweak and a little simplification:

data want;
   set annual_cpi_01;
   by descending year;
   retain total_cpi  1;
   if annual_cpi ne . then total_cpi = total_cpi * annual_cpi;
run;

robulon · Posted 02-10-2022 04:05 AM

Brilliant, thank you

robulon · Posted 02-10-2022 04:05 AM

Thank you, this is great. For some reason, I had in my head I needed to have the values through each interim year which is what my code produces but the reality is I only need to know the impact now and this does exactly what I need. Thanks again

mkeintz · Posted 02-09-2022 12:54 PM

You don't really need two arrays. You can use the same array twice, once to record the annual cpi's, the second time to update those cpi's prior to output.

The program does require two passes through the data, but avoids a sort step:

data annual_cpi_01;
input year annual_cpi;
datalines;
2016 1.007
2017 1.027
2018 1.025
2019 1.018
2020 1.009
2021 1.026
run;

data want (drop=_:);
  array cpi_array  {2016:2021} _temporary_;
  set annual_cpi_01 (in=firstpass) annual_cpi_01 (in=secondpass);

  if firstpass then cpi_array{year}=annual_cpi ;

  if secondpass;

  if year=2016 then do _y_=2020 to 2016 by -1;
    cpi_array{_Y_}=  cpi_array{_Y_}* cpi_array{_Y_+1};
  end;
  total=cpi_array{year};
run;

The point of the CPI_ARRAY array is that (1) it is directly indexed by YEAR, and (2) it is temporary. The latter point means that the values in the array are retained while all observations are read (and reread), and that those values are not stored in a variable that will be automatically output. i.e. no extra variables.

The first pass of the data just populates the CPI_ARRAY. Then, at the beginning of the second pass (i.e. the subsetting IF SECONDPASS statement has been satisfied, and the year=2016) the CPI_ARRAY is updated according to your specification. Finally, as the second pass of data is processed total_cpi is retrieved from the updated array.

This program assumes data are sorted by ascending YEAR. Actually, the data don't have to be sorted by year, but it does require that the first observation is the earliest year. Other years can be in any order.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

robulon · Posted 02-10-2022 04:06 AM

Thanks for this, a really interesting solution, and not something I've seen done before in terms of explicitly making two passes through the data - you've definitely given me something to investigate further. Thanks again

Help with using two arrays please

Re: Help with using two arrays please

Re: Help with using two arrays please

Re: Help with using two arrays please

Re: Help with using two arrays please

Re: Help with using two arrays please

Re: Help with using two arrays please

Re: Help with using two arrays please

Re: Help with using two arrays please

Registration is open

SAS Training: Just a Click Away