I am looking to populate a variable that has missing numbers for future years (say 2018 through 2030). What I have is a population number for 2017, and I have given growth percents for the future years. So, for 2018, I want to take the population value (say 800) from the last observation for year 2017 and multiply that by my growth percent that I have stored in the observation for 2018. I tried using a retain statement, and I was able to get this to work for the first year, but I am unable to get it to carry over for the remaining years.
My data structure is as follows:
year | population | growth_percent |
2017 | 800 | 1.01 |
2018 | x1 | 1.06 |
2019 | x2 | 1.08 |
2020 | x3 | 1.11 |
2021 | x4 | 1.5 |
2022 | x5 | 1 |
I need to dynamically calculate the numbers for x1 - x5.
X1 should equal 800 * 1.06. Then X2 should eqaul X1 * 1.08.
I have tried the following code, but I am only able to populate X1. I can't figure out how to pass the X1 value to the next observation in order to calculate X2:
data populations;
set pop;
retain lastpop;
output;
lastpop= (population) * (growth_percent);
run;
Much Thanks!
Assuming your data is in order, a simple program would be:
data want;
set have;
retain new_pop;
if year = 2017 then new_pop = population;
else new_pop = new_pop * growth_percent;
drop population;
rename new_pop = population;
run;
Kelly,
I recommend that you try to get that original number out of the x1, x2 etc. column. From there, you can multiply it against the growth percentage.
Assuming that it is always the least year that gives you the original population, you can do the following:
data have;
input year population $ growth_percent ;
FORMAT growth_percent 8.4;
cards;
2017 800 1.01
2018 x1 1.06
2019 x2 1.08
2020 x3 1.11
2021 x4 1.5
2022 x5 1
;
run;
PROC SQL;
CREATE TABLE MIN AS
SELECT INPUT(POPULATION, 8.) AS MIN_POP FORMAT=8. FROM HAVE WHERE YEAR IN (SELECT MIN(YEAR) FROM HAVE);
QUIT;
PROC SQL;
CREATE TABLE HAVE AS
SELECT *, MIN.MIN_POP * GROWTH_PERCENT AS TOTAL_GROWTH FROM HAVE, MIN;
QUIT;
There are many other ways to do this I am sure, but I prefer SQL because it may be used in many other statistical programs (e.g. R, SPSS, etc.)
I need to be able to pass the TOTAL_GROWTH # to the next observation:
2017: 800 * 1.01 = 808
2018: 808 * 1.06 = 856.48 (****not: 800 *1.06 = 848)
2019: 856.48 * 1.08 = 924.9984
and so on...
Assuming your data is in order, a simple program would be:
data want;
set have;
retain new_pop;
if year = 2017 then new_pop = population;
else new_pop = new_pop * growth_percent;
drop population;
rename new_pop = population;
run;
Thank you so much! This is exactly what I needed.
data want2 (drop=lastpop);
set have;
if pop=. then pop=lastpop*gr_pct;
retain lastpop;
lastpop=pop;
run;
This program assumes that the first observation always has a proper value for population. Also, anytime a new value for population is encountered in the input data step, the process start over.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.