Kotebiya
Calcite | Level 5

 

Hello everyone, I am asking about using PROC MCMC to simulate possible population outcomes for US states. After some research, I was pointed to PROC MCMC as a way to account for multiple potential outcomes of population growth among states (e.g. some states growing faster or slower than projected, perhaps because of an economic slowdown or a major natural disaster). I am trying to compare the possible populations in 2020.

END - The natural logarithm of the projected population of a state in 2020.
END_VAR - The variance of END (based on population estimates).
INDEX - The index number for each state; STATE[INDEX] is the model parameter for that state.

Goal: A posterior dataset in which each simulation has 50 variables representing the projected populations of the 50 states. Analysis of the posterior data should show that the mean estimate for each state is close to the prior estimate for that state, and that the standard deviation is close to the prior variation supplied for that state's estimate.

 

The Data:

DATA POPEST.POPEST;
	INPUT INDEX END END_VAR;
CARDS;
1 15.4241325 0.0002367
2 13.5640134 0.0006933
3 15.8262773 0.0013665
4 14.9405672 0.0002576
5 17.5277513 0.0001028
6 15.5865207 0.0002908
7 15.1101493 0.0001562
8 13.8189535 0.0001989
9 16.8993923 0.0006576
10 16.2084906 0.0005570
11 14.2270479 0.0002234
12 14.3967189 0.0009461
13 16.3803101 0.0000703
14 15.7319626 0.0000578
15 14.9750436 0.0000653
16 14.9089526 0.0001230
17 15.3310353 0.0001038
18 15.3706757 0.0076526
19 14.1129090 0.0002326
20 15.6475029 0.0001247
21 15.7525401 0.0002435
22 16.1093723 0.0001705
23 15.5523285 0.0000302
24 14.9272146 0.0001557
25 15.6470303 0.0000951
26 13.8905054 0.0001742
27 14.4875277 0.0000478
28 14.9926327 0.0038334
29 14.1235860 0.0003578
30 16.0273223 0.0000774
31 14.5934536 0.0007836
32 16.8139648 0.0001331
33 16.1917873 0.0005469
34 13.5895981 0.0024562
35 16.2745662 0.0000071
36 15.2189476 0.0001872
37 15.2603779 0.0001952
38 16.3782608 0.0000426
39 13.8720971 0.0003982
40 15.4665904 0.0003467
41 13.7034732 0.0002322
42 15.7491440 0.0001953
43 17.2153554 0.0001408
44 15.0053915 0.0005280
45 13.3556029 0.0000774
46 15.9944187 0.0001168
47 15.8471508 0.0001357
48 14.4339610 0.0001527
49 15.5910714 0.0000591
50 13.3356746 0.0013045
;
RUN;
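
Since END is stored on the log scale, it can help to look at the inputs on the population scale before modeling. The following is a minimal sketch, assuming the POPEST.POPEST dataset created above. The names WORK.POPEST_SCALE, POP, SD_LOG, and POP_SD_APPROX are hypothetical, and POP_SD_APPROX is only a first-order (delta method) approximation, not an exact lognormal standard deviation.

```sas
/* Sketch: view the log-scale inputs on the population scale.        */
/* POP, SD_LOG and POP_SD_APPROX are illustrative names that are not */
/* part of the original data.                                        */
DATA WORK.POPEST_SCALE;
	SET POPEST.POPEST;
	POP = EXP(END);                 /* projected 2020 population */
	SD_LOG = SQRT(END_VAR);         /* standard deviation on the log scale */
	POP_SD_APPROX = POP * SD_LOG;   /* delta-method approximation of the SD */
RUN;

PROC PRINT DATA=WORK.POPEST_SCALE NOOBS;
	VAR INDEX END END_VAR POP SD_LOG POP_SD_APPROX;
	FORMAT POP POP_SD_APPROX COMMA15.0;
RUN;
```

Because the END_VAR values are small, SD_LOG can be read roughly as a relative (percentage) uncertainty on each state's projected population.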

The MCMC Code:

ODS EXCLUDE ALL;
PROC MCMC DATA=POPEST.POPEST NBI=1000 THIN=10 NMC=10000000 SEED=17760704
		OUTPOST=POPEST.SIM_SMALL MONITOR=(_parms_ mu) DIAGNOSTICS=NONE
		PROPDIST=T;
	ARRAY STATE[50];
	PARMS STATE: 0;
	PARMS S2 1;
	PRIOR STATE: ~ NORMAL(0, VAR=S2);
	PRIOR S2: ~ IGAMMA(0.00001, SCALE=100000);
	mu = STATE[INDEX];
	MODEL END ~ NORMAL(mu, var=END_VAR);
RUN;
ODS EXCLUDE NONE;
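
One way to check the stated goal (posterior mean close to END, posterior standard deviation close to the square root of END_VAR) is to summarize the OUTPOST= dataset and line it up against the inputs. This is a sketch only: it assumes the run above finished and that POPEST.SIM_SMALL contains the parameter columns STATE1-STATE50. The WORK dataset names and the variables PARM, POST_MEAN, POST_SD, and PRIOR_SD are hypothetical.

```sas
/* Default OUTPUT statistics: one row each for N, MIN, MAX, MEAN, STD */
PROC MEANS DATA=POPEST.SIM_SMALL NOPRINT;
	VAR STATE1-STATE50;
	OUTPUT OUT=WORK.POST_STATS(DROP=_TYPE_ _FREQ_);
RUN;

/* Flip to one row per state parameter, one column per statistic */
PROC TRANSPOSE DATA=WORK.POST_STATS OUT=WORK.POST_LONG(RENAME=(_NAME_=PARM));
	ID _STAT_;
	VAR STATE1-STATE50;
RUN;

/* One-to-one merge (no BY statement): relies on both tables being */
/* in state order 1..50, which holds for the data as posted.       */
DATA WORK.COMPARE;
	MERGE POPEST.POPEST
	      WORK.POST_LONG(KEEP=PARM MEAN STD
	                     RENAME=(MEAN=POST_MEAN STD=POST_SD));
	PRIOR_SD = SQRT(END_VAR);       /* input SD on the log scale */
	DIFF_MEAN = POST_MEAN - END;    /* should be near zero */
	RATIO_SD = POST_SD / PRIOR_SD;  /* should be near one */
RUN;

PROC PRINT DATA=WORK.COMPARE NOOBS;
	VAR INDEX PARM END POST_MEAN DIFF_MEAN PRIOR_SD POST_SD RATIO_SD;
RUN;
```

Large values of DIFF_MEAN or a RATIO_SD far from one for a given state would suggest that the shared prior variance S2 is pulling that state's estimate, or that the chain has not mixed well.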


I am making a few assumptions that were factored into the data before running PROC MCMC. The END estimates were developed from the average annual exponential growth rate of each state from 2000-2001 to 2014-2015, and the prior distribution is based on the standard deviation of exponential growth over that period for each state. I am assuming the growth rates are independent of each other.
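
If the growth rates really are independent and each state's log population is treated as normal with mean END and variance END_VAR, the scenarios can also be drawn directly with plain Monte Carlo. The sketch below is a different technique from PROC MCMC and is offered only as a possible cross-check. The macro variable NSIM, the WORK dataset names, and the variables SIM, LOG_POP, and POP are hypothetical.

```sas
/* Sketch: direct Monte Carlo draws, NOT PROC MCMC.                  */
/* Assumes independent normal errors on the log scale for each state */
%LET NSIM = 10000;   /* hypothetical number of simulated scenarios */

DATA WORK.SIM_LONG;
	CALL STREAMINIT(17760704);      /* same seed value as the MCMC run */
	SET POPEST.POPEST;
	DO SIM = 1 TO &NSIM;
		LOG_POP = RAND('NORMAL', END, SQRT(END_VAR));  /* mean, SD */
		POP = EXP(LOG_POP);         /* back to the population scale */
		OUTPUT;
	END;
RUN;

/* Reorder so that each scenario's 50 states sit together */
PROC SORT DATA=WORK.SIM_LONG;
	BY SIM INDEX;
RUN;

/* One row per scenario, with columns STATE1-STATE50 */
PROC TRANSPOSE DATA=WORK.SIM_LONG OUT=WORK.SIM_WIDE(DROP=_NAME_) PREFIX=STATE;
	BY SIM;
	ID INDEX;
	VAR POP;
RUN;
```

Each row of WORK.SIM_WIDE is then one simulated 2020 scenario across all 50 states. Unlike the hierarchical model, this ignores the shared variance S2 entirely, so the two approaches should agree only to the extent that the prior has little influence.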

What I am hoping to get out of this post is feedback from others who might have a better idea about this. Is there anything that I might be missing or any suggested changes? I don't need an elegant model, just one that includes many different outcomes.

 

Reeza
Super User

Not an answer to your question, but from a demographic perspective, a population growth projection should also factor in several other components:

  1. Fertility rate
  2. Mortality rate
  3. Immigration rate

 

Not sure if all of these can be factored in via PROC MCMC.  

Kotebiya
Calcite | Level 5
I did try using both the exponential growth rate and the cohort-component method. The problem with the cohort-component method is that there is no reliable data to estimate, or use as a reference for, outmigration to a foreign country. Immigration data in general makes the cohort-component method impractical.

Besides, even if I did factor in fertility rate, mortality rate, and immigration rate individually, I would calculate the estimates and variance beforehand.
