Human mobility affects disease estimates. My goal is to estimate the bias that mobility introduces into the incidence rate of Alzheimer's disease. My research question can also be rephrased as: how much higher (or lower) would the Alzheimer's risk among movers need to be to have a meaningful impact on the incidence rate?
If I can solve this problem for one county, as set up below, the same roadmap could be applied to an entire state.
A county with a population of 304,204 had 18,768 people move in (inflow) and 15,548 people move out (outflow). Human mobility affects the incidence rate because cases can be lost or gained through migration. Even though we don't know how many cases were gained or lost among the people who moved in (inflow) or moved out (outflow), we can use practical assumptions about Alzheimer's risk among movers in the three scenarios delineated below. The state-wide age-specific Alzheimer's risk is known. Expected cases among movers are calculated as the product of that risk and the number of people in the inflow or outflow.
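As a worked example of the expected-case calculation, take the oldest age group in the posted data (agegroup 14: inflow 563, outflow 384, staterisk 0.004367544) and assume, for illustration only, that movers carry exactly the state-wide risk:

$$E^{\text{in}}_{14} = 563 \times 0.004367544 \approx 2.459, \qquad E^{\text{out}}_{14} = 384 \times 0.004367544 \approx 1.677$$

So this age group gains about 2.459 - 1.677 = 0.782 expected cases net, against 535 observed cases. County-wide expected cases are sums of such products over all 14 age groups.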
Scenario 1. People who moved in (inflow) had the state-wide risk, while people who moved out had 0-2 times the state-wide risk (risk*0.1, risk*0.2, ..., risk*1.9, risk*2).
$$\text{rate}_i = \frac{\text{total cases} + \text{inflow}\cdot\text{staterisk} - \text{outflow}\cdot\text{staterisk}\cdot n_i}{\text{population} + \text{inflow} - \text{outflow}}$$
The risk difference can then be calculated as:
$$d_i = \text{arate} - \text{rate}_i$$
where n_i takes the values 0.1 through 2 in increments of 0.1, and arate is the crude rate not accounting for mobility.
Scenario 2. People who moved out had the state-wide risk, while people who moved in had 0-2 times the state-wide risk (risk*0.1, risk*0.2, ..., risk*1.9, risk*2).
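$$\text{rate}_o = \frac{\text{total cases} - \text{outflow}\cdot\text{staterisk} + \text{inflow}\cdot\text{staterisk}\cdot n_i}{\text{population} + \text{inflow} - \text{outflow}}$$
The risk difference can then be calculated as:
$$d_o = \text{arate} - \text{rate}_o$$
where arate is the crude rate not accounting for mobility.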
Scenario 3. People who moved in and people who moved out both had 0-2 times the state-wide risk (risk*0.1, risk*0.2, ..., risk*1.9, risk*2).
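$$\text{rate}_p = \frac{\text{total cases} - \text{outflow}\cdot\text{staterisk}\cdot n_i + \text{inflow}\cdot\text{staterisk}\cdot n_i}{\text{population} + \text{inflow} - \text{outflow}}$$
The risk difference can then be calculated as:
$$d_p = \text{arate} - \text{rate}_p$$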
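Writing C for total cases, I for inflow, O for outflow, s for staterisk, and P' = population + inflow - outflow (shorthand introduced here, not part of the original notation), the three scenario rates rearrange into intercept-plus-slope form:

$$\text{rate}_i = \frac{C + I s}{P'} - \frac{O s}{P'}\,n_i, \qquad \text{rate}_o = \frac{C - O s}{P'} + \frac{I s}{P'}\,n_i, \qquad \text{rate}_p = \frac{C}{P'} + \frac{(I - O)\,s}{P'}\,n_i$$

Each rate is a straight line in the multiplier n_i, so each risk difference d = arate - rate is linear in n_i too, and a sum over age groups stays linear. Two points (for example n_i = 0.1 and n_i = 2) therefore determine each line, and at n_i = 1 all three scenarios give the same rate, (C + (I - O)s)/P'. A percent change that divides by the scenario rate is only approximately linear.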
My questions for you guys:
data data;
input county agegroup US_age_dist arate population tot_cases inflow outflow staterisk;
cards;
1 1 0.055316 7.23746E-6 15286 2 691 603 0.0000128886
1 2 0.21773 0.0000189259 57522 5 4962 3309 0.000129332
1 3 0.066478 0.000011864 28017 5 4770 2960 0.0000288517
1 4 0.06453 0.0000416917 21669 14 2387 2483 0.0000436222
1 5 0.071045 0.0000875479 17853 22 1463 1622 0.0000755209
1 6 0.080762 0.000107216 17325 23 902 1061 0.000127927
1 7 0.081852 0.000185157 19893 45 658 768 0.00020995
1 8 0.072117 0.000299466 22637 94 643 598 0.00028955
1 9 0.062718 0.000399072 22788 145 579 577 0.000392299
1 10 0.048454 0.000391896 21019 170 386 570 0.000436333
1 11 0.038794 0.000509854 17881 235 373 217 0.000495289
1 12 0.034264 0.000649886 11968 227 163 234 0.000599819
1 13 0.031772 0.00072508 8676 198 228 162 0.000672988
1 14 0.06035 0.001489936 21670 535 563 384 0.004367544
;
/* arate1 = US 2000 standard population weight x age-specific crude rate */
data data; set data;
arate1=US_age_dist*(tot_cases/population);
run;
/* SCENARIO 1: people who moved in (inflow) have the state-wide risk; people who moved out (outflow) have n times the state-wide risk, n = i/10 = 0.1, 0.2, ..., 2.0 */
data temp_out;
set data;
array rate rate01-rate20;
do i=1 to dim(rate);
rate(i)=(tot_cases-outflow*staterisk*(i/10)+inflow*staterisk)/(population+inflow-outflow);
end;
drop i;
run;
/* SCENARIO 2: people who moved out (outflow) have the state-wide risk; people who moved in (inflow) have n times the state-wide risk, n = i/10 = 0.1, 0.2, ..., 2.0 */
data temp_in;
set data;
array rate rate01-rate20;
do i=1 to dim(rate);
rate(i)=(tot_cases+inflow*staterisk*(i/10)-outflow*staterisk)/(population+inflow-outflow);
end;
drop i;
run;
proc means data=temp_out noprint;
var rate:;
output out=temp_out1(drop=_type_ _freq_) sum=/autoname;
run;
proc means data=temp_in noprint;
var rate:;
output out=temp_in1(drop=_type_ _freq_) sum=/autoname;
run;
/* percent change of each multiplier's summed rate from the n=1.0 sum (rate10_Sum) */
data temp_out2(keep=change:); set temp_out1;
array rsum rate01_Sum--rate20_Sum;
array chg change01-change20;
do k=1 to dim(rsum);
chg(k)=(rate10_Sum-rsum(k))/rsum(k)*100;
end;
run;
/* same percent-change calculation for Scenario 2 */
data temp_in2(keep=change:); set temp_in1;
array rsum rate01_Sum--rate20_Sum;
array chg change01-change20;
do k=1 to dim(rsum);
chg(k)=(rate10_Sum-rsum(k))/rsum(k)*100;
end;
run;
proc transpose data=temp_in2 out=temp_in3(rename=(COL1=change_in)); run;
proc transpose data=temp_out2 out=temp_out3(rename=(COL1=change_out)); run;
/*SCENARIO 3*/
data temp_both;
set data;
array rate rate01-rate20;
do i=1 to dim(rate);
rate(i)=(tot_cases-outflow*staterisk*(i/10)+inflow*staterisk*(i/10))/(population+inflow-outflow);
end;
drop i;
run;
proc means data=temp_both noprint;
var rate:;
output out=temp_both1(drop=_type_ _freq_) sum=/autoname;
run;
/* temp_both1 already holds one row of sums, so a second PROC MEANS is not needed */
data temp_both3(keep=change:); set temp_both1;
array rsum rate01_Sum--rate20_Sum;
array chg change01-change20;
do k=1 to dim(rsum);
chg(k)=(rate10_Sum-rsum(k))/rsum(k)*100;
end;
run;
proc transpose data=temp_both3 out=temp_both4(rename=(COL1=change_both)); run;
data all;
merge temp_out3 temp_in3 temp_both4; /* one-to-one merge: row k holds multiplier k/10 */
risk=_n_/10;
run;
proc sgplot data=all nocycleattrs;
title 'EFFECT OF DIFFERENTIAL RISK AMONG MOVERS ON ALZHEIMER INCIDENCE RATE';
/* temp_out varies the outflow risk (Scenario 1); temp_in varies the inflow risk (Scenario 2) */
series x=risk y=change_out/ lineattrs=(color=blue) legendlabel= 'SCENARIO 1';
series x=risk y=change_in/ lineattrs=(color=red) legendlabel= 'SCENARIO 2';
series x=risk y=change_both/ lineattrs=(color=black) legendlabel= 'SCENARIO 3';
yaxis label='Percent of Change in Incidence Rate';
xaxis label='Alzheimer Risk Among Movers (Multiple of State-Wide Risk)';
run;
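The three scenario steps, the three PROC MEANS calls, and the change01-change20 assignments could be collapsed into one long data set with a row per age group, scenario, and multiplier, plotted with GROUP=. This is a sketch, assuming the posted input columns; the data set and variable names (movers, scen_long, scen_sum, ref, scen_change, k, factor, rate_sum, ref_sum) are hypothetical. It deliberately keeps the original metric (unweighted sum of age-specific rates, percent change relative to the n = 1.0 sum), so it does not settle the weighting question raised in the replies.

```sas
/* Sketch only: hypothetical names, same inputs and metric as the program above */
data movers;
input county agegroup US_age_dist arate population tot_cases inflow outflow staterisk;
datalines;
1 1 0.055316 7.23746E-6 15286 2 691 603 0.0000128886
1 2 0.21773 0.0000189259 57522 5 4962 3309 0.000129332
1 3 0.066478 0.000011864 28017 5 4770 2960 0.0000288517
1 4 0.06453 0.0000416917 21669 14 2387 2483 0.0000436222
1 5 0.071045 0.0000875479 17853 22 1463 1622 0.0000755209
1 6 0.080762 0.000107216 17325 23 902 1061 0.000127927
1 7 0.081852 0.000185157 19893 45 658 768 0.00020995
1 8 0.072117 0.000299466 22637 94 643 598 0.00028955
1 9 0.062718 0.000399072 22788 145 579 577 0.000392299
1 10 0.048454 0.000391896 21019 170 386 570 0.000436333
1 11 0.038794 0.000509854 17881 235 373 217 0.000495289
1 12 0.034264 0.000649886 11968 227 163 234 0.000599819
1 13 0.031772 0.00072508 8676 198 228 162 0.000672988
1 14 0.06035 0.001489936 21670 535 563 384 0.004367544
;
run;

/* one output row per age group x scenario x multiplier */
data scen_long;
set movers;
length scenario $10;
denom=population+inflow-outflow;   /* mobility-updated denominator */
do k=1 to 20;
factor=k/10;                       /* n = 0.1, 0.2, ..., 2.0 */
scenario='SCENARIO 1';             /* outflow risk varies */
rate=(tot_cases+inflow*staterisk-outflow*staterisk*factor)/denom;
output;
scenario='SCENARIO 2';             /* inflow risk varies */
rate=(tot_cases+inflow*staterisk*factor-outflow*staterisk)/denom;
output;
scenario='SCENARIO 3';             /* both risks vary */
rate=(tot_cases-outflow*staterisk*factor+inflow*staterisk*factor)/denom;
output;
end;
run;

/* sum the age-specific rates within scenario and multiplier, as the original does */
proc means data=scen_long noprint nway;
class scenario k;
var rate;
output out=scen_sum(drop=_type_ _freq_) sum=rate_sum;
run;

/* reference sum at n = 1.0 (k = 10) for each scenario */
data ref;
set scen_sum(where=(k=10));
ref_sum=rate_sum;
keep scenario ref_sum;
run;

/* same percent change as change01-change20 in the original */
data scen_change;
merge scen_sum ref;
by scenario;
factor=k/10;
change=(ref_sum-rate_sum)/rate_sum*100;
run;

proc sgplot data=scen_change;
title 'EFFECT OF DIFFERENTIAL RISK AMONG MOVERS ON ALZHEIMER INCIDENCE RATE';
series x=factor y=change / group=scenario;
yaxis label='Percent of Change in Incidence Rate';
xaxis label='Alzheimer Risk Among Movers (Multiple of State-Wide Risk)';
run;
title;
```

Using the integer k as the CLASS variable avoids grouping on floating-point multipliers; factor is recomputed from k only for plotting.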
Your program generates errors.
Your program is now a one-liner.
Can you describe clearly why you have a loop of 1 to 20 and the *(i/10) in the temp_out code?
I am not sure that I quite understand what the "rate" variable is supposed to actually contain. I see that you say:
Scenario 1. People who moved in (inflow) had state-wide risk while people who moved out had 0-2 times the state-wide risk (risk*0.1, risk*0.2, ..., risk*1.9, risk*2)
but that doesn't really make sense. Why restrict to 0-2 and not 0-3 or 0-1.5 or .5 to 2.5? What assumptions is that 0-2 interval based on? Are those your range of biases?
I really don't understand what summing those rates across age groups represents.
If your arate variable contains a risk at age(?) what role does it play? It seems to be ignored.
Since the resultant graph is 3 straight lines I think you can skip all of the 1 to 20 bits and just do the end points.
Thank you very much ballardw for your time!
This work explores whether population mobility distorts the incidence rate significantly and, if it does, to what extent: the aim is to quantify the impact and check whether it would substantially distort the accuracy of disease estimates. In current practice, the incidence rate is calculated as (number of new cases)/(census population)*100,000. Neither the numerator nor the denominator is updated for cases potentially lost or gained through mobility. Ideally, we would know the number of new cases who moved into the county (or any geographic unit) and who moved out of it. The mobility-adjusted rate would then be: (number of new cases - cases that moved out + cases that moved in)/(census population - people who moved out + people who moved in)*100,000. However, this information is not available in the US. Collecting primary data by contacting individuals for their actual migration history is costly and subject to recall bias, non-participation, etc. Studies based on primary data do exist, but beyond those limitations they lack generalizability. This study uses the Census County-to-County Migration Flows, which give the number of people who moved into and out of the county by age (other demographics are available, but I'd like to solve the problem using age first). This information updates the denominator (census population - people who moved out + people who moved in). However, I still do not know how many actual cases were gained or lost through mobility unless I make assumptions. Mobility is multi-directional, but I restrict it to two opposite directions and assume the moves are permanent. The data are organized by age group because both the health outcome and mobility are age-dependent.
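For the example county, the denominator update can be worked out from the posted totals (population 304,204, inflow 18,768, outflow 15,548, and 1,720 new cases summed over the 14 age groups):

$$304{,}204 - 15{,}548 + 18{,}768 = 307{,}424, \qquad \frac{1{,}720}{304{,}204}\times 100{,}000 \approx 565.4 \text{ per } 100{,}000$$

The second figure is the current-practice crude rate. The cases that moved out and moved in, which would complete the adjusted numerator, are the unknown terms that the three scenarios replace with expected cases.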
The primary goal of this exercise is to produce a scaled impact of mobility as a function of the disease risk among movers. The disease of interest here is Alzheimer's. The average state risk is around 500 per 100,000. Because mobility is bi-directional, movements in and out partly offset each other. To create the extreme effects of mobility, I would rather fix one direction while the other direction varies. That's why I held the inflow risk at the state risk while varying the outflow risk in Scenario 1, and did the reverse in Scenario 2.
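The offsetting can be seen from the mover terms in the numerator for a single age group a, writing I_a for inflow, O_a for outflow, s_a for the state risk, and n for the multiplier (shorthand introduced here):

$$\text{Scenario 1: } I_a s_a - n\,O_a s_a, \qquad \text{Scenario 2: } n\,I_a s_a - O_a s_a, \qquad \text{Scenario 3: } n\,s_a\,(I_a - O_a)$$

In Scenario 3 the multiplier acts only on the net flow I_a - O_a, whereas in Scenarios 1 and 2 it acts on a gross flow. As unweighted head counts summed over age groups, the gross flows are 15,548 (out) and 18,768 (in), while the net flow is only 18,768 - 15,548 = 3,220.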
Scale of risk range: The range 0-2 was chosen for two reasons. First, it is my arbitrary choice to double the risk (2) and to halve it (0.5), extended toward 0. Second, it is extremely unlikely that Alzheimer's risk among movers would double, or fall to half, the state average risk, unless the study area were considered especially undesirable once someone is diagnosed with Alzheimer's, which is not really the case for NYS. Nor does the state do anything extraordinary to attract Alzheimer's patients, such as offering especially conducive care.
Summing rates across age groups: Sorry for the ambiguity. arate in the SAS program is the product of the age distribution of the 2000 US standard population and the crude rate for each age group, i.e. that age group's contribution to the age-adjusted rate. Both Alzheimer's and migration are age-sensitive, so age adjustment is important here.
I edited the SAS program above to add:
data data; set data;
arate1=US_age_dist*(tot_cases/population);
run;
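Because arate1 weights each age-specific rate by US_age_dist, the scenario rates can be weighted the same way before summing, so the sums become age-adjusted rates rather than plain sums of rates. The sketch below, assuming US_age_dist holds the standard-population weights as described, computes age-adjusted rates per 100,000 at the end points n = 0.1, 1, and 2 only, and the difference d = arate - rate from the scenario definitions, using the summed arate as the unadjusted baseline. All data set and variable names (movers, aa_long, aa_sum, base, aa_diff, w_rate, aa_rate, aa_base, base_100k, diff) are hypothetical.

```sas
/* Sketch only: hypothetical names; age-weighted scenario rates at the end points */
data movers;
input county agegroup US_age_dist arate population tot_cases inflow outflow staterisk;
datalines;
1 1 0.055316 7.23746E-6 15286 2 691 603 0.0000128886
1 2 0.21773 0.0000189259 57522 5 4962 3309 0.000129332
1 3 0.066478 0.000011864 28017 5 4770 2960 0.0000288517
1 4 0.06453 0.0000416917 21669 14 2387 2483 0.0000436222
1 5 0.071045 0.0000875479 17853 22 1463 1622 0.0000755209
1 6 0.080762 0.000107216 17325 23 902 1061 0.000127927
1 7 0.081852 0.000185157 19893 45 658 768 0.00020995
1 8 0.072117 0.000299466 22637 94 643 598 0.00028955
1 9 0.062718 0.000399072 22788 145 579 577 0.000392299
1 10 0.048454 0.000391896 21019 170 386 570 0.000436333
1 11 0.038794 0.000509854 17881 235 373 217 0.000495289
1 12 0.034264 0.000649886 11968 227 163 234 0.000599819
1 13 0.031772 0.00072508 8676 198 228 162 0.000672988
1 14 0.06035 0.001489936 21670 535 563 384 0.004367544
;
run;

/* weighted scenario rate per 100,000 for each age group and end point */
data aa_long;
set movers;
length scenario $10;
denom=population+inflow-outflow;
do factor=0.1, 1, 2;
scenario='SCENARIO 1';
w_rate=US_age_dist*(tot_cases+inflow*staterisk-outflow*staterisk*factor)/denom*100000;
output;
scenario='SCENARIO 2';
w_rate=US_age_dist*(tot_cases+inflow*staterisk*factor-outflow*staterisk)/denom*100000;
output;
scenario='SCENARIO 3';
w_rate=US_age_dist*(tot_cases-outflow*staterisk*factor+inflow*staterisk*factor)/denom*100000;
output;
end;
run;

/* summing weighted rates over age groups gives the age-adjusted rate */
proc means data=aa_long noprint nway;
class scenario factor;
var w_rate;
output out=aa_sum(drop=_type_ _freq_) sum=aa_rate;
run;

/* baseline: summed arate, no mobility adjustment */
proc means data=movers noprint;
var arate;
output out=base(keep=aa_base) sum=aa_base;
run;

data aa_diff;
if _n_=1 then set base;         /* baseline retained on every row */
set aa_sum;
base_100k=aa_base*100000;       /* summed arate per 100,000 */
diff=base_100k-aa_rate;         /* d = arate - rate */
drop aa_base;
run;

proc print data=aa_diff noobs;
run;
```

At factor = 1 the three scenarios coincide, so the interesting rows are the six end-point rows.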
Taking the end points of the plot: I would like to present the extreme ends of the impact of mobility on the disease estimates. However, I will keep looking for more effective and meaningful ways to visualize the results.
My question remains: what extent of change in rates due to mobility would count as a "meaningful impact"? Should I look at the ratio of the 95% CIs of the rates from the different scenarios?
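One possible yardstick, offered as a suggestion rather than an established threshold, is to compare each scenario's rate with the sampling uncertainty of the baseline age-adjusted rate. Under the usual Poisson assumption with a normal approximation, writing w_a = US_age_dist, c_a = tot_cases, and P_a = population for age group a:

$$\widehat{\text{AAR}} = \sum_a w_a \frac{c_a}{P_a}, \qquad \widehat{\operatorname{Var}}\big(\widehat{\text{AAR}}\big) = \sum_a w_a^2 \frac{c_a}{P_a^2}, \qquad 95\%\ \text{CI} \approx \widehat{\text{AAR}} \pm 1.96\sqrt{\widehat{\operatorname{Var}}\big(\widehat{\text{AAR}}\big)}$$

A scenario whose rate stays inside this interval would be hard to distinguish from random variation. With small age-specific counts, gamma-based intervals (Fay and Feuer) are often preferred over the normal approximation, and either interval reflects random error only, not the mobility bias itself.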