Re: Data presentation using SAS

Cruise · Posted 02-11-2018 06:52 PM

Human mobility affects disease estimates. My goal is to estimate the bias that mobility introduces into the incidence rate of Alzheimer's disease. My research question can also be rephrased as: how much higher (or lower) would the Alzheimer's risk among movers need to be to have a meaningful impact on the incidence rate?
If I solve this problem for one county as stated below, the same roadmap could be used for an entire state.

A county with a population of 304,204 had 18,768 people move in (inflow) and 15,548 people move out (outflow). Human mobility affects the incidence rate because cases can be lost or gained through migration. Even though we don't know how many cases were gained or lost among the people who moved in and the people who moved out, we can use practical assumptions about Alzheimer's risk among movers in the three scenarios delineated below. The state-wide, age-specific Alzheimer's risk is known. Expected cases among movers are calculated as the product of that risk and the numbers of inflow and outflow.
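As a rough illustration of that product, take the oldest age group (row 14 of the data listed further down in this post: inflow 563, outflow 384, staterisk 0.004367544). The symbols E_in and E_out are introduced here only for illustration and are not variables in the SAS program.

```latex
\begin{align*}
E_{\text{in}}  &= \text{inflow} \times \text{staterisk} = 563 \times 0.004367544 \approx 2.46 \\
E_{\text{out}} &= \text{outflow} \times \text{staterisk} = 384 \times 0.004367544 \approx 1.68
\end{align*}
```

So under the baseline assumption that movers carry the state-wide risk, this age group would be expected to gain about 2.5 cases and lose about 1.7.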

Scenario 1. People who moved in (inflow) had the state-wide risk, while people who moved out (outflow) had 0-2 times the state-wide risk (risk*0.1, risk*0.2, ..., risk*1.9, risk*2).

rate_i=(total cases+inflow*staterisk-outflow*staterisk*n_i)/(population+inflow-outflow)

The risk difference can then be calculated as: d_i=arate-rate_i

where the multiplier n_i takes values from 0.1 through 2 in increments of 0.1, and arate is the crude rate not accounting for mobility.

Scenario 2. People who moved out (outflow) had the state-wide risk, while people who moved in (inflow) had 0-2 times the state-wide risk (risk*0.1, risk*0.2, ..., risk*1.9, risk*2).

rate_o=(total cases-outflow*staterisk+inflow*staterisk*n_i)/(population+inflow-outflow)

The risk difference can then be calculated as: d_o=arate-rate_o, where arate is the crude rate not accounting for mobility.

Scenario 3. Both the people who moved in and the people who moved out had 0-2 times the state-wide risk (risk*0.1, risk*0.2, ..., risk*1.9, risk*2).

rate_p=(total cases-outflow*staterisk*n_i+inflow*staterisk*n_i)/(population+inflow-outflow)

The risk difference can then be calculated as: d_p=arate-rate_p
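For side-by-side comparison, the three scenario rates can be written in one notation. This is only a restatement of the formulas above under assumed shorthand that does not appear in the original post: C = total cases, P = population, I = inflow, O = outflow, r = state-wide risk, and k = the multiplier n_i (0.1 through 2).

```latex
\begin{align*}
\text{Scenario 1:}\quad \text{rate}_i &= \frac{C - k\,r\,O + r\,I}{P + I - O} \\
\text{Scenario 2:}\quad \text{rate}_o &= \frac{C - r\,O + k\,r\,I}{P + I - O} \\
\text{Scenario 3:}\quad \text{rate}_p &= \frac{C - k\,r\,O + k\,r\,I}{P + I - O}
\end{align*}
```

At k = 1 all three expressions coincide, which is why the k = 1 rate is a natural reference point when comparing scenarios.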

My questions for you guys:

Does the attached plot make sense, and is it informative?
To answer my research question stated above, should I check whether the 95% CIs of the rates overlap? What extent of change in rates due to mobility could be considered a "meaningful impact"? Please brainstorm here, if you will.

data data;
input county agegroup US_age_dist arate population tot_cases inflow outflow staterisk;
cards;
1    1    0.055316 7.23746E-6    15286    2    691    603    0.0000128886
1    2    0.21773 0.0000189259    57522    5    4962    3309    0.000129332
1    3    0.066478 0.000011864    28017    5    4770    2960    0.0000288517
1    4    0.06453 0.0000416917    21669    14    2387    2483    0.0000436222
1    5    0.071045 0.0000875479    17853    22    1463    1622    0.0000755209
1    6    0.080762 0.000107216    17325    23    902    1061    0.000127927
1    7    0.081852 0.000185157    19893    45    658    768    0.00020995
1    8    0.072117 0.000299466    22637    94    643    598    0.00028955
1    9    0.062718 0.000399072    22788    145    579    577    0.000392299
1    10   0.048454 0.000391896    21019    170    386    570    0.000436333
1    11   0.038794 0.000509854    17881    235    373    217    0.000495289
1    12   0.034264 0.000649886    11968    227    163    234    0.000599819
1    13   0.031772 0.00072508    8676    198    228    162    0.000672988
1    14   0.06035 0.001489936    21670    535    563    384    0.004367544
;
data data; set data;
arate1=US_age_dist*(tot_cases/population);
run;

/* Scenario 1: people who moved in (inflow) have the state-wide risk; people who moved out (outflow) have n times the state-wide risk */
data temp_out;
  set data;
  array rate rate01-rate20;
  do i=1 to dim(rate);
    rate(i)=(tot_cases-outflow*staterisk*(i/10)+inflow*staterisk)/(population+inflow-outflow);
  end;
  drop i;
run;

/* Scenario 2: people who moved out (outflow) have the state-wide risk; people who moved in (inflow) have n times the state-wide risk */
data temp_in;
  set data;
  array rate rate01-rate20;
  do i=1 to dim(rate);
    rate(i)=(tot_cases+inflow*staterisk*(i/10)-outflow*staterisk)/(population+inflow-outflow);
  end;
  drop i;
run;

proc means data=temp_out noprint;
var rate:;
output out=temp_out1(drop=_type_ _freq_) sum=/autoname;
run;
proc means data=temp_in noprint;
var rate:;
output out=temp_in1(drop=_type_ _freq_) sum=/autoname;
run;

/* Percent change of the rate at multiplier 1.0 (rate10) relative to each multiplier */
data temp_out2(keep=change:); set temp_out1;
  array r{*} rate:;                     /* rate01_Sum ... rate20_Sum */
  array change{20} change01-change20;
  do i=1 to dim(r);
    change(i)=(r(10)-r(i))/r(i)*100;
  end;
run;

/* Percent change of the rate at multiplier 1.0 (rate10) relative to each multiplier */
data temp_in2(keep=change:); set temp_in1;
  array r{*} rate:;                     /* rate01_Sum ... rate20_Sum */
  array change{20} change01-change20;
  do i=1 to dim(r);
    change(i)=(r(10)-r(i))/r(i)*100;
  end;
run;

proc transpose data=temp_in2 out=temp_in3(rename=(COL1=change_in)); run;
proc transpose data=temp_out2 out=temp_out3(rename=(COL1=change_out)); run;

/*SCENARIO 3*/
data temp_both;
  set data;
  array rate rate01-rate20;
  do i=1 to dim(rate);
    rate(i)=(tot_cases-outflow*staterisk*(i/10)+inflow*staterisk*(i/10))/(population+inflow-outflow);
  end;
  drop i;
run;
proc means data=temp_both noprint;
var rate:;
output out=temp_both1(drop=_type_ _freq_) sum=/autoname;
run;
/* temp_both1 already holds a single row of sums, so a second PROC MEANS is not needed */
data temp_both3(keep=change:); set temp_both1;
  array r{*} rate:;                     /* rate01_Sum ... rate20_Sum */
  array change{20} change01-change20;
  do i=1 to dim(r);
    change(i)=(r(10)-r(i))/r(i)*100;
  end;
run;

proc transpose data=temp_both3 out=temp_both4(rename=(COL1=change_both)); run;

data all;
merge temp_out3 temp_in3 temp_both4;
risk+0.1;
run;

proc sgplot data=all nocycleattrs;
title 'EFFECT OF DIFFERENTIAL RISK AMONG MOVERS ON ALZHEIMER INCIDENCE RATE';
  /* change_out comes from temp_out (Scenario 1); change_in comes from temp_in (Scenario 2) */
  series x=risk y=change_out/ lineattrs=(color=blue) legendlabel= 'SCENARIO 1';
  series x=risk y=change_in/ lineattrs=(color=red) legendlabel= 'SCENARIO 2';
  series x=risk y=change_both/ lineattrs=(color=black) legendlabel= 'SCENARIO 3';
  yaxis label='Percent of Change in Incidence Rate';
  xaxis label='Alzheimer Risk';
run;

ChrisNZ · Posted 02-11-2018 11:41 PM

Your program generates errors.

High-Performance SAS Coding - Third Edition

Cruise · Posted 02-12-2018 12:03 AM

Hey, thanks a lot. Just corrected.

ChrisNZ · Posted 02-12-2018 08:16 PM

Your program is now a one-liner.

Cruise · Posted 02-12-2018 08:55 PM

Thanks again. Just corrected.

ballardw · Posted 02-13-2018 01:04 PM

Can you describe clearly why you have a loop of 1 to 20 and the *(i/10) in the temp_out code?

I am not sure that I quite understand what the "rate" variable is supposed to actually contain. I see that you say:

Scenario 1. people who moved in (inflow) had state-wide risk while people who moved out had 0-2 times the state-wide risk (risk*0.1, risk*0.2....-risk*1.9, risk*2)

but that doesn't really make sense. Why restrict to 0-2 and not 0-3 or 0-1.5 or .5 to 2.5? What assumptions is that 0-2 interval based on? Are those your range of biases?

I really don't understand what summing those rates across age groups represents.

If your arate variable contains a risk at age(?) what role does it play? It seems to be ignored.

Since the resultant graph is 3 straight lines I think you can skip all of the 1 to 20 bits and just do the end points.

Cruise · Posted 02-13-2018 03:56 PM

@ballardw

Thank you very much ballardw for your time!

The aim of this work is to explore whether population mobility distorts the incidence rate significantly and, if it does, to what extent: quantify the impact and inspect whether it would substantially distort the accuracy of the disease estimates. In current practice, the incidence rate is calculated as (number of new cases)/(census population)*100,000. Neither the numerator nor the denominator is updated with the number of cases potentially lost or gained due to mobility. Ideally, information would be available on the number of new cases that moved into and out of the county (or any other geographic unit). The mobility-adjusted rate would then be: (number of new cases-cases left+cases moved in)/(census population-people moved out+people moved in)*100,000. However, this information is not available in the US. Collecting primary data by asking individuals for their actual migration history is costly and subject to recall bias, willingness to participate, etc. Studies based on primary data do exist, but they suffer from a lack of generalizability on top of the aforementioned limitations.

This study uses the Census county-to-county migration flows, which tell us the number of people who moved into and out of the county by age (other demographics are available, but I'd like to solve the problem using age first). This information updates the denominator (census population-people moved out+people moved in). However, I still have no idea how many actual cases would be gained or lost due to mobility unless I use assumptions. Mobility is multi-directional, but I stick to only two opposite directions and assume the moves are permanent. The data are organized by age group because both the health outcome and mobility are age dependent.
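The crude and mobility-adjusted rates described above can be set next to each other as follows. The symbols are shorthand assumed here for illustration: N = new cases, P = census population, and the subscripts "in" and "out" mark cases or people who moved in or out.

```latex
\begin{align*}
\text{crude rate} &= \frac{N}{P} \times 100{,}000 \\
\text{mobility-adjusted rate} &= \frac{N - N_{\text{out}} + N_{\text{in}}}{P - P_{\text{out}} + P_{\text{in}}} \times 100{,}000
\end{align*}
```

The migration flow data supply P_in and P_out directly, while N_in and N_out are the unknowns that the three scenarios replace with assumed risks.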

The primary goal of this exercise is to scale the impact of mobility as a function of the disease risk among movers. The disease of interest here is Alzheimer's. The average state risk is around 500 per 100,000. Because mobility is bi-directional, movements in and out offset each other and reduce the net effect. If I'm interested in the extreme effects of mobility, I'd rather hold one direction fixed while the other varies. That's why, in Scenarios 1 and 2, I held the risk in one direction constant at the state risk while the risk in the other direction varied.

Scale of risk range: The range chosen is 0-2. First, it was my arbitrary choice to double the risk (2) and halve it (0.5), and I then extended the lower end to 0. Second, it is extremely unlikely that Alzheimer's risk among movers would be double or half the state average risk, unless the study area were deemed the least desirable place to live once one is diagnosed with Alzheimer's, which is not really the case for NYS. Nor does the state do anything extraordinary to attract Alzheimer's patients, for example through conducive care.

Summing rates across age groups: Sorry for the ambiguity. arate in the SAS program is the product of the age distribution of the 2000 US standard population and the crude rate for each age group, which is used for the age-adjusted rate. Both Alzheimer's and migration are age sensitive in nature, so age adjustment is important here.

Edited in SAS program above:

data data; set data;

arate1=US_age_dist*(tot_cases/population);

run;
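If the intent is direct age standardization, the same weighting could also be applied to the scenario rates before they are summed across age groups. The following is a minimal sketch for Scenario 1 only, assuming the `data` dataset from the program above; the dataset names temp_out_adj and temp_out_adj1 are hypothetical and not part of the original program.

```sas
/* Sketch only: weight each age-specific Scenario 1 rate by the       */
/* standard population share (US_age_dist) before summing.            */
/* temp_out_adj and temp_out_adj1 are hypothetical dataset names.     */
data temp_out_adj;
  set data;
  array rate rate01-rate20;
  do i=1 to dim(rate);
    rate(i)=US_age_dist*
            (tot_cases-outflow*staterisk*(i/10)+inflow*staterisk)
            /(population+inflow-outflow);
  end;
  drop i;
run;

/* Summing the weighted rates gives one age-adjusted rate per multiplier */
proc means data=temp_out_adj noprint;
  var rate:;
  output out=temp_out_adj1(drop=_type_ _freq_) sum=/autoname;
run;
```

With this weighting, the sums in temp_out_adj1 would be on the same footing as arate1, whereas the unweighted sums in temp_out1 are simply totals of age-specific rates.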

Taking the end points of the plot: I would like to present the extreme ends of the impact of mobility on the disease estimates. However, I will still look for more effective and meaningful ways to visualize the results.

My question remains: what extent of change in rates due to mobility could be considered a "meaningful impact"? Should I look at the ratio of the 95% CIs of the rates from the different scenarios?
