- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I need to find an efficient way to calculate the median age from summary tables. The method is explained in the link below, and is pretty straightforward, but I'm looking for efficiency. I'm good at brute force, moreso than efficiency ... or at least brain dead at the moment.
The correct answer for the data below is 39.8. I was also wondering if there was a way to do this using a Stat proc, but an obvious method didn't occur to me.
Methodology in quesion and first answer.
http://stats.stackexchange.com/questions/139132/how-to-calculate-the-median-age-of-a-population
A quick description would be to find the two values that surround the median, find the corresponding ages that align with that number. Since it's on a scale the actual number is somewhere between the two numbers so weight it based on the difference betwen the two numbers. Since it's a continuous scale, add 1 to scale up.
Geography Age 2010 Cumulative
Canada 0 377861 377861
Canada 1 380538 758399
Canada 2 380634 1139033
Canada 3 371478 1510511
Canada 4 362882 1873393
Canada 5 356590 2229983
Canada 6 360976 2590959
Canada 7 355169 2946128
Canada 8 357366 3303494
Canada 9 361252 3664746
Canada 10 374141 4038887
Canada 11 376153 4415040
Canada 12 386812 4801852
Canada 13 400765 5202617
Canada 14 421855 5624472
Canada 15 434175 6058647
Canada 16 438635 6497282
Canada 17 446950 6944232
Canada 18 457629 7401861
Canada 19 472134 7873995
Canada 20 479938 8353933
Canada 21 465581 8819514
Canada 22 451825 9271339
Canada 23 457019 9728358
Canada 24 468109 10196467
Canada 25 472877 10669344
Canada 26 471323 11140667
Canada 27 470566 11611233
Canada 28 468454 12079687
Canada 29 474732 12554419
Canada 30 468396 13022815
Canada 31 457343 13480158
Canada 32 450990 13931148
Canada 33 452786 14383934
Canada 34 454337 14838271
Canada 35 456131 15294402
Canada 36 444761 15739163
Canada 37 447653 16186816
Canada 38 454974 16641790
Canada 39 479412 17121202
Canada 40 476941 17598143
Canada 41 473754 18071897
Canada 42 470960 18542857
Canada 43 476994 19019851
Canada 44 505091 19524942
Canada 45 543975 20068917
Canada 46 560528 20629445
Canada 47 565680 21195125
Canada 48 550871 21745996
Canada 49 559845 22305841
Canada 50 549137 22854978
Canada 51 535778 23390756
Canada 52 530407 23921163
Canada 53 518528 24439691
Canada 54 501147 24940838
Canada 55 496890 25437728
Canada 56 476058 25913786
Canada 57 452959 26366745
Canada 58 434140 26800885
Canada 59 426554 27227439
Canada 60 416427 27643866
Canada 61 406686 28050552
Canada 62 406030 28456582
Canada 63 409513 28866095
Canada 64 343036 29209131
Canada 65 319975 29529106
Canada 66 310757 29839863
Canada 67 297341 30137204
Canada 68 274022 30411226
Canada 69 262547 30673773
Canada 70 243273 30917046
Canada 71 233870 31150916
Canada 72 222439 31373355
Canada 73 209795 31583150
Canada 74 205522 31788672
Canada 75 196095 31984767
Canada 76 185698 32170465
Canada 77 183420 32353885
Canada 78 177259 32531144
Canada 79 170901 32702045
Canada 80 159809 32861854
Canada 81 146075 33007929
Canada 82 137515 33145444
Canada 83 125378 33270822
Canada 84 116263 33387085
Canada 85 105468 33492553
Canada 86 94205 33586758
Canada 87 82396 33669154
Canada 88 73145 33742299
Canada 89 62417 33804716
Canada 90+ 200558 34005274
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Depending on how much precision you need, brute force can be fairly efficient:
data test / view=test;
set pop;
dumPop = round(pop/1000);
do i = 0 to dumPop-1;
dumAge = age + i / dumPop;
output;
end;
keep dumAge;
run;
proc univariate data=test;
var dumAge;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
But more seriously, you can do the math:
data medPop / view=medPop;
set pop end=done;
if done then do;
medPop = cumulative / 2;
output;
end;
keep medPop;
run;
data medAge;
set medPop;
do until(cumulative > medPop);
set pop;
end;
medAge = (age*(cumulative-medPop) + (age+1)*(medPop - cumulative + pop)) / pop;
output;
stop;
keep medAge;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Correct answer has already given the point.
data have;
input Geography $ Age _2010 Cumulative;
cards;
Canada 0 377861 377861
Canada 1 380538 758399
Canada 2 380634 1139033
Canada 3 371478 1510511
Canada 4 362882 1873393
Canada 5 356590 2229983
Canada 6 360976 2590959
Canada 7 355169 2946128
Canada 8 357366 3303494
Canada 9 361252 3664746
Canada 10 374141 4038887
Canada 11 376153 4415040
Canada 12 386812 4801852
Canada 13 400765 5202617
Canada 14 421855 5624472
Canada 15 434175 6058647
Canada 16 438635 6497282
Canada 17 446950 6944232
Canada 18 457629 7401861
Canada 19 472134 7873995
Canada 20 479938 8353933
Canada 21 465581 8819514
Canada 22 451825 9271339
Canada 23 457019 9728358
Canada 24 468109 10196467
Canada 25 472877 10669344
Canada 26 471323 11140667
Canada 27 470566 11611233
Canada 28 468454 12079687
Canada 29 474732 12554419
Canada 30 468396 13022815
Canada 31 457343 13480158
Canada 32 450990 13931148
Canada 33 452786 14383934
Canada 34 454337 14838271
Canada 35 456131 15294402
Canada 36 444761 15739163
Canada 37 447653 16186816
Canada 38 454974 16641790
Canada 39 479412 17121202
Canada 40 476941 17598143
Canada 41 473754 18071897
Canada 42 470960 18542857
Canada 43 476994 19019851
Canada 44 505091 19524942
Canada 45 543975 20068917
Canada 46 560528 20629445
Canada 47 565680 21195125
Canada 48 550871 21745996
Canada 49 559845 22305841
Canada 50 549137 22854978
Canada 51 535778 23390756
Canada 52 530407 23921163
Canada 53 518528 24439691
Canada 54 501147 24940838
Canada 55 496890 25437728
Canada 56 476058 25913786
Canada 57 452959 26366745
Canada 58 434140 26800885
Canada 59 426554 27227439
Canada 60 416427 27643866
Canada 61 406686 28050552
Canada 62 406030 28456582
Canada 63 409513 28866095
Canada 64 343036 29209131
Canada 65 319975 29529106
Canada 66 310757 29839863
Canada 67 297341 30137204
Canada 68 274022 30411226
Canada 69 262547 30673773
Canada 70 243273 30917046
Canada 71 233870 31150916
Canada 72 222439 31373355
Canada 73 209795 31583150
Canada 74 205522 31788672
Canada 75 196095 31984767
Canada 76 185698 32170465
Canada 77 183420 32353885
Canada 78 177259 32531144
Canada 79 170901 32702045
Canada 80 159809 32861854
Canada 81 146075 33007929
Canada 82 137515 33145444
Canada 83 125378 33270822
Canada 84 116263 33387085
Canada 85 105468 33492553
Canada 86 94205 33586758
Canada 87 82396 33669154
Canada 88 73145 33742299
Canada 89 62417 33804716
Canada 90 200558 34005274
;
run;
data temp;
set have;
do i=0.1 to 1 by 0.1;
new_age=age+i;
output;
end;
run;
proc means data=temp median;
var new_age;
freq _2010;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Reeza,
One note on the Median. Statistically, it is not well defined when there are ties. From the formal definition, any value between 39 and 40 is "correct". The reference that you cite assumes that the distribution of the values between 39 and 40 is uniform. That may be a reasonable assumption here, but less so elsewhere.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks @Doc_Duke for the information.
I think the uniformity assumption for birthdates is probably fair.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't think so . More like US has more September Baby.
Check Rick's blog :
http://blogs.sas.com/content/iml/2011/09/09/the-most-likely-birthday-in-the-us.html
http://blogs.sas.com/content/iml/2011/09/16/the-effect-of-holidays-on-us-births.html
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
And some of the issues around quantiles are why Procs Means, Summary and Tabulate have the QMETHOD and QNTLDEF options to clarify the calculation methods and results a bit.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Sometimes its easy to accept the way things are and not question them 😉