Below is a simplified example of the data I have. There are three seasons, multiple regions within each season, multiple counties within each region. For our project we may select 1 to 10 of these locations to visit for interviews. We need to calculate the number of interviews to conduct in each selected location based on the total population. So we'll need to calculate the percentage. For example, in Fall, West, we decided to go to 3 locations (3 selected). Cumulative population would be 40. The percentage for each location would be like this: 10 / 40 = .25 10 / 40 = .25 20 / 40 = .50 In the Fall, East, we decided to go to only 2 locations (2 selected). Cumulative population would be 100. The percentage for each location would be like this: 50/ 100 = .50 50 / 100 = .50 season region county selected population cumulative_population Fall West A 1 10 10 Fall West B 2 10 20 Fall West C 3 20 40 Fall West D 4 20 60 Fall East W 1 50 50 Fall East X 2 50 100 Fall East Y 3 60 160 Fall East Z 4 60 220 Spring West A 1 30 30 Spring West B 2 30 60 Spring West C 3 40 100 Spring West D 4 40 140 Spring East W 1 80 80 Spring East X 2 80 160 Spring East Y 3 90 250 Spring East Z 4 90 340 Summer West A 1 70 70 Summer West B 2 70 140 Summer West C 3 80 220 Summer West D 4 80 300 Summer East W 1 60 60 Summer East X 2 60 120 Summer East Y 3 70 190 Summer East Z 4 70 260 I'm thinking I need to have separate columns/variables that contain the cumulative population for each selected. For example, Cumulative_Pop_1selected, Cumulative_Pop_2selected, Cumulative_Pop_3selected, Cumulative_Pop_4selected, for Fall region West would contain the values 10, 20, 40, and 60, respectively. And so on for each combination of season and region. Like this: season region county selected population cumulative_population Cumulative_Pop_1selected Cumulative_Pop_2selected Cumulative_Pop_3selected Cumulative_Pop_4selected Fall West A 1 10 10 10 20 40 60 Fall West B 2 10 20 10 20 40 60 Fall West C 3 20 40 10 20 40 60 Fall West D 4 20 60 10 20 40 60 Fall East W 1 50 50 50 100 160 220 Fall East X 2 50 100 50 100 160 220 Fall East Y 3 60 160 50 100 160 220 Fall East Z 4 60 220 50 100 160 220 Spring West A 1 30 30 30 60 100 140 Spring West B 2 30 60 30 60 100 140 Spring West C 3 40 100 30 60 100 140 Spring West D 4 40 140 30 60 100 140 Spring East W 1 80 80 80 160 250 340 Spring East X 2 80 160 80 160 250 340 Spring East Y 3 90 250 80 160 250 340 Spring East Z 4 90 340 80 160 250 340 Summer West A 1 70 70 70 140 220 300 Summer West B 2 70 140 70 140 220 300 Summer West C 3 80 220 70 140 220 300 Summer West D 4 80 300 70 140 220 300 Summer East W 1 60 60 60 120 190 260 Summer East X 2 60 120 60 120 190 260 Summer East Y 3 70 190 60 120 190 260 Summer East Z 4 70 260 60 120 190 260 I was able to achieve this using a series of Proc Summary to create Cumulative_Pop_1selected, Cumulative_Pop_2selected, etc. PROC SUMMARY DATA=all; BY season region; where count = 2; VAR cumulative_population; OUTPUT OUT = 2selected (DROP=_TYPE_ _FREQ_) SUM=Cumulative_Pop_2selected; RUN; PROC SUMMARY DATA=all; BY season region; where count = 3; VAR cumulative_population; OUTPUT OUT = 3selected (DROP=_TYPE_ _FREQ_) SUM=Cumulative_Pop_3selected; RUN; etc. But is there an easier way to get the table above that contain Cumulative_Pop_1selected, Cumulative_Pop_2selected, etc.? With my method, there will be 10 sets of Proc Summary codes. Finally, I can merge the Proc Summary data with the original data to get the table above, and calculate the percentage using: if count <= 2 then percent_2selected = population / Cumulative_Pop_2selected; if count <= 3 then percent_3selected = population / Cumulative_Pop_3selected; etc. I hope this is clear and not too complicated. Thanks in advance!
... View more