I have two different data sets that I need to use to calculate a weighted average:
1. Data set with population count for each state
data state;
input state pop_count;
datalines;
1 200
2 300
3 400
4 500
6 200
7 300
8 500
;
2. Dataset unweighted average for each company and states that they are located in. Some companies are located in multiple states, which is indicated by the colX variables.
data company;
input name $ mean col1 col2 col3 col4;
datalines;
A 0.6 1 . . .
B 0.8 2 3 . .
C 0.7 4 . . .
D 0.5 5 6 7 8
;
The issue that I'm having is how to account for companies located in multiple states. How can I get the state-level total population count data from the state data set into the company data set?
The final weighted average should be 0.61 for the data listed above.
Hi @luch25 and welcome to the SAS Support Communities!
You can compute the weighted average with PROC MEANS, so you just need to create an input dataset containing the names, means and weights:
data want(keep=name mean popc);
array popcnt[50] _temporary_;
if _n_=1 then do until(lr);
set state end=lr;
popcnt[state]=pop_count;
end;
set company;
array col[*] col:;
do i=1 to dim(col) while(col[i]);
popc=sum(popc,popcnt[col[i]]);
end;
run;
proc means data=want mean;
weight popc;
var mean;
run;
The result is 0.61 if the missing observation with state=5, pop_count=600 has been inserted into dataset STATE.
Eight states, but only COL1-COL4? I don't understand.
Can you show us the mathematical calculation steps you follow to get the answer 0.61?
Hi @luch25 and welcome to the SAS Support Communities!
You can compute the weighted average with PROC MEANS, so you just need to create an input dataset containing the names, means and weights:
data want(keep=name mean popc);
array popcnt[50] _temporary_;
if _n_=1 then do until(lr);
set state end=lr;
popcnt[state]=pop_count;
end;
set company;
array col[*] col:;
do i=1 to dim(col) while(col[i]);
popc=sum(popc,popcnt[col[i]]);
end;
run;
proc means data=want mean;
weight popc;
var mean;
run;
The result is 0.61 if the missing observation with state=5, pop_count=600 has been inserted into dataset STATE.
@FreelanceReinh this is exactly what I needed, thank you!
Question- what does the "array popcnt[50]" do? Is this basically creating a rule for theoretical number of 50 states? So if I only had 8 states, could this be changed to popcnt[8]? That seemed to be the case when I played around with your code, but want to make sure.
Exactly. The array POPCNT just needs to be large enough so that its index range (here: 1 through 50) includes all values of variable STATE (most importantly the non-missing values of COL1-COL4), so in your example popcnt[8] would be sufficient. (Other possible input datasets may contain non-consecutive state codes like 7, 13, 29, etc. so that not only the number of distinct codes is relevant.) Since it's a temporary array, the difference between 50 and 8 doesn't really matter.
Maybe you can use the solution from @FreelanceReinh, although I think simpler code might be available. Nevertheless, allow me to make a few comments in red
@luch25 wrote:
Yes, because some companies are located in multiple states (example: company D is located in states 5, 6, 7, 8).
I calculated the weighted average in Excel, but the steps are basically:
1. Calculate the total population across all states (3000) no its not, it adds up to 2400
2. Calculate the state population for each company (e.g., company D has 4 states, with a total population of 1600) there is no state 5 in the first data set
3. Multiply the company's unweighted average by the: (value from step 2 / value from step 1). Do this for each company, and then sum across all companies for the weighted average. Step by step explanation, please ...
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.