BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
luch25
Obsidian | Level 7

I have two different data sets that I need to use to calculate a weighted average:

 

1. Data set with population count for each state

data state;
input state pop_count;
datalines;
1 200
2 300
3 400
4 500
6 200
7 300
8 500
;

2. Dataset unweighted average for each company and states that they are located in. Some companies are located in multiple states, which is indicated by the colX variables.

data company;
input name $ mean col1 col2 col3 col4;
datalines;
A 0.6 1 . . .
B 0.8 2 3 . .
C 0.7 4 . . .
D 0.5 5 6 7 8
;

 

The issue that I'm having is how to account for companies located in multiple states. How can I get the state-level total population count data from the state data set into the company data set?

The final weighted average should be 0.61 for the data listed above. 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @luch25 and welcome to the SAS Support Communities!

 

You can compute the weighted average with PROC MEANS, so you just need to create an input dataset containing the names, means and weights:

data want(keep=name mean popc);
array popcnt[50] _temporary_;
if _n_=1 then do until(lr);
  set state end=lr;
  popcnt[state]=pop_count;
end;
set company;
array col[*] col:;
do i=1 to dim(col) while(col[i]); 
  popc=sum(popc,popcnt[col[i]]);
end;
run;

proc means data=want mean;
weight popc;
var mean;
run;

The result is 0.61 if the missing observation with state=5, pop_count=600 has been inserted into dataset STATE.

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

Eight states, but only COL1-COL4? I don't understand.


Can you show us the mathematical calculation steps you follow to get the answer 0.61?

--
Paige Miller
luch25
Obsidian | Level 7
Yes, because some companies are located in multiple states (example: company D is located in states 5, 6, 7, 8).
I calculated the weighted average in Excel, but the steps are basically:
1. Calculate the total population across all states (3000)
2. Calculate the state population for each company (e.g., company D has 4 states, with a total population of 1600)
3. Multiply the company's unweighted average by the: (value from step 2 / value from step 1). Do this for each company, and then sum across all companies for the weighted average.
FreelanceReinh
Jade | Level 19

Hi @luch25 and welcome to the SAS Support Communities!

 

You can compute the weighted average with PROC MEANS, so you just need to create an input dataset containing the names, means and weights:

data want(keep=name mean popc);
array popcnt[50] _temporary_;
if _n_=1 then do until(lr);
  set state end=lr;
  popcnt[state]=pop_count;
end;
set company;
array col[*] col:;
do i=1 to dim(col) while(col[i]); 
  popc=sum(popc,popcnt[col[i]]);
end;
run;

proc means data=want mean;
weight popc;
var mean;
run;

The result is 0.61 if the missing observation with state=5, pop_count=600 has been inserted into dataset STATE.

luch25
Obsidian | Level 7

@FreelanceReinh this is exactly what I needed, thank you! 

Question- what does the "array popcnt[50]" do? Is this basically creating a rule for theoretical number of 50 states? So if I only had 8 states, could this be changed to popcnt[8]? That seemed to be the case when I played around with your code, but want to make sure.

FreelanceReinh
Jade | Level 19

Exactly. The array POPCNT just needs to be large enough so that its index range (here: 1 through 50) includes all values of variable STATE (most importantly the non-missing values of COL1-COL4), so in your example popcnt[8] would be sufficient. (Other possible input datasets may contain non-consecutive state codes like 7, 13, 29, etc. so that not only the number of distinct codes is relevant.) Since it's a temporary array, the difference between 50 and 8 doesn't really matter.

luch25
Obsidian | Level 7
Got it, thank you for the explanation and help with the code! Have a great day!
PaigeMiller
Diamond | Level 26

Maybe you can use the solution from @FreelanceReinh, although I think simpler code might be available. Nevertheless, allow me to make a few comments in red

 


@luch25 wrote:
Yes, because some companies are located in multiple states (example: company D is located in states 5, 6, 7, 8).
I calculated the weighted average in Excel, but the steps are basically:
1. Calculate the total population across all states (3000) no its not, it adds up to 2400
2. Calculate the state population for each company (e.g., company D has 4 states, with a total population of 1600) there is no state 5 in the first data set
3. Multiply the company's unweighted average by the: (value from step 2 / value from step 1). Do this for each company, and then sum across all companies for the weighted average. Step by step explanation, please ...

 

--
Paige Miller

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2269 views
  • 2 likes
  • 3 in conversation