04-01-2016 10:14 AM
I have a data set of national test scores with a variable representing an individual's increase in score on a test after taking a particular class (Score_Increase = After_Class - Before_Class). I want to compute the weighted average score increase for each state in the data set. Can I weight the data based on how many students took the test in each state? In other words, can I run PROC FREQ on state and weight each observation within that state by the count it produces? I was thinking something like this:
proc means data=Test_Scores mean;
Thank you for your time!
04-01-2016 10:42 AM
If you have individual scores and weight each one by its state's count, every observation in a state gets the same weight, so the state mean comes out unchanged. Is that actually weighting anything?
If you had state-level values and weighted them by the count when calculating a national average, that would make sense, but if you're reporting at the state level I'm not sure it does.
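A quick illustration of both points, written in Python rather than SAS just to show the arithmetic. The state labels and score values are made up, and `weighted_mean` is a hypothetical helper, not a SAS or library function. A constant weight inside a state cancels out of that state's mean, while weighting state means by their counts reproduces the pooled per-student national mean:

```python
from statistics import fmean

# Hypothetical per-student score increases (After_Class - Before_Class) by state.
# These numbers are made up for illustration only.
scores = {
    "A": [4.0, 6.0, 8.0],
    "B": [2.0, 3.0],
}


def weighted_mean(values, weights):
    """Sum of w*x divided by sum of w."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# 1) Weight every student in a state by that state's count.
#    The weight is constant within the state, so it cancels out.
for state, vals in scores.items():
    n = len(vals)
    print(state, "unweighted:", fmean(vals), "weighted:", weighted_mean(vals, [n] * n))

# 2) Weight state means by state counts to get a national figure.
#    This equals the mean over all students pooled together.
state_means = [fmean(v) for v in scores.values()]
state_counts = [len(v) for v in scores.values()]
national_weighted = weighted_mean(state_means, state_counts)
national_pooled = fmean([x for v in scores.values() for x in v])
print("national (weighted state means):", national_weighted)
print("national (pooled students):", national_pooled)
```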
04-01-2016 11:11 AM
I guess what I am asking is: do I need to account for the difference in the number of students taking the exam in each state in order to say a state had the highest score increase due to the exam?
04-01-2016 11:21 AM - edited 04-01-2016 11:24 AM
With the type of ranking I think you may be attempting, you might want to look at confidence intervals (a quick eyeball test, not a formal statistical test) before deciding whether a change is the "highest". For instance, suppose State A had a mean change of 6 with a confidence interval of (5.9, 6.1), and State B had a mean change of 5.9 with a confidence interval of (5.6, 6.2).
I would be very hesitant to claim that State A's change is actually greater than State B's without further, more complete tests.
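The eyeball check for the State A / State B example can be written out as a small sketch. The helpers `intervals_overlap` and `contains` are hypothetical names for illustration. Note that overlap is only a rough signal: two 95% intervals can overlap somewhat even when a formal test finds a difference, so treat it as a reason for caution rather than as a test.

```python
# Confidence intervals from the example above: (lower, upper) for the mean change.
state_a = (5.9, 6.1)   # mean change 6.0
state_b = (5.6, 6.2)   # mean change 5.9


def intervals_overlap(ci1, ci2):
    """True if two closed intervals share at least one point."""
    return ci1[0] <= ci2[1] and ci2[0] <= ci1[1]


def contains(outer, inner):
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


print("overlap:", intervals_overlap(state_a, state_b))
print("A's interval inside B's:", contains(state_b, state_a))
```

Here State A's whole interval sits inside State B's, which is why ranking A above B on the point estimates alone is hard to defend.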
I periodically look at ranking tables, and I get varying reactions from managers when I tell them their rank could be anywhere from 1st to 23rd of 50 (or 27th to 50th in other cases).
Many advocacy groups do this all the time, though: they look at the point estimate and ignore the variability in the result.
Another thing to consider: does the actual change have any practical significance?
04-01-2016 11:25 AM
How are you going to say one is higher than another?
PROC MEANS only produces the point estimates; you should also calculate some type of confidence interval and run a statistical test. And yes, you should consider the sample size, which the confidence intervals and tests take into account through N. See the link below for a more detailed answer.
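To show how N enters the picture, here is a minimal Python sketch that summarizes individual-level records by state (n, mean, standard error, 95% interval) and computes Welch's t statistic for the difference between two states. The records and the helper names `summarize` and `welch_t` are hypothetical. The interval uses a normal (z) critical value for simplicity; with small samples a t critical value, which is what PROC MEANS uses for its CLM statistic, gives a wider and more appropriate interval.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

# Hypothetical individual-level records: (state, score_increase). Made-up values.
records = [
    ("A", 6.2), ("A", 5.8), ("A", 6.1), ("A", 5.9), ("A", 6.0),
    ("B", 5.1), ("B", 6.8), ("B", 5.5), ("B", 6.4),
]

# Group the individual scores by state.
by_state = {}
for state, value in records:
    by_state.setdefault(state, []).append(value)


def summarize(values, level=0.95):
    """Return n, mean, standard error, and a normal-approximation CI for the mean."""
    n = len(values)
    mean = fmean(values)
    se = stdev(values) / sqrt(n)              # shrinks as n grows
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return n, mean, se, (mean - z * se, mean + z * se)


def welch_t(x, y):
    """Welch's t statistic for the difference in means (unequal variances)."""
    se_diff = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    return (fmean(x) - fmean(y)) / se_diff


for state, vals in by_state.items():
    n, mean, se, (lo, hi) = summarize(vals)
    print(f"{state}: n={n} mean={mean:.2f} se={se:.3f} 95% CI=({lo:.2f}, {hi:.2f})")

print("Welch t, A vs B:", round(welch_t(by_state["A"], by_state["B"]), 3))
```

A p-value would also need the Welch degrees of freedom and a t distribution, which this sketch leaves out.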
I would also suggest producing box plots by state, ordered from high to low to show the differences. It's a good way to display the information and the distribution visually. You can also plot the means and their confidence intervals by state, again ordered from high to low.
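The numbers behind such a plot can be sketched without any plotting library: order the states from highest to lowest mean, then compute the five values a box plot draws for each state. The data and the `box_summary` helper are hypothetical. Quartile conventions differ between software packages, so these quartiles may not match what SAS draws exactly.

```python
from statistics import fmean, quantiles

# Hypothetical per-student score increases by state (made-up values).
by_state = {
    "A": [5.8, 5.9, 6.0, 6.1, 6.2],
    "B": [5.1, 5.5, 6.4, 6.8],
    "C": [3.9, 4.4, 5.0, 5.2, 5.6, 6.1],
}


def box_summary(values):
    """Min, Q1, median, Q3, max: the values a box plot draws."""
    q1, med, q3 = quantiles(values, n=4)  # Python's default 'exclusive' method
    return min(values), q1, med, q3, max(values)


# Order states from highest to lowest mean, as suggested for the plots.
ordered = sorted(by_state, key=lambda s: fmean(by_state[s]), reverse=True)

for state in ordered:
    lo, q1, med, q3, hi = box_summary(by_state[state])
    print(f"{state}: mean={fmean(by_state[state]):.2f} "
          f"min={lo} Q1={q1:.2f} median={med:.2f} Q3={q3:.2f} max={hi}")
```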
Hope that helps.