Obsidian | Level 7

Hi All,

    I have a data set containing national test scores, with a variable representing an individual's increase in score on a test after taking a particular class (Score_Increase = After_Class - Before_Class). I want to compute the weighted average score increase for each state in the data set. Am I able to weight the data based on how many students took the test in each state? In other words, can I run a freq procedure on the state and weight each observation within that state by the respective count produced by the freq procedure? I was thinking something like this:


proc means data=Test_Scores mean;
    class State / descending;
    /* freq is assumed to hold the per-state student count on each row */
    var score_increase / weight=freq;
run;



Thank you for your time!

Super User


If you have individual scores and then weight the values by the state count, is that actually weighted? 


If you had state values and then weighted that by the count in calculating a national average that would make sense, but if you're reporting at a state level I'm not sure it computes.
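To make the point above concrete, here is a short worked derivation. The notation is an assumption introduced for illustration: $x_{si}$ is the score increase of student $i$ in state $s$, $n_s$ is the number of students in state $s$, and $\bar{x}_s$ is the unweighted state mean. Weighting every student in a state by the same count $n_s$ cancels out, whereas weighting state means by $n_s$ changes the national figure.

```latex
% Within one state, every observation carries the same weight n_s,
% so the weight cancels and the plain state mean is recovered:
\[
\frac{\sum_{i=1}^{n_s} n_s \, x_{si}}{\sum_{i=1}^{n_s} n_s}
  = \frac{n_s \sum_{i=1}^{n_s} x_{si}}{n_s \cdot n_s}
  = \frac{1}{n_s} \sum_{i=1}^{n_s} x_{si}
  = \bar{x}_s .
\]

% Across states, weighting the state means by n_s does matter;
% it gives the mean over all individual students:
\[
\bar{x}_{\text{national}}
  = \frac{\sum_{s} n_s \, \bar{x}_s}{\sum_{s} n_s}
  = \frac{\sum_{s} \sum_{i=1}^{n_s} x_{si}}{\sum_{s} n_s} .
\]
```

So at the state level the weighted and unweighted means coincide; the state count only affects the precision of each state's estimate, not the estimate itself.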


Obsidian | Level 7

I guess what I am asking is: do I need to account for the difference in the number of students taking the exam by state in order to say a state had the highest score increase due to the exam?

Super User

With the type of ranking that I think you may be attempting, you might want to consider looking at confidence intervals (a quick eyeball test, not a formal statistical test) before deciding if the change is the "highest". For instance, suppose State A had a (mean) change of 6 with a confidence interval for their students of (5.9, 6.1), and State B had a change of 5.9 but with a confidence interval of (5.6, 6.2).

I would be very hesitant to claim that the actual change of State A is greater than that of State B without further, more complete tests.


One of the things I get to look at periodically is ranking tables, and I get varying attitudes from managers when telling them their rank could be anywhere from 1st to 23rd of 50 (or 27th to 50th in other cases).


Many advocacy groups do this all the time, though: they look at the point estimate and ignore the variability in the result.


Another thing to think about: does the actual change have any practical significance?
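The confidence-interval eyeball check described above can be sketched in SAS roughly as follows. The data set and variable names (Test_Scores, State, score_increase) are taken from the original question; the 95% level (alpha=0.05) is an assumption, not something stated in the thread.

```sas
/* Per-state N, mean, standard error, and 95% confidence limits for the mean */
proc means data=Test_Scores n mean stderr clm alpha=0.05;
    class State;
    var score_increase;
run;
```

States with few students will tend to show wider intervals, which is how the difference in the number of test takers enters the comparison without any explicit weighting.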

Super User

How are you going to say one is higher than another?


Proc means only produces the values; you should be calculating some type of confidence interval as well as a statistical test. Yes, you should be considering the sample size: the N is what the confidence intervals and tests factor in. See the link below for a more detailed answer.


I would also suggest you produce box plots by state, ordering them from high to low to show the differences. It's a good way to visually display the information and the distribution. You can also plot the means and the confidence intervals by state, from high to low.
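One possible way to produce the plots suggested above is sketched here. The names state_stats, mean_inc, lower_cl, and upper_cl are hypothetical, introduced only for this example, and State is assumed to be a character variable so that the x axis is discrete. The box plots in this sketch appear in the default category order rather than sorted high to low.

```sas
/* Box plots of the individual score increases, one box per state */
proc sgplot data=Test_Scores;
    vbox score_increase / category=State;
run;

/* One row per state: mean and 95% confidence limits.
   NWAY drops the overall (all-states) summary row. */
proc means data=Test_Scores noprint nway;
    class State;
    var score_increase;
    output out=state_stats mean=mean_inc lclm=lower_cl uclm=upper_cl;
run;

/* Order states from highest to lowest mean increase */
proc sort data=state_stats;
    by descending mean_inc;
run;

/* Means with confidence-interval bars, shown in the sorted order */
proc sgplot data=state_stats;
    scatter x=State y=mean_inc / yerrorlower=lower_cl yerrorupper=upper_cl;
    xaxis discreteorder=data;
    yaxis label="Mean score increase";
run;
```

Overlapping interval bars between neighboring states are a visual hint that their ranks may not be distinguishable.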


Hope that helps.



