Hello Team,
I have a weighted count from a sample, but it returns a different report when I run the proc means operation on the same sample. 'Sub' is a binary-valued variable (1,0) and has no missing values but values that are less than the weighted count. Could someone please clarify this to me? Thank you!
/*weighted count*/
proc sql;
create table sample as
SELECT Year,group, Sum(weight) AS freq
FROM biz.workers(where=(age between 25 and 64 and year>=2014))
GROUP BY year,group;
run;
/*Statistics for Sub*/
proc means data=biz.workers(where=(age between 25 and 64 and year>=2014)) n mean stderr lclm uclm alpha=0.05 vardef=df Maxdec=4;
class year group;
var sub /weight=weight;
run;
WEIGHTED COUNT OF POPULATION | ||||
YEAR | A | B | C | Total |
2015 | 798,535 | 62,753 | 76,191 | 937,479 |
2016 | 802,110 | 66,391 | 81,967 | 950,468 |
2017 | 824,222 | 60,694 | 90,189 | 975,105 |
2018 | 829,473 | 74,191 | 94,218 | 997,882 |
2019 | 838,824 | 77,843 | 99,673 | 1,016,340 |
Analysis Variable : Sub | |||||||
Census year | Group | N Obs | N | Mean | Std Error | Lower 95% CL for Mean | Upper 95% CL for Mean |
2015 | A | 401,948 | 401,948 | 0.001 | 0.000 | 0.001 | 0.001 |
2015 | B | 35,036 | 35,036 | 0.000 | 0.000 | 0.000 | 0.001 |
2015 | C | 101,315 | 101,315 | 0.000 | 0.000 | 0.000 | 0.001 |
2016 | A | 409,568 | 409,568 | 0.001 | 0.000 | 0.001 | 0.001 |
2016 | B | 35,755 | 35,755 | 0.000 | 0.000 | 0.000 | 0.001 |
2016 | C | 106,454 | 106,454 | 0.000 | 0.000 | 0.000 | 0.001 |
2017 | A | 419,671 | 419,671 | 0.001 | 0.000 | 0.001 | 0.001 |
2017 | B | 35,641 | 35,641 | 0.000 | 0.000 | 0.000 | 0.001 |
2017 | C | 114,356 | 114,356 | 0.000 | 0.000 | 0.000 | 0.001 |
2018 | A | 425,452 | 425,452 | 0.001 | 0.000 | 0.001 | 0.001 |
2018 | B | 36,094 | 36,094 | 0.001 | 0.000 | 0.001 | 0.001 |
2018 | C | 118,881 | 118,881 | 0.000 | 0.000 | 0.000 | 0.001 |
2019 | A | 437,484 | 437,484 | 0.001 | 0.000 | 0.001 | 0.001 |
2019 | B | 36,043 | 36,043 | 0.000 | 0.000 | 0.000 | 0.000 |
2019 | C | 124,615 | 124,615 | 0.000 | 0.000 | 0.000 | 0.000 |
Also, in this case I get the same answers from both PROC SQL and PROC MEANS.
proc means data=sashelp.class sumwgt;
var height/weight=weight;
class sex age;
run;
proc sql;
select sex,age,sum(weight) as sumweight from sashelp.class
group by sex,age;
quit;
Weighted count?
Do not confuse weights (provided with the WEIGHT statement) and frequencies (the FREQ statement). They do not have the same effect on the computation of statistics. If w is the weigth associated with your observations then the (weighted) sum of variable x is SUM(w*x)/SUM(w). If f is the frequency of your observations then the sum of x is SUM(f*x).
I'm pretty sure you need to request the SUMWGT output statistic from PROC MEANS to match your PROC SQL results.
The results are still different
Analysis Variable : Sub | ||||||||
Census year | Group | N Obs | N | Mean | Sum Wgts | Std Error | Lower 95% | Upper 95% |
2014 | A | 396,550 | 396,550 | 0.0007 | 37,993,145 | 0 | 0.0006 | 0.0007 |
2014 | B | 34,367 | 34,367 | 0.0007 | 4,270,996 | 0.0001 | 0.0004 | 0.001 |
2014 | C | 96,261 | 96,261 | 0.0004 | 10,540,737 | 0.0001 | 0.0002 | 0.0005 |
2015 | A | 401,948 | 401,948 | 0.0006 | 38,463,225 | 0 | 0.0005 | 0.0007 |
2015 | B | 35,036 | 35,036 | 0.0005 | 4,420,217 | 0.0001 | 0.0003 | 0.0007 |
2015 | C | 101,315 | 101,315 | 0.0005 | 11,192,942 | 0.0001 | 0.0003 | 0.0006 |
2016 | A | 409,568 | 409,568 | 0.0007 | 39,102,720 | 0 | 0.0006 | 0.0008 |
2016 | B | 35,755 | 35,755 | 0.0004 | 4,580,650 | 0.0001 | 0.0002 | 0.0006 |
2016 | C | 106,454 | 106,454 | 0.0004 | 11,651,141 | 0.0001 | 0.0003 | 0.0005 |
2017 | A | 419,671 | 419,671 | 0.0006 | 39,609,123 | 0 | 0.0005 | 0.0007 |
2017 | B | 35,641 | 35,641 | 0.0004 | 4,789,302 | 0.0001 | 0.0002 | 0.0006 |
2017 | C | 114,356 | 114,356 | 0.0004 | 12,431,253 | 0.0001 | 0.0003 | 0.0005 |
2018 | A | 425,452 | 425,452 | 0.0006 | 39,951,649 | 0 | 0.0006 | 0.0007 |
2018 | B | 36,094 | 36,094 | 0.0009 | 4,969,016 | 0.0002 | 0.0006 | 0.0012 |
2018 | C | 118,881 | 118,881 | 0.0005 | 13,130,047 | 0.0001 | 0.0004 | 0.0006 |
2019 | A | 437,484 | 437,484 | 0.0006 | 40,368,027 | 0 | 0.0005 | 0.0006 |
2019 | B | 36,043 | 36,043 | 0.0003 | 5,170,270 | 0.0001 | 0.0001 | 0.0004 |
2019 | C | 124,615 | 124,615 | 0.0004 | 13,633,750 | 0.0001 | 0.0002 | 0.0005 |
Okay, its time for you to provide a portion of your data, as SAS data step code (instructions). Do not provide data as a screen capture or in an attachment.
Also, please show the EXACT code that produced this PROC MEANS output.
Also, in this case I get the same answers from both PROC SQL and PROC MEANS.
proc means data=sashelp.class sumwgt;
var height/weight=weight;
class sex age;
run;
proc sql;
select sex,age,sum(weight) as sumweight from sashelp.class
group by sex,age;
quit;
I don't understand what you are trying to do.
If you want to reproduce the numbers you get with the SQL code then ask for the SUMWGT statistic.
https://blogs.sas.com/content/iml/2016/01/06/weighted-mean-in-sas.html
How many of your weight values are non-positive?
The SQL code does not correctly account for the weight statement.
To do that you need to multiply by the weights before you sum the data.
Or you need to remove the WEIGHT from the PROC MEANS option. So do you want a weighted sum or an unweighted sum?
Assuming you do need weights here's an example of how you can make them match regardless of calculation method:
proc means data=sashelp.class sum mean sumwgt nway;
class sex age;
var height / weight=weight;
run;
proc sql;
select sex, age,
sum(weight*height) as want1,
sum(weight) as sumweight,
calculated want1 / calculated sumweight as mean
from sashelp.class
group by sex,age;
quit;
FYI - if you're using weights and want to report standard deviation you may want to use SURVEYMEANS not MEANS.
@michokwu wrote:
Hello Team,
I have a weighted count from a sample, but it returns a different report when I run the proc means operation on the same sample. 'Sub' is a binary-valued variable (1,0) and has no missing values but values that are less than the weighted count. Could someone please clarify this to me? Thank you!
/*weighted count*/ proc sql; create table sample as SELECT Year,group, Sum(weight) AS freq FROM biz.workers(where=(age between 25 and 64 and year>=2014)) GROUP BY year,group; run; /*Statistics for Sub*/ proc means data=biz.workers(where=(age between 25 and 64 and year>=2014)) n mean stderr lclm uclm alpha=0.05 vardef=df Maxdec=4; class year group; var sub /weight=weight; run;
WEIGHTED COUNT OF POPULATION YEAR A B C Total 2015 798,535 62,753 76,191 937,479 2016 802,110 66,391 81,967 950,468 2017 824,222 60,694 90,189 975,105 2018 829,473 74,191 94,218 997,882 2019 838,824 77,843 99,673 1,016,340
Analysis Variable : Sub Census year Group N Obs N Mean Std Error Lower 95%
CL for MeanUpper 95%
CL for Mean2015 A 401,948 401,948 0.001 0.000 0.001 0.001 2015 B 35,036 35,036 0.000 0.000 0.000 0.001 2015 C 101,315 101,315 0.000 0.000 0.000 0.001 2016 A 409,568 409,568 0.001 0.000 0.001 0.001 2016 B 35,755 35,755 0.000 0.000 0.000 0.001 2016 C 106,454 106,454 0.000 0.000 0.000 0.001 2017 A 419,671 419,671 0.001 0.000 0.001 0.001 2017 B 35,641 35,641 0.000 0.000 0.000 0.001 2017 C 114,356 114,356 0.000 0.000 0.000 0.001 2018 A 425,452 425,452 0.001 0.000 0.001 0.001 2018 B 36,094 36,094 0.001 0.000 0.001 0.001 2018 C 118,881 118,881 0.000 0.000 0.000 0.001 2019 A 437,484 437,484 0.001 0.000 0.001 0.001 2019 B 36,043 36,043 0.000 0.000 0.000 0.000 2019 C 124,615 124,615 0.000 0.000 0.000 0.000
@Reeza wrote:
The SQL code does not correctly account for the weight statement.
To do that you need to multiply by the weights before you sum the data.
Or you need to remove the WEIGHT from the PROC MEANS option. So do you want a weighted sum or an unweighted sum?
Assuming you do need weights here's an example of how you can make them match regardless of calculation method:
proc means data=sashelp.class sum mean sumwgt nway; class sex age; var height / weight=weight; run; proc sql; select sex, age, sum(weight*height) as want1, sum(weight) as sumweight, calculated want1 / calculated sumweight as mean from sashelp.class group by sex,age; quit;
FYI - if you're using weights and want to report standard deviation you may want to use SURVEYMEANS not MEANS.
This SQL method of calculating a weighted mean does not work correctly in the case where the variable of interest (in this case HEIGHT) has missing values — which could be another reason why the OP finds PROC SQL and PROC MEANS does not match.
I don't have a problem with weighted standard deviations from PROC MEANS. I guess it depends, either MEAN or SURVEYMEANS works in the right situation, and should be avoided in other situations.
Thank you @Reeza and @PaigeMiller , both of your procedures were correct
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.