BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michokwu
Quartz | Level 8

Hello Team,

I have a weighted count from a sample, but it returns a different report when I run the proc means operation on the same sample. 'Sub' is a binary-valued variable (1,0) and has no missing values but values that are less than the weighted count. Could someone please clarify this to me? Thank you!

/*weighted count*/
proc sql;
create table sample as
SELECT Year,group, Sum(weight) AS freq
FROM biz.workers(where=(age between 25 and 64 and year>=2014))
GROUP BY year,group;
run;

/*Statistics for Sub*/
proc means data=biz.workers(where=(age between 25 and 64 and year>=2014)) n mean stderr lclm uclm alpha=0.05 vardef=df Maxdec=4;
class year group;
var sub /weight=weight;
run;
WEIGHTED COUNT OF POPULATION
YEARABCTotal
2015         798,535             62,753                 76,191                937,479
2016         802,110             66,391                 81,967                950,468
2017         824,222             60,694                 90,189                975,105
2018         829,473             74,191                 94,218                997,882
2019         838,824             77,843                 99,673             1,016,340

 

Analysis Variable : Sub
Census yearGroupN ObsNMeanStd ErrorLower 95%
CL for Mean
Upper 95%
CL for Mean
2015         401,948             401,9480.0010.0000.0010.001
2015           35,036               35,0360.0000.0000.0000.001
2015C         101,315             101,3150.0000.0000.0000.001
2016A         409,568             409,5680.0010.0000.0010.001
2016B           35,755               35,7550.0000.0000.0000.001
2016C         106,454             106,4540.0000.0000.0000.001
2017A         419,671             419,6710.0010.0000.0010.001
2017B           35,641               35,6410.0000.0000.0000.001
2017C         114,356             114,3560.0000.0000.0000.001
2018A         425,452             425,4520.0010.0000.0010.001
2018B           36,094               36,0940.0010.0000.0010.001
2018C         118,881             118,8810.0000.0000.0000.001
2019A         437,484             437,4840.0010.0000.0010.001
2019B           36,043               36,0430.0000.0000.0000.000
2019C         124,615             124,6150.0000.0000.0000.000
1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Also, in this case I get the same answers from both PROC SQL and PROC MEANS.

 

proc means data=sashelp.class sumwgt;
	var height/weight=weight;
	class sex age;
run;

proc sql;
	select sex,age,sum(weight) as sumweight from sashelp.class
		group by sex,age;
quit;
--
Paige Miller

View solution in original post

9 REPLIES 9
PGStats
Opal | Level 21

Weighted count?

 

Do not confuse weights (provided with the WEIGHT statement) and frequencies (the FREQ statement). They do not have the same effect on the computation of statistics. If w is the weigth associated with your observations then the (weighted) sum of variable x is SUM(w*x)/SUM(w). If f is the frequency of your observations then the sum of x is SUM(f*x).

PG
PaigeMiller
Diamond | Level 26

I'm  pretty sure you need to request the SUMWGT output statistic from PROC MEANS to match your  PROC SQL results.

--
Paige Miller
michokwu
Quartz | Level 8

The results are still different

 

Analysis Variable : Sub
Census yearGroupN ObsNMeanSum WgtsStd ErrorLower 95%Upper 95%
2014 396,550 396,5500.0007   37,993,14500.00060.0007
2014   34,367   34,3670.0007     4,270,9960.00010.00040.001
2014   96,261   96,2610.0004   10,540,7370.00010.00020.0005
2015 401,948 401,9480.0006   38,463,22500.00050.0007
2015   35,036   35,0360.0005     4,420,2170.00010.00030.0007
2015 101,315 101,3150.0005   11,192,9420.00010.00030.0006
2016 409,568 409,5680.0007   39,102,72000.00060.0008
2016   35,755   35,7550.0004     4,580,6500.00010.00020.0006
2016 106,454 106,4540.0004   11,651,1410.00010.00030.0005
2017 419,671 419,6710.0006   39,609,12300.00050.0007
2017   35,641   35,6410.0004     4,789,3020.00010.00020.0006
2017 114,356 114,3560.0004   12,431,2530.00010.00030.0005
2018 425,452 425,4520.0006   39,951,64900.00060.0007
2018   36,094   36,0940.0009     4,969,0160.00020.00060.0012
2018 118,881 118,8810.0005   13,130,0470.00010.00040.0006
2019 437,484 437,4840.0006   40,368,02700.00050.0006
2019   36,043   36,0430.0003     5,170,2700.00010.00010.0004
2019 124,615 124,6150.0004   13,633,7500.00010.00020.0005
PaigeMiller
Diamond | Level 26

Okay, its time for you to provide a portion of your data, as SAS data step code (instructions). Do not provide data as a screen capture or in an attachment.

 

Also, please show the EXACT code that produced this PROC MEANS output.

--
Paige Miller
PaigeMiller
Diamond | Level 26

Also, in this case I get the same answers from both PROC SQL and PROC MEANS.

 

proc means data=sashelp.class sumwgt;
	var height/weight=weight;
	class sex age;
run;

proc sql;
	select sex,age,sum(weight) as sumweight from sashelp.class
		group by sex,age;
quit;
--
Paige Miller
Tom
Super User Tom
Super User

I don't understand what you are trying to do.

If you want to reproduce the numbers you get with the SQL code then ask for the SUMWGT statistic.

https://blogs.sas.com/content/iml/2016/01/06/weighted-mean-in-sas.html

 

How many of your weight values are non-positive?

 

Reeza
Super User

The SQL code does not correctly account for the weight statement. 

To do that you need to multiply by the weights before you sum the data. 

 

Or you need to remove the WEIGHT from the PROC MEANS option. So do you want a weighted sum or an unweighted sum?

 

Assuming you do need weights here's an example of how you can make them match regardless of calculation method:

proc means data=sashelp.class sum mean sumwgt nway;
   class sex age;
	var height / weight=weight;

run;

proc sql;
	select sex, age,
	sum(weight*height) as want1,
	sum(weight) as sumweight,
	calculated want1 / calculated sumweight as mean
	from sashelp.class
		group by sex,age;
quit;

FYI - if you're using weights and want to report standard deviation you may want to use SURVEYMEANS not MEANS. 

 

Spoiler

@michokwu wrote:

Hello Team,

I have a weighted count from a sample, but it returns a different report when I run the proc means operation on the same sample. 'Sub' is a binary-valued variable (1,0) and has no missing values but values that are less than the weighted count. Could someone please clarify this to me? Thank you!

/*weighted count*/
proc sql;
create table sample as
SELECT Year,group, Sum(weight) AS freq
FROM biz.workers(where=(age between 25 and 64 and year>=2014))
GROUP BY year,group;
run;

/*Statistics for Sub*/
proc means data=biz.workers(where=(age between 25 and 64 and year>=2014)) n mean stderr lclm uclm alpha=0.05 vardef=df Maxdec=4;
class year group;
var sub /weight=weight;
run;
WEIGHTED COUNT OF POPULATION
YEAR A B C Total
2015          798,535              62,753                  76,191                 937,479
2016          802,110              66,391                  81,967                 950,468
2017          824,222              60,694                  90,189                 975,105
2018          829,473              74,191                  94,218                 997,882
2019          838,824              77,843                  99,673              1,016,340

 

Analysis Variable : Sub
Census year Group N Obs N Mean Std Error Lower 95%
CL for Mean
Upper 95%
CL for Mean
2015          401,948              401,948 0.001 0.000 0.001 0.001
2015            35,036                35,036 0.000 0.000 0.000 0.001
2015 C          101,315              101,315 0.000 0.000 0.000 0.001
2016 A          409,568              409,568 0.001 0.000 0.001 0.001
2016 B            35,755                35,755 0.000 0.000 0.000 0.001
2016 C          106,454              106,454 0.000 0.000 0.000 0.001
2017 A          419,671              419,671 0.001 0.000 0.001 0.001
2017 B            35,641                35,641 0.000 0.000 0.000 0.001
2017 C          114,356              114,356 0.000 0.000 0.000 0.001
2018 A          425,452              425,452 0.001 0.000 0.001 0.001
2018 B            36,094                36,094 0.001 0.000 0.001 0.001
2018 C          118,881              118,881 0.000 0.000 0.000 0.001
2019 A          437,484              437,484 0.001 0.000 0.001 0.001
2019 B            36,043                36,043 0.000 0.000 0.000 0.000
2019 C          124,615              124,615 0.000 0.000 0.000 0.000

PaigeMiller
Diamond | Level 26

@Reeza wrote:

The SQL code does not correctly account for the weight statement. 

To do that you need to multiply by the weights before you sum the data. 

 

Or you need to remove the WEIGHT from the PROC MEANS option. So do you want a weighted sum or an unweighted sum?

 

Assuming you do need weights here's an example of how you can make them match regardless of calculation method:

proc means data=sashelp.class sum mean sumwgt nway;
   class sex age;
	var height / weight=weight;

run;

proc sql;
	select sex, age,
	sum(weight*height) as want1,
	sum(weight) as sumweight,
	calculated want1 / calculated sumweight as mean
	from sashelp.class
		group by sex,age;
quit;

FYI - if you're using weights and want to report standard deviation you may want to use SURVEYMEANS not MEANS.


This SQL method of calculating a weighted mean does not work correctly in the case where the variable of interest (in this case HEIGHT) has missing values — which could be another reason why the OP finds PROC SQL and PROC MEANS does not match.

 

I don't have a problem with weighted standard deviations from PROC MEANS. I guess it depends, either MEAN or SURVEYMEANS works in the right situation, and should be avoided in other situations.

--
Paige Miller
michokwu
Quartz | Level 8

Thank you @Reeza and @PaigeMiller , both of your procedures were correct

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1659 views
  • 12 likes
  • 5 in conversation