Hello All, I am wondering whether it is possible to calculate weighted percentiles using proc means.
When I use this syntax, it seems only the mean is weighted.
proc means data= sodiumdata2009 min max mean median p10 q1 q3 noprint;
by Level_1;
var sodp100g;
weight RKVol;
run;
Any suggestions?
Thank you!
The percentiles are affected as far as I can tell.
Perhaps your data isn't affected by the weights for some reason?
If the weights are small?
data check;
set sashelp.class;
x=floor(rand('uniform')*8)+1;
run;
proc means data=check n p5 p10 p90 p95;
title 'no weights';
var weight;
run;
proc means data=check n p5 p10 p90 p95;
title 'weights';
var weight;
weight x;
run;
Why not use proc univariate? But I'm not quite sure what you are looking for because you included a noprint option. Are you trying to output this to another dataset, or look at it in the Results viewer, or something else entirely?
In my code I have an output statement that I deleted when I posted the code. Sorry for the confusion. Of course I need to see the results. 🙂
I compared the weighted and unweighted stats and saw that only the mean is affected. The other stats, from min to max, did not change, so I thought maybe I am missing some option that would let me obtain weighted min through max stats.
Hi Reeza,
When I ran your code I saw a difference in only one value, so I went back to my data and did more checking. As in your example, I now see a difference in only a few values, so I guess the weighting is working. I had expected all of the weighted values to differ from the unweighted ones. Problem solved! Thank you all for your contributions!
By any chance is your RKVol variable a COUNT type variable? If so then you may want to use FREQ instead of WEIGHT.
That will in effect replicate each observation of your analysis variable RKVol times.
It is not an integer.
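To make the FREQ versus WEIGHT distinction above concrete, here is a minimal sketch on made-up data. The `toy` dataset and its `value` and `count` variables are hypothetical and are not from the original poster's data. As far as I know, FREQ truncates non-integer values, so it would not be a good fit for a non-integer RKVol.

```sas
/* Hypothetical toy data: three distinct values with counts 1, 3 and 1 */
data toy;
  input value count;
  datalines;
10 1
20 3
30 1
;
run;

/* FREQ treats each row as COUNT identical observations */
proc means data=toy n mean median min max;
  title 'freq';
  var value;
  freq count;
run;

/* WEIGHT keeps one observation per row and applies COUNT as a weight */
proc means data=toy n mean median min max;
  title 'weight';
  var value;
  weight count;
run;
```

With FREQ the N statistic should report 5 (the sum of the counts); with WEIGHT it should stay at 3, since no rows are replicated.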
So how about:
proc univariate data=sodiumdata2009 noprint;
by Level_1;
var sodp100g;
weight RKVol;
output out=want min=min max=max mean=mean p10=p10 q1=q1 q3=q3;
run;
I have similar-looking code:
proc means data=data2009.sodiumdata2009 min max mean median p10 q1 q3 noprint;
by Level_1;
var sodp100g;
weight RKVol;
output out=Weighted_2009data mean=mean_per100gr_sodium_mg2009 min=Min_per100gr_sodium_mg2009 median=median_per100gr_sodium_mg2009 p10=_10th_2009 q1=_25th_2009 q3=_75th_2009 max=Max_per100gr_sodium_mg2009;
run;
So it doesn't look like the weight has an effect on the min and max, but it does on every other value.
proc univariate data=sashelp.bweight noprint;
var Weight;
weight MomWtGain;
output out=goofy1 min=min max=max mean=mean p10=p10 q1=q1 q3=q3;
run;
proc univariate data=sashelp.bweight noprint;
var Weight;
output out=goofy2 min=min max=max mean=mean p10=p10 q1=q1 q3=q3;
run;
proc means data=sashelp.bweight noprint;
var Weight;
weight MomWtGain;
output out=goofy3 min=min max=max mean=mean p10=p10 q1=q1 q3=q3;
run;
proc means data=sashelp.bweight noprint;
var Weight;
output out=goofy4 min=min max=max mean=mean p10=p10 q1=q1 q3=q3;
run;
proc sql;
create table goofy as
select distinct min, max, mean, p10, q1, q3, "Univariate no weight" as type from goofy2
union corr
select distinct min, max, mean, p10, q1, q3, "Univariate weighted" as type from goofy1
union corr
select distinct min, max, mean, p10, q1, q3, "Means no weight" as type from goofy4
union corr
select distinct min, max, mean, p10, q1, q3, "Means weighted" as type from goofy3;
quit;
However, that's because a weight statement only makes an observation count for more in the statistic; it doesn't change the actual value of the variable. That is why the min and max never change, but the computed statistics do. Oh, and there is no difference between the proc means and proc univariate outputs, which is what the above shows.
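A small sketch of the point above, using made-up data where one observation carries most of the weight. The dataset and variable names (`toy_w`, `x`, `w`, `stats_unw`, `stats_w`, `stats_both`) are hypothetical. The comment about where the weighted median lands is my reading of the weighted percentile definition in the SAS documentation, so treat it as an expectation rather than a guarantee.

```sas
/* Hypothetical toy data: the largest value has weight 10, the rest weight 1 */
data toy_w;
  input x w;
  datalines;
1 1
2 1
3 1
4 1
5 10
;
run;

/* Unweighted: the median is the middle value, 3 */
proc means data=toy_w noprint;
  var x;
  output out=stats_unw min=min max=max mean=mean median=median;
run;

/* Weighted: total weight is 14, so the median falls where the
   cumulative weight passes 7, which only happens at x = 5 */
proc means data=toy_w noprint;
  var x;
  weight w;
  output out=stats_w min=min max=max mean=mean median=median;
run;

/* Stack the two result rows for a side-by-side look */
data stats_both;
  length type $12;
  set stats_unw(in=a) stats_w;
  if a then type = 'unweighted';
  else type = 'weighted';
  drop _type_ _freq_;
run;

proc print data=stats_both noobs;
run;
```

Min and max should be 1 and 5 in both rows, because weighting never changes the observed values. The mean moves from 3 to 60/14 (about 4.29), and I would expect the median to move from 3 to 5.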
thank you!
Percentiles generally will not be affected by a weight value, though they would be by a freq, since they are ORDER statistics. The smallest stays the smallest, the largest stays the largest, and the sort order of the variable does not change.
I wanted to look a bit further into this, and this is what I found: https://communities.sas.com/t5/SAS-Procedures/Weighted-v-Unweighted-data-in-proc-means-and-proc-univ... So maybe don't use the standard deviation, but none of the other statistics that you have should really be affected too much.
Oh and apparently include QMETHOD=OS in the proc statement.
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000146736.htm
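For reference, a minimal sketch of where QMETHOD=OS goes. As I understand the documentation, OS (order statistics) is already the default, and the alternative QMETHOD=P2 does not compute weighted quantiles, so spelling it out mainly documents intent. Using Age as the weight on sashelp.class is an assumption made purely for illustration.

```sas
/* Assumption: Age serves as an illustrative weight only */
proc means data=sashelp.class qmethod=os n p10 q1 median q3;
  title 'weighted quantiles with qmethod=os';
  var Height;
  weight Age;
run;
```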