I know PROC UNIVARIATE will give me weighted percentiles, but how do I interpret the results? If the weighted median is 5.5, what does that say about my data? I guess I don't understand how the weights are affecting the statistics.
Here is an example:
data Have;
input x w;
datalines;
1 1
2 2
3 1
4 2
5 2
6 4
7 3
8 1
;
run;
proc univariate data=Have;
var x;
weight w;
ods select quantiles;
run;
When I run this code I get
Q3=6.5
Median=5.5
Q1=3.5
I don't understand these values. Why is the median 5.5? And how are the other quartiles computed?
For an interpretation of weighted percentiles, see the article "Weighted percentiles."
The basic idea is to sort the data in increasing order (your data are already sorted). Then add up the cumulative weights and take the percentiles of the total weight.
For your data, the weights sum to 16. The 50th percentile is therefore the data value for which half the weight is on one side and half is on the other. If you run down your data, you see that any number between 5 and 6 has half the weight (8 units) on each side. When a whole interval qualifies like this, PROC UNIVARIATE reports its midpoint, which is 5.5.
The other percentiles are similar. The 25th percentile is the data value for which 25% of the weight (=4 units) is below and 75% (=12 units) is above. For your data, any number between 3 and 4 has that property, so Q1 = 3.5. Likewise, any number between 6 and 7 has 12 units below and 4 units above, so Q3 = 6.5.
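If it helps to see the running totals, here is a minimal sketch that adds the cumulative weight to each row of the Have data set from the question. The names CumWt, totalW, cumW, and cumProp are only illustrative, and the sketch assumes the data are already sorted by x (run PROC SORT first otherwise).

```sas
/* Sketch only: CumWt, totalW, cumW, and cumProp are illustrative names */
proc sql noprint;
   select sum(w) into :totalW from Have;   /* total weight (16 for these data) */
quit;

data CumWt;
   set Have;                   /* assumes Have is sorted by x */
   cumW + w;                   /* sum statement: retained running total of weights */
   cumProp = cumW / &totalW;   /* cumulative proportion of the total weight */
run;

proc print data=CumWt noobs;
run;
```

For these data, the cumulative weight reaches exactly 4 at x=3, exactly 8 at x=5, and exactly 12 at x=6. Those are the rows where 25%, 50%, and 75% of the total weight has accumulated, which is why the reported quartiles fall halfway to the next data value.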
@Rick_SAS That is an amazing article! So clear!
Why won't UNIVARIATE create any graphs? I tried to make a histogram, but it complains that the graphs can't be created if I use a weight.
Weighted graphics are a complicated topic for which statisticians have not reached a consensus. However, if you want to visualize the weighted distribution, you can create a weighted empirical CDF, as shown in the article that I mentioned earlier. For your data, the following program creates the weighted ECDF:
data Have;
input x w;
datalines;
1 1
2 2
3 1
4 2
5 2
6 4
7 3
8 1
;
run;
title "Weighted Percentiles";
/* put sum of weights into macro variable */
proc sql noprint;
select sum(w) into :sumWt from Have;
quit;
%put &=sumWt; /* display value in SAS log */
data Want;
set Have;
wt = w / &sumWt; /* standardize Sum(wt)=1 */
run;
proc means data=Want p25 median p75;
var x;
weight wt;
run;
/* use IML to form weighted ECDF from data */
proc iml;
use Want; read all var {x wt}; close;
cumWt = cusum(wt);
cutPts = 0 // cumWt;
/* generate data for WECDF */
t = do(0, 0.999, 0.001);
idx = bin(t, cutPts);
q = x[idx];
create WECDF var {t q x}; append; close;
QUIT;
title "Weighted ECDF";
proc sgplot data=wecdf noautolegend;
xaxis grid label="x";
yaxis grid offsetmin=0.1 label="Cumulative Proportion";
step x=q y=t;
fringe x / lineattrs=(color=black);
refline 0 / axis=y;
run;
You say your weighted median is 3.2? Explain how you calculated this.
Here's how SAS gets these values: in effect, it uses observation 1 one time, observation 2 two times, observation 6 four times, and so on.
The result is the same as if the data set HAVE1 below had been provided instead of HAVE:
data have1;
set have;
do i=1 to w;
output;
end;
drop i;
run;
Then, PROC UNIVARIATE on HAVE1 without the weight statement gives the same median as PROC UNIVARIATE on HAVE with the weight statement.
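To check this yourself, here is a small sketch that analyzes the expanded data without a WEIGHT statement, and then gets the same effect from the original data by treating w as a frequency count with the FREQ statement. It assumes the weights are positive integers, as they are here. With fractional weights the DO loop cannot reproduce them, so the expansion trick does not carry over.

```sas
/* Sketch only: assumes HAVE and HAVE1 exist as created above
   and that every weight w is a positive integer */

/* unweighted analysis of the expanded data */
proc univariate data=have1;
   var x;
   ods select quantiles;
run;

/* same idea without expanding: each row counts as w observations */
proc univariate data=have;
   var x;
   freq w;
   ods select quantiles;
run;
```

Both steps should report the quartiles shown in the question (Q1=3.5, Median=5.5, Q3=6.5), since each one analyzes the same 16 values.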
I did explain. I ran PROC UNIVARIATE.
@WeiChen wrote:
I did explain. I ran PROC UNIVARIATE.
More information is needed. What PROC UNIVARIATE code gives a median of 3.2 for data set HAVE?
Oh, sorry. I used 3.2 as a hypothetical example. But then I decided to add example data and a PROC UNIVARIATE statement and forgot to update my sentence. I will do that now.