WeiChen
Obsidian | Level 7

I know PROC UNIVARIATE will give me weighted percentiles, but how do I interpret the results? If the weighted median is 5.5, what does that say about my data? I guess I don't understand how the weights are affecting the statistics.

 

Here is an example:

data Have;
input x w;
datalines;
1 1
2 2
3 1
4 2
5 2
6 4
7 3
8 1
;
run;

proc univariate data=Have;
var x;
weight w;
ods select quantiles;
run;

 

When I run this code I get 

Q3=6.5
Median=5.5
Q1=3.5

 

I don't understand these values. Why is the median 5.5? And how are the other quartiles computed?

ACCEPTED SOLUTION

Rick_SAS
SAS Super FREQ

For an interpretation of weighted percentiles, see the article "Weighted percentiles."

 

The basic idea is to sort the data in increasing order (your data are already sorted). Then add up the cumulative weights and take the percentiles of the total weight.

 

For your data, the weights sum to 16.  The 50th percentile is therefore the data value for which half the weight is on one side and half is on the other. If you run down your data, you see that any number between 5 and 6 has half the weight (8 units) on both sides.

 

The other percentiles are similar. The 25th percentile is the data value for which 25% of the weight (=4 units) is below and 75% (=12 units) is above. For your data, any number between 3 and 4 has that property.
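
The running totals described above can be checked with a short DATA step. This is only a sketch: the data set names Sorted and CumWt, the variables cumW and cumProp, and the macro variable total are made up for illustration. It shows where the cumulative weight reaches 25%, 50%, and 75% of the total; judging from the output in the question, PROC UNIVARIATE appears to report the midpoint of the two neighboring data values when the cumulative weight lands exactly on a percentile.

```sas
/* Sort by x (the example data are already sorted, so this is a safeguard) */
proc sort data=Have out=Sorted;
   by x;
run;

/* Total weight; for the example data this is 16 */
proc sql noprint;
   select sum(w) into :total from Have;
quit;

data CumWt;
   set Sorted;
   cumW + w;                  /* running sum of the weights */
   cumProp = cumW / &total;   /* proportion of total weight at or below x */
run;

proc print data=CumWt noobs;
run;

/* For the example data:
   x       1      2      3     4     5    6     7      8
   cumW    1      3      4     6     8    12    15     16
   cumProp 0.0625 0.1875 0.25  0.375 0.5  0.75  0.9375 1
   cumProp is exactly 0.25 at x=3, 0.50 at x=5, and 0.75 at x=6,
   which is consistent with Q1=3.5, Median=5.5, and Q3=6.5. */
```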

 

WeiChen
Obsidian | Level 7

@Rick_SAS That is an amazing article! So clear!

 

Why won't UNIVARIATE create any graphs? I tried to make a histogram, but it complains that the graphs can't be created if I use a weight.

Rick_SAS
SAS Super FREQ

Weighted graphics are a complicated topic for which statisticians have not reached a consensus. However, if you want to visualize the weighted distribution, you can create a weighted empirical CDF, as shown in the article that I mentioned earlier. For your data, the following program computes and plots the weighted ECDF:

 

data Have;
input x w;
datalines;
1 1
2 2
3 1
4 2
5 2
6 4
7 3
8 1
;
run;

title "Weighted Percentiles";

/* put sum of weights into macro variable */
proc sql noprint;                              
 select sum(w) into :sumWt from Have;
quit;
%put &=sumWt;   /* display value in SAS log */

data Want;
set Have;
wt = w / &sumWt;   /* standardize Sum(wt)=1 */
run;

proc means data=Want p25 median p75;
var x;
weight wt;
run;

/* use IML to form weighted ECDF from data */
proc iml;
use Want; read all var {x wt}; close;
cumWt = cusum(wt);
cutPts = 0 // cumWt; 

/* generate data for WECDF */
t = do(0, 0.999, 0.001);
idx = bin(t, cutPts);
q = x[idx];

create WECDF var {t q x}; append; close;
QUIT;

title "Weighted ECDF";
proc sgplot data=wecdf noautolegend;
xaxis grid label="x";
yaxis grid offsetmin=0.1 label="Cumulative Proportion";
step x=q y=t;
fringe x / lineattrs=(color=black);
refline 0 / axis=y;
run;
PaigeMiller
Diamond | Level 26

You say your weighted median is 3.2? Explain how you calculated this.

 

Here's how SAS gets these values ... it uses observation 1 one time and observation 2 two times and observation 6 four times, etc.

 

It is as if the data set HAVE1 (created below) had been provided instead of HAVE:

 

data have1;
    set have;
    do i=1 to w;
        output;
    end;
    drop i;
run;

Then, PROC UNIVARIATE on HAVE1 without the weight statement gives the same median as PROC UNIVARIATE on HAVE with the weight statement.
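
As a quick check of the comparison above, the two runs below should report the same quartiles as the weighted analysis for this example. This is only a sketch and assumes integer weights: the DO loop and the FREQ statement both need whole-number counts, and the FREQ statement is shown only as a shortcut for the expansion, not as a replacement for WEIGHT in general. Statistics other than the percentiles can differ between WEIGHT and FREQ, because FREQ changes the number of observations and WEIGHT does not.

```sas
/* Expanded data, no WEIGHT statement */
proc univariate data=have1;
   var x;
   ods select quantiles;
run;

/* Original data; FREQ w treats each row as w identical observations */
proc univariate data=have;
   var x;
   freq w;
   ods select quantiles;
run;
```

The expanded data have 16 observations (1, 2, 2, 3, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8), so the median is the average of the 8th and 9th values, (5 + 6)/2 = 5.5.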

--
Paige Miller
WeiChen
Obsidian | Level 7

I did explain. I ran PROC UNIVARIATE. 

PaigeMiller
Diamond | Level 26

@WeiChen wrote:

I did explain. I ran PROC UNIVARIATE. 


More information is needed. What PROC UNIVARIATE code gives a median of 3.2 for data set HAVE?

--
Paige Miller
WeiChen
Obsidian | Level 7

Oh, sorry. I used 3.2 as a hypothetical example. But then I decided to add example data and a PROC UNIVARIATE step and forgot to update my sentence. I will do that now.
