☑ This topic is solved.
AndersS
Lapis Lazuli | Level 10

QUESTION to SAS: Is my choice of QMARKERS=793 a good choice? Is it too big or too small?

Where can I read about these numbers "7, 25, 105"?

From the documentation:

QMARKERS=number specifies the default number of markers to use for the P^2 quantile estimation method. The number of markers controls the size of fixed memory space.

The default value depends on which quantiles you request. For the median (P50), number is 7. For the quantiles (P25 and P50), number is 25. For the quantiles P1, P5, P10, P75, P90, P95, or P99, number is 105. If you request several quantiles, then PROC MEANS uses the largest value of number. The value must be an odd integer greater than 3.

Increase the number of markers above the default settings to improve the accuracy of the estimate; reduce the number of markers to conserve memory and computing time.

Adding these up: 7 + 25 + 25 + 7*105 = 792 -> 793 (the next odd integer).
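For readers who want to check the arithmetic, here is a small Python sketch of the default rule as the quoted text describes it. The table is transcribed from the quote above, not taken from SAS itself, and the names `DEFAULT_MARKERS`, `default_qmarkers`, and `next_odd` are made up for illustration. P50 is mapped to 7 because the quote gives that value for the median.

```python
# Default marker counts transcribed from the quoted documentation text.
# This is an illustration only; it is not how PROC MEANS is implemented.
DEFAULT_MARKERS = {
    "P50": 7,
    "P25": 25,
    "P1": 105, "P5": 105, "P10": 105, "P75": 105,
    "P90": 105, "P95": 105, "P99": 105,
}

def default_qmarkers(requested):
    """Return the largest per-quantile default among the requested statistics."""
    return max(DEFAULT_MARKERS[name] for name in requested)

def next_odd(n):
    """Round n up to an odd integer, since the option requires an odd value."""
    return n if n % 2 == 1 else n + 1

usual = ["P1", "P5", "P10", "P25", "P50", "P75", "P90", "P95", "P99"]
print(default_qmarkers(usual))      # 105: the largest default wins

total = 7 + 25 + 25 + 7 * 105       # the sum from the question
print(total, next_odd(total))       # 792 793
```

So the documented rule takes the maximum of the defaults (105 here), while 793 comes from summing them.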

 

 

 

Anders Sköllermo (Skollermo in English)
Rick_SAS
SAS Super FREQ

An appropriate choice depends on the quantiles that you wish to estimate. Unless you are trying to compute extreme quantiles (for example, the 99.9999th percentile) or the distribution of your data has a very long tail, the default number of markers should suffice. 

 

Thus, I would say that QMARKERS=793 is often too large. 

 

In general, the number specifies the number of equiprobable bins that you want to use to estimate the quantiles. If you use Q as the value for the QMARKERS= option, you are essentially estimating the order statistics for

p=0, 1/Q, 2/Q, 3/Q, ..., 1

So the default value Q=105 should give adequate estimates for percentiles as extreme as the 1st and 99th percentiles (p = 0.01 and p = 0.99), because the bin width 1/105 is about 0.0095.
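To get a feel for the resolution that a given Q offers, the grid p = 0, 1/Q, ..., 1 can be tabulated directly. This is only a sketch of the description above, not of the SAS implementation; `marker_grid` and `nearest_grid_point` are hypothetical helper names.

```python
from fractions import Fraction

def marker_grid(q):
    """Probabilities 0, 1/Q, 2/Q, ..., 1 as exact fractions."""
    return [Fraction(i, q) for i in range(q + 1)]

def nearest_grid_point(p, q):
    """Grid probability closest to the requested probability p."""
    target = Fraction(str(p))
    return min(marker_grid(q), key=lambda g: abs(g - target))

for q in (7, 25, 105, 793):
    g = nearest_grid_point(0.01, q)
    # Show the bin width and how close the grid gets to the 1st percentile.
    print(q, float(Fraction(1, q)), g, float(abs(g - Fraction(1, 100))))
```

With Q=105 the closest grid point to p = 0.01 is 1/105; with Q=793 it is 8/793, which is closer, but the grid is about 7.5 times as large.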

AndersS
Lapis Lazuli | Level 10

Hi Rick! I want to calculate all the "usual quantiles" like q1, q5, q10, q25, q50, q75, q90, q95, q99. Is the value 105 then still "the most appropriate value"?
(I have some ideas about an approximate method for quantiles, so I want to test PROC MEANS and PROC UNIVARIATE in a serious way.)

/ Br Anders   "Both Age and IQ are 74+"

Rick_SAS
SAS Super FREQ

Yes, you can accept the default value of QMARKERS=105.

This value should give good estimates in most situations where the tails of the data distribution are not extremely long.

AndersS
Lapis Lazuli | Level 10

Have you seen any reference to the stated values?
Where do these values come from?

/Br Anders

Rick_SAS
SAS Super FREQ

I do not have a citation other than the SAS documentation.

 

The P^2 method is from the paper by Jain and Chlamtac (1985) https://dl.acm.org/doi/pdf/10.1145/4372.4378 

The paper uses 5 markers to compute the p-th quantile for any value of p. It also discusses using the method to construct histograms with B equiprobable bins by using B markers. The end of the paper states, "If many quantiles are desired, the calculation of complete histograms is more accurate and computationally more efficient." I suspect that this sentence motivated SAS R&D to look at the accuracy of the percentile estimates as the number of markers is varied.
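For anyone who wants to experiment, below is a minimal Python sketch of the five-marker version of the P^2 algorithm as I understand it from the paper. It is not the SAS implementation and does not model QMARKERS=; the class name `P2Quantile` and its methods are made up for this illustration, and `value()` assumes at least five observations have been added.

```python
import random

class P2Quantile:
    """Five-marker P^2 estimator for a single quantile p (0 < p < 1)."""

    def __init__(self, p):
        self.q = []                                          # marker heights
        self.n = [1, 2, 3, 4, 5]                             # actual marker positions
        self.want = [1, 1 + 2 * p, 1 + 4 * p, 3 + 2 * p, 5]  # desired positions
        self.step = [0, p / 2, p, (1 + p) / 2, 1]            # growth per observation

    def add(self, x):
        q, n = self.q, self.n
        if len(q) < 5:                     # the first five observations seed the markers
            q.append(x)
            q.sort()
            return
        if x < q[0]:                       # new minimum
            q[0], k = x, 0
        elif x >= q[4]:                    # new maximum
            q[4], k = x, 3
        else:                              # cell k with q[k] <= x < q[k+1]
            k = max(i for i in range(4) if q[i] <= x)
        for i in range(k + 1, 5):          # markers above the cell shift right
            n[i] += 1
        for i in range(5):
            self.want[i] += self.step[i]
        for i in (1, 2, 3):                # adjust the three interior markers
            d = self.want[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                s = 1 if d > 0 else -1
                # Piecewise-parabolic (P^2) prediction of the new height.
                new = q[i] + s / (n[i + 1] - n[i - 1]) * (
                    (n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
                    + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1])
                )
                if not q[i - 1] < new < q[i + 1]:
                    # Fall back to linear interpolation to keep heights ordered.
                    new = q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i])
                q[i] = new
                n[i] += s

    def value(self):
        """Current estimate: the height of the middle marker."""
        return self.q[2]

random.seed(1)
data = [random.random() for _ in range(20000)]
est = P2Quantile(0.5)
for x in data:
    est.add(x)
exact = sorted(data)[len(data) // 2]
print(est.value(), exact)   # streaming estimate vs. sorted-sample median
```

The point of the method is that only five numbers are stored no matter how many observations arrive. Using more markers, as QMARKERS= does, spends more memory in exchange for a finer picture of the distribution.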

 

I suspect SAS R&D did extensive testing and decided that 7 markers provided a more accurate and reliable estimate (as compared to 5 markers) for the 50th percentile (median) over a wide range of simulated data. For the 25th and 75th percentiles, the developers and testers probably concluded that 25 markers were required to provide accurate estimates. And so forth.

 

I was not involved in developing this, so my opinions are purely conjecture.

AndersS
Lapis Lazuli | Level 10
Many Thanks! I am very content with this answer. / Br Anders
