proc univariate and output , Please Help..Thank Yo...

03-10-2014 11:18 AM

Hi,

I have run PROC UNIVARIATE on my data and got the report below. I would like to get the volume for each quantile or percentile in that report. Your help would be much appreciated.

Many Thanks

Quantiles (Definition 5)

| Quantile | Estimate |
| --- | --- |
| 100% Max | 0.1039 |
| 99% | 0.0468 |
| 95% | 0.0382 |
| 90% | 0.0341 |
| 75% Q3 | 0.0287 |
| 50% Median | 0.0237 |
| 25% Q1 | 0.0201 |
| 10% | 0.0179 |
| 5% | 0.0169 |
| 1% | 0.015 |
| 0% Min | 0.0102 |

03-10-2014 11:19 AM

What's your question?

03-10-2014 11:42 AM

The volume in each percentile, for example 0%, 1%, 5%, 10%, etc.

03-10-2014 11:56 AM

Hi, I don't understand either. When Reeza asks "What's your question?" and you say

"The volume in each percentile, for example 0%, 1%, 5%, 10%, etc."

that is not a question.

Furthermore, you already have shown us the volume in each percentile, as computed by PROC UNIVARIATE, so you have the results. Problem solved, I think.

03-10-2014 12:08 PM

Sorry, maybe I didn't explain myself properly. I meant the actual volume (frequency): how many customers fall in each percentile? For example, when you rank your population into deciles, you get volumes like those below. How do I get the volume (the actual number of customers) using PROC UNIVARIATE? I hope that makes sense. Many thanks.

| decile | noresponse | response | Total Volume |
| --- | --- | --- | --- |
| 9 | 2,973 | 639 | 3,612 |
| 8 | 2,732 | 424 | 3,156 |
| 7 | 3,175 | 407 | 3,582 |
| 6 | 2,799 | 355 | 3,154 |
| 5 | 2,858 | 330 | 3,188 |
| 4 | 3,869 | 442 | 4,311 |
| 3 | 2,990 | 318 | 3,308 |
| 2 | 3,119 | 330 | 3,449 |
| 1 | 2,959 | 285 | 3,244 |
| 0 | 3,159 | 272 | 3,431 |

03-10-2014 12:11 PM

What variable are you using to create your ranks?

I'd assume you'd run proc rank on your data followed by a proc freq to get that type of output, but you still haven't provided enough information, primarily what your data looks like and how you're assigning ranks.

03-10-2014 12:32 PM

Basically, my data is just one probability per customer, and I want to check the distribution of those probabilities. PROC UNIVARIATE gives the report below, but I also want to see the volume (the actual number of customers who fall in each percentile or quantile).

My raw data

| customer_id | Probability |
| --- | --- |
| 1002 | 0.05 |
| 1003 | 0.6 |
| 1004 | 0.8 |
| 1005 | 0.02 |
| 1006 | 0.03 |
| 1007 | 0.5 |
| 1008 | 0.6 |

Desired Output

Quantiles (Definition 5)

| Quantile | Estimate | Volume |
| --- | --- | --- |
| 100% Max | 0.025 | ? |
| 99% | 0.014 | ? |
| 95% | 0.0099 | ? |
| 90% | 0.0074 | ? |
| 75% Q3 | 0.0037 | ? |
| 50% Median | 0.002 | ? |
| 25% Q1 | 0.0015 | ? |
| 10% | 0.0012 | ? |
| 5% | 0.0011 | ? |
| 1% | 0.0009 | ? |
| 0% Min | 0.0008 | ? |

03-10-2014 12:41 PM

Defined as how?

Check the definition of percentiles: Percentile - Wikipedia, the free encyclopedia

The 99th percentile is the data point below which 99% of the values fall. You seem to want to bin the numbers instead.

If you want to bin the numbers use proc rank and then proc freq as indicated.
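
The rank-then-count idea above can be illustrated outside SAS. The following is a minimal Python sketch, not the SAS procedures themselves. `decile_groups` is a hypothetical helper; the group formula floor(rank * k / (n + 1)), with tied values sharing their mean rank, is an assumption about how PROC RANK behaves with GROUPS=10; and the seven probabilities are the sample rows posted earlier in the thread.

```python
from collections import Counter
from math import floor


def decile_groups(values, groups=10):
    """Assign each value a group 0..groups-1 based on its rank.

    Tied values share the mean of their 1-based ranks (an assumption,
    chosen to resemble a rank-based grouping procedure).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return [floor(r * groups / (n + 1)) for r in ranks]


# Sample probabilities from the thread (customers 1002..1008).
probabilities = [0.05, 0.6, 0.8, 0.02, 0.03, 0.5, 0.6]
groups = decile_groups(probabilities)

# The "volume" per group is a plain frequency count of the group labels.
volume = Counter(groups)
for g in sorted(volume):
    print(g, volume[g])
```

With only seven rows most groups hold a single customer, and the two customers tied at 0.6 land in the same group. On a large file each group would hold roughly a tenth of the rows.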

03-10-2014 12:55 PM

Hi Reeza,

So, for the 99% row: let's say I have a million customers, which means that 990,000 of them have a probability less than 0.014, for example. And if I want to know the volume of customers in the 1% percentile (the data point below which 1% of the values fall, here 0.0009), I just run a count(*) on customers whose probability is less than 0.0009 and greater than 0.0008. Am I right? Thank you

Quantiles (Definition 5)

| Quantile | Estimate | Volume |
| --- | --- | --- |
| 100% Max | 0.025 | ? |
| 99% | 0.014 | ? |
| 95% | 0.0099 | ? |
| 90% | 0.0074 | ? |
| 75% Q3 | 0.0037 | ? |
| 50% Median | 0.002 | ? |
| 25% Q1 | 0.0015 | ? |
| 10% | 0.0012 | ? |
| 5% | 0.0011 | ? |
| 1% | 0.0009 | ? |
| 0% Min | 0.0008 | ? |

03-10-2014 12:58 PM

What would that represent, though? It just tells you how many people have a probability between 0.0008 and 0.0009.

What's the definition of that metric?

03-10-2014 01:53 PM

Isn't it just roughly .99 * your N?

Otherwise, output the percentiles from PROC UNIVARIATE and use them to bin your variable (?? proc score ??) or to create a format, then run the data through PROC FREQ again with the new binned or formatted value.
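
To make the "percentiles as bin edges" suggestion concrete, here is a small Python sketch rather than SAS code. It assumes the "Definition 5" in the report means the empirical distribution function with averaging: writing n*p = j + g, the percentile is the average of the j-th and (j+1)-th ordered values when g = 0, and the (j+1)-th ordered value otherwise. `percentile_def5` and the 200 evenly spaced values are made up for illustration.

```python
from bisect import bisect_left
from collections import Counter


def percentile_def5(sorted_values, pct):
    """Percentile (pct is a whole number 0..100) of an ascending list,
    using the empirical distribution function with averaging."""
    n = len(sorted_values)
    if pct <= 0:
        return sorted_values[0]
    if pct >= 100:
        return sorted_values[-1]
    # n * p = j + g; integer arithmetic avoids float error in testing g == 0.
    j, remainder = divmod(n * pct, 100)
    if remainder == 0:
        return (sorted_values[j - 1] + sorted_values[j]) / 2
    return sorted_values[j]  # the (j+1)-th ordered value, 1-based


# Hypothetical data: 200 distinct values 0.001, 0.002, ..., 0.200.
values = sorted(i / 1000 for i in range(1, 201))

pcts = [1, 5, 10, 25, 50, 75, 90, 95, 99]
cuts = [percentile_def5(values, p) for p in pcts]

# Bin k holds values above cut k-1 and at or below cut k;
# the last bin holds everything above the 99% cut.
bins = Counter(bisect_left(cuts, v) for v in values)
counts = [bins[k] for k in range(len(cuts) + 1)]
print(counts)  # [2, 8, 10, 30, 50, 50, 30, 10, 8, 2]
```

With distinct values, each count is just the gap between consecutive percentile levels times N / 100 (for example, 25% to 50% of 200 rows is 50), which is the ".99 * your N" reasoning applied bin by bin.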

03-10-2014 02:40 PM

It sounds like what you want is just the number of cases with values between the first and second percentiles, the second and third percentiles, and so on. By definition, this is always just 1% of the total number of (non-missing) cases in the file. It might be slightly above or below this if the data are not truly continuous or are weighted. In your decile example, the volume numbers are all around 10% of the sample size. Unless you are trying to look at the impact of ties in the data, it is not a particularly interesting statistic.
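
The effect of ties can be seen with a short Python sketch. Everything in it is illustrative: `slice_counts` is a hypothetical helper, the cut points use a simple "smallest value with at least k/10 of the data at or below it" rule rather than any particular SAS percentile definition, and both data sets are invented.

```python
from bisect import bisect_right


def slice_counts(values, slices=10):
    """Count how many values fall in each slice between consecutive cut points.

    Cut point k is the smallest data value with at least k/slices of the
    data at or below it; slice k holds values above cut k-1 and at or
    below cut k.
    """
    xs = sorted(values)
    n = len(xs)
    # (n * k + slices - 1) // slices is ceil(n * k / slices) in integers.
    cuts = [xs[(n * k + slices - 1) // slices - 1] for k in range(1, slices + 1)]
    at_or_below = [bisect_right(xs, c) for c in cuts]
    return [b - a for a, b in zip([0] + at_or_below, at_or_below)]


# 100 distinct values: every decile slice holds exactly 10% of the cases.
distinct = list(range(1, 101))
print(slice_counts(distinct))  # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]

# 100 values where 35 are tied at the minimum: the first three cut points
# coincide, so the first slice absorbs the ties and the next two are empty.
tied = [1] * 35 + list(range(2, 67))
print(slice_counts(tied))  # [35, 0, 0, 5, 10, 10, 10, 10, 10, 10]
```

The uneven counts in the second case come entirely from the tied values, which is the one situation where the per-slice volume tells you something beyond N / 10.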