compute percentiles without sorting

Frequent Contributor
Posts: 130

I have attached sample data with 100 observations (the real data has more than 100 obs).
What I am trying to do is compute different percentiles (e.g., p1, p15, p25, p50, p90, p99) without sorting the data, so that the percentiles are based on the original order of the data.

When I apply PROC UNIVARIATE or other procedures, the data is automatically sorted, so the percentiles computed by those procedures are based on sorted data (which is not what I want). Thank you for your ideas!

 


All Replies
Super User
Posts: 17,745

Re: compute percentiles without sorting

Even if PROC UNIVARIATE sorted the data, it wouldn't affect your data, since it's not stored.

 

PROC RANK is the other common option, but it probably sorts as well. I think you can probably avoid sorting the data, but it would load everything into memory. How big is your actual data?

Frequent Contributor
Posts: 130

Re: compute percentiles without sorting

@Reeza

You are right, the original data will not be sorted, but the output is based on the sorted data.

About the data size: it depends on the variable, somewhere between 1,000 and 50,000 obs.

 

Respected Advisor
Posts: 3,886

Re: compute percentiles without sorting

@Jonate_H

I might be missing something, but I don't understand why the order of the values in your data source would affect the percentile calculation, and I can't find anything in the documentation that indicates otherwise.

http://support.sas.com/documentation/cdl/en/procstat/66703/HTML/default/viewer.htm#procstat_univaria...

 

Is this about the calculation as such or is this about you wanting to back-merge the results to your original dataset adding a percentile column to each source value?

 

Maybe provide a sample of the desired result using the Have dataset you've already posted.

 

Frequent Contributor
Posts: 130

Re: compute percentiles without sorting

@Patrick

So I have a pair of variables, Var1 and Var2. The sorting is by Var1; hence, once Var1 is sorted, the order of Var2 is fixed. I would say that the "percentile" I mentioned here is no longer the traditional percentile that comes from sorted values.

Super User
Posts: 17,745

Re: compute percentiles without sorting

Then you need to do it manually. Here's how you'd calculate the percentile of each obs. It's actually not clear what you want, especially if you want P5: what would that look like exactly?

 

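data class;
    /* nobs= stores the total number of observations in sample_size */
    set sashelp.class nobs=sample_size;
    /* position of this row in the current order, as a fraction of the total */
    percentile = _n_ / sample_size;
run;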
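
If "P5" is read as the row that sits 5% of the way through the data in its current order (an assumption; nothing in the thread confirms this), the same idea could be sketched roughly as below. The names have, picks and pct are hypothetical, and the percentile list is the one from the original question.

```sas
/* Sketch only: keep the rows found at the 1%, 15%, ... positions of the CURRENT order. */
/* "have", "picks" and "pct" are hypothetical names, not taken from the posted data.    */
data picks;
    set have nobs=n;
    array p[6] _temporary_ (1 15 25 50 90 99);
    do i = 1 to dim(p);
        /* ceil() maps each percentile to a row number between 1 and n */
        if _n_ = ceil(p[i] * n / 100) then do;
            pct = p[i];
            output;
        end;
    end;
    drop i;
run;
```

With very small data sets several percentiles can map to the same row, in which case that row is written once per matching percentile.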
Frequent Contributor
Posts: 130

Re: compute percentiles without sorting

@Reeza

Thank you for your input. The percentiles I was asked to compute include p1, p2, p3, p4, p5, p10, p50, p90, p95, p96, p97, p98, and p99. The task at hand requires me to do so, and I am not sure how to handle it in SAS with PROC UNIVARIATE or other procedures, so I asked in this community. Again, I am sorry; I didn't mean to waste anyone's time.
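
For reference, the conventional version of these percentiles can be requested from PROC UNIVARIATE in one step. The input does not need to be sorted beforehand and is left unchanged. A minimal sketch, assuming a data set named have and an analysis variable named var2 (both hypothetical names):

```sas
/* Sketch only: "have", "var2" and "pctls" are hypothetical names.       */
/* The OUTPUT statement writes one row holding P1, P2, ..., P99 to pctls. */
proc univariate data=have noprint;
    var var2;
    output out=pctls
           pctlpts=1 2 3 4 5 10 50 90 95 96 97 98 99
           pctlpre=P;
run;
```

These are percentiles of the value distribution of var2, so they do not depend on row order. Whether that matches what the task requires is not settled in this thread.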

Super User
Posts: 17,745

Re: compute percentiles without sorting

If the solution posted isn't enough to help you move forward, please post a fully worked example. We need to be able to see what you have and what you need. Based on what you're saying, the approaches mentioned or shown should work, i.e., calculate manually, or add a sort variable and calculate percentiles.

 

 

Super User
Posts: 17,745

Re: compute percentiles without sorting

For 50K obs, why even care, to be honest? If the original order matters that much, add a sort variable and re-sort afterwards.
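
A rough sketch of the sort-variable approach, assuming a data set named have with a variable var1 to sort by (the data set names and obs_order are hypothetical):

```sas
/* Sketch only: all data set names and obs_order are hypothetical. */

/* 1. Record the original row order before anything is sorted. */
data have_ordered;
    set have;
    obs_order = _n_;
run;

/* 2. Sort by whatever the processing step needs. */
proc sort data=have_ordered out=sorted;
    by var1;
run;

/* ... steps that rely on the var1 order go here ... */

/* 3. Restore the original order afterwards. */
proc sort data=sorted out=restored;
    by obs_order;
run;
```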

You've probably spent more time thinking about this than it would take to sort.* 

 

 

*Yes, I understand the desire to improve a process and make it more efficient for the additional knowledge. But if you're asking such a question please indicate so at the beginning so we can decide whether it's worth our time to answer it. 

 

Frequent Contributor
Posts: 130

Re: compute percentiles without sorting


@Reeza

Thank you for all of your time. I didn't mean to waste anyone's time. I am just handling work that asks me to find the percentile of a variable that is tied to another, sorted variable, as I replied earlier. If my dumb question bothers anyone, I am sorry for asking.

PROC Star
Posts: 7,356

Re: compute percentiles without sorting

@Jonate_H: No one has said that it's a dumb question. On the contrary, no one knows what you are trying to do! Others have asked you already, but you still haven't provided the output you'd like given your 100-record sample. Given that, it will be a lot easier to answer your question.

 

Art, CEO, AnalystFinder.com

 

Super User
Posts: 17,745

Re: compute percentiles without sorting

There's also the problem of editing an original post. If you added the data or edited after I initially read it, it doesn't occur to me to look at the original post again. FYI, none of this means that this isn't a valid question. It can be, but it needs to be clearly laid out.

Solution
‎06-24-2017 11:41 PM
Frequent Contributor
Posts: 130

Re: compute percentiles without sorting

Thank you all.
