06-24-2017 06:34 PM - edited 06-24-2017 07:52 PM
I have attached sample data with 100 observations (the real data has more than 100 obs).
What I am trying to do is compute different percentiles (e.g., p1, p15, p25, p50, p90, p99) without sorting the data, so the percentiles would be based on the original order of the data.
When I use PROC UNIVARIATE or other procedures, the data is automatically sorted, so the percentiles computed by those procedures are based on sorted data (which is not what I want). Thank you for your ideas!
06-24-2017 07:22 PM
Even if PROC UNIVARIATE sorts the data internally, it wouldn't affect your dataset, since the sorted version is not stored.
PROC RANK is the other common option, but it probably sorts as well. I think you can probably avoid sorting the data, but that would load everything into memory. How big is your actual data?
06-24-2017 07:48 PM
You are right, the original data will not be sorted, but the output is based on the sorted data.
About the data size, it depends on the variable: somewhere between 1,000 and 50,000 obs.
06-24-2017 08:34 PM
I might be missing something, but I don't understand why the order of the values in your data source would affect the percentile calculation, and I can't find anything in the documentation that indicates otherwise.
Is this about the calculation as such, or do you want to merge the results back into your original dataset, adding a percentile column to each source value?
Maybe provide a sample of the desired result using the Have dataset you've already posted.
06-24-2017 10:51 PM
So I have a pair of variables, Var1 and Var2. The sorting is by Var1; hence, once Var1 is sorted, the order of Var2 is fixed. I would say that the "percentile" I mentioned here is no longer the traditional percentile that comes from sorted values.
06-24-2017 10:57 PM
Then you need to do it manually. Here's how you'd calculate the percentile of each obs. It's actually not clear what you want, especially if you want P5. What would that look like exactly?
data class;
  set sashelp.class nobs=sample_size;
  /* row position as a fraction of the total row count */
  percentile = _n_ / sample_size;
run;
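If "P5" is taken to mean the value in the row that sits 5% of the way through the data in its current order, one possible sketch follows. That reading is an assumption, since the desired output has not been shown. The dataset name position_pctl and the variables p, row, and position are made up for illustration, and sashelp.class stands in for the real data.

```sas
/* Sketch only: assumes "Pk" means the row located k% of the way
   through the data as currently ordered (no sorting involved). */
data position_pctl;
  do p = 1, 2, 3, 4, 5, 10, 50, 90, 95, 96, 97, 98, 99;
    /* nobs= is filled in at compile time, so sample_size is usable here */
    row = max(1, ceil(p / 100 * sample_size));
    /* direct access: read only the row at that position */
    set sashelp.class point=row nobs=sample_size;
    position = row;  /* keep a copy; the POINT= variable is not written out */
    output;
  end;
  stop;  /* required with POINT=, otherwise the step never ends */
run;
```

Each output row carries the requested p, the row position it maps to, and that row's values. With very small data several percentiles can map to the same row.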
06-24-2017 11:03 PM
Thank you for your input. The percentiles I was asked to compute include p1, p2, p3, p4, p5, p10, p50, p90, p95, p96, p97, p98, and p99. The task at hand requires me to do this, and I am not sure how to handle it in SAS with PROC UNIVARIATE or other procedures, so I asked in this community. Again, I am sorry; I didn't mean to waste anyone's time.
06-24-2017 11:07 PM
If the solution posted isn't enough to help you move forward, please post a fully worked example. We need to be able to see what you have and what you need. Based on what you're saying, the approaches mentioned or shown should work, i.e., calculate manually, or add a sort variable and calculate percentiles.
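For reference, a minimal sketch of getting the requested list of percentiles from PROC UNIVARIATE. The input dataset is only read, so its row order stays as it was. Here sashelp.class and height stand in for the real data and variable, and pctls is a hypothetical output dataset name.

```sas
/* Sketch only: sashelp.class / height are placeholders for the real data */
proc univariate data=sashelp.class noprint;
  var height;
  /* pctlpts= lists the percentiles; pctlpre= sets the name prefix,
     giving output variables p1, p2, ..., p99 */
  output out=pctls
         pctlpts=1 2 3 4 5 10 50 90 95 96 97 98 99
         pctlpre=p;
run;
```

The result is a one-row dataset, which can be combined with the original rows later if a percentile column is needed next to each source value.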
06-24-2017 10:52 PM
For 50K obs, why even care, to be honest? If the original order matters that much, add a sort variable and re-sort afterwards.
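A rough sketch of the sort-variable idea, assuming the goal is simply to get back to the original row order after any step that needs sorted data. The names have_keyed, sorted, sorted_pct, restored, orig_order, and pct_rank are made up for illustration, and sashelp.class with height stands in for the real data.

```sas
/* 1. Record the original row order in a key variable */
data have_keyed;
  set sashelp.class;
  orig_order = _n_;
run;

/* 2. Sort a copy by the analysis variable */
proc sort data=have_keyed out=sorted;
  by height;
run;

/* 3. Do whatever needs sorted data, e.g. position as a fraction of n */
data sorted_pct;
  set sorted nobs=n;
  pct_rank = _n_ / n;
run;

/* 4. Restore the original order using the key */
proc sort data=sorted_pct out=restored;
  by orig_order;
run;
```

At 50K rows both sorts are cheap, and restored has the same row order as the input plus the new column.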
You've probably spent more time thinking about this than it would take to sort.*
*Yes, I understand the desire to improve a process and make it more efficient for the additional knowledge. But if you're asking such a question please indicate so at the beginning so we can decide whether it's worth our time to answer it.
06-24-2017 10:58 PM - edited 06-24-2017 11:07 PM
Thank you for all of your time. I didn't mean to waste anyone's time. I am just handling work that asks me to find the percentiles of a variable that is tied to another, sorted variable, as I replied earlier. If my dumb question bothers anyone, I am sorry for asking.
06-24-2017 11:17 PM
@Jonate_H: No one has said that it's a dumb question. On the contrary, no one knows what you are trying to do! Others have asked you already, but you still haven't provided the output you'd like given your 100-record sample. With that, it will be a lot easier to answer your question.
Art, CEO, AnalystFinder.com
06-24-2017 11:55 PM
There's also the problem of editing an original post. If you add data or edit the post after I've initially read it, it doesn't occur to me to look at the original post again. FYI, none of this means that this isn't a valid question. It can be, but it needs to be clearly laid out.