Jonate_H
Quartz | Level 8

I have attached sample data with 100 observations (the real data has more than 100 obs).
What I am trying to do is compute different percentiles (e.g., p1, p15, p25, p50, p90, p99) without sorting the data, i.e., I want the percentiles computed from the original order of the data.

When I use PROC UNIVARIATE or other procedures, the data is automatically sorted, so the percentiles those procedures compute are based on sorted data (which is not what I want). Thanks for any ideas!

Reeza
Super User

Even if PROC UNIVARIATE sorted the data, it wouldn't affect your data set, since the sorted copy isn't stored.

PROC RANK is the other common option, but it probably sorts internally as well. You could probably avoid sorting, but then all the data would have to be loaded into memory. How big is your actual data?
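
As a minimal sketch of the PROC RANK route (using SASHELP.CLASS as a stand-in, since the attached data isn't reproduced here), GROUPS=100 assigns each value a percentile group from 0 to 99. As far as I know, the OUT= data set keeps the observations in the same order as the input, so each row stays in its original position even if the procedure sorts internally. CLASS_RANKED and WEIGHT_PCTL are just illustrative names.

```sas
/* Sketch: SASHELP.CLASS stands in for the poster's data (assumption). */
/* GROUPS=100 bins the ranks of WEIGHT into percentile groups 0-99.    */
proc rank data=sashelp.class out=class_ranked groups=100;
    var weight;          /* variable to rank                           */
    ranks weight_pctl;   /* hypothetical name for the new rank column  */
run;
```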

Jonate_H
Quartz | Level 8

@Reeza

You are right, the original data will not be sorted, but the output is based on the sorted data.

As for the data size, it depends on the variable: somewhere between 1,000 and 50,000 obs.

Patrick
Opal | Level 21

@Jonate_H

I might be missing something, but I don't understand why the order of the values in your data source would affect the percentile calculation, and I can't find anything in the documentation that indicates otherwise.

http://support.sas.com/documentation/cdl/en/procstat/66703/HTML/default/viewer.htm#procstat_univaria...

Is this about the calculation as such, or do you want to back-merge the results to your original dataset, adding a percentile column to each source row?

Maybe provide a sample of the desired result using the HAVE dataset you've already posted.
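
If it's the back-merge case, a sketch could look like the following, again with SASHELP.CLASS and WEIGHT standing in for the real data. The OUTPUT statement of PROC UNIVARIATE writes the requested percentiles (PCTLPTS=) to a one-row data set, with variable names built from the PCTLPRE= prefix; the `if _n_ = 1 then set` idiom then attaches that row to every observation without reordering the source data. The names PCTL, CLASS_WITH_PCTL and the prefix P_ are arbitrary choices.

```sas
/* One-row data set holding the requested percentiles of WEIGHT. */
/* PCTLPRE=p_ yields variables p_1, p_5, p_10, p_50, p_90, ...   */
proc univariate data=sashelp.class noprint;
    var weight;
    output out=pctl pctlpts=1 5 10 50 90 95 99 pctlpre=p_;
run;

/* Read PCTL once; its values are retained on every row of CLASS, */
/* which is read in its original order.                           */
data class_with_pctl;
    if _n_ = 1 then set pctl;
    set sashelp.class;
run;
```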

Jonate_H
Quartz | Level 8

@Patrick

I have a pair of variables, Var1 and Var2, and the data is sorted by Var1; once Var1 is sorted, the order of Var2 is fixed. So the "percentile" I mentioned here is not the traditional percentile computed from sorted values.

Reeza
Super User

Then you need to do it manually. Here's how you'd calculate the percentile of each obs. It's actually not clear what you want, though: if you want P5, what would that look like exactly?

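/* Position-based percentile: each obs gets _n_ / N in the data's current order */
data class;
    set sashelp.class nobs=sample_size;  /* NOBS= holds the total number of rows */
    percentile = _n_ / sample_size;      /* share of rows up to and including this one */
run;
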
Jonate_H
Quartz | Level 8

@Reeza

Thank you for your input. The percentiles I was asked to compute include p1, p2, p3, p4, p5, p10, p50, p90, p95, p96, p97, p98, and p99. The task at hand requires this, and I'm not sure how to handle it in SAS with PROC UNIVARIATE or other procedures, so I asked in this community. Again, I am sorry; I didn't mean to waste anyone's time.
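
One possible reading of these "percentiles", sketched under assumptions: keep the data in Var1 order and, for each requested p, take Var2 from the first observation whose position fraction _n_/N (as in Reeza's step above) reaches p/100, i.e. observation ceil(p*N/100). Here SASHELP.CLASS sorted by HEIGHT stands in for Var1 and WEIGHT for Var2; HAVE and POSITIONAL_PCTL are hypothetical names. This is not what PROC UNIVARIATE computes; it only illustrates the position-based idea.

```sas
/* Stand-in data: HEIGHT plays Var1 (sort key), WEIGHT plays Var2. */
proc sort data=sashelp.class out=have;
    by height;
run;

/* For each requested p, read the observation at position ceil(p*N/100) */
/* in HEIGHT order; WEIGHT itself is never re-sorted.                   */
data positional_pctl (keep=pct height weight);
    array pcts[13] _temporary_ (1 2 3 4 5 10 50 90 95 96 97 98 99);
    do i = 1 to dim(pcts);
        pos = max(1, ceil(pcts[i] * n / 100));  /* row number to fetch   */
        set have point=pos nobs=n;              /* direct access by row  */
        pct = pcts[i];
        output;
    end;
    stop;  /* required with POINT=, otherwise the step never ends */
run;
```

With N = 19, p50 picks row ceil(9.5) = 10 and p99 picks row ceil(18.81) = 19.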

Reeza
Super User

If the solution posted isn't enough to help you move forward, please post a fully worked example. We need to see what you have and what you need. Based on what you're saying, the approaches mentioned or shown should work: calculate manually, or add a sort variable and calculate percentiles.

Reeza
Super User

To be honest, for 50K obs, why even care? If the original order matters that much, add a sort variable and re-sort afterwards.

You've probably spent more time thinking about this than it would take to sort.* 

*Yes, I understand the desire to improve a process and make it more efficient for the sake of the additional knowledge. But if that's why you're asking, please say so at the beginning so we can decide whether it's worth our time to answer.
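
A minimal sketch of the "add a sort variable and re-sort afterwards" approach, with SASHELP.CLASS as stand-in data and ORIG_ORDER, BY_WEIGHT and RESTORED as hypothetical names:

```sas
/* Remember each row's original position before any sorting. */
data have_seq;
    set sashelp.class;
    orig_order = _n_;
run;

/* Sort however the analysis needs (by WEIGHT here). */
proc sort data=have_seq out=by_weight;
    by weight;
run;

/* ... analysis on BY_WEIGHT goes here ... */

/* Restore the original row order. */
proc sort data=by_weight out=restored;
    by orig_order;
run;
```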

Jonate_H
Quartz | Level 8

@Reeza

Thank you for all of your time. I didn't mean to waste anyone's time. I am just handling a task that asks me to find the percentiles of a variable that is linked to another, sorted variable, as I replied earlier. If my dumb question bothered anyone, I am sorry for asking.

art297
Opal | Level 21

@Jonate_H: No one has said that it's a dumb question. On the contrary, no one knows what you are trying to do! Others have asked you already, but you still haven't provided the output you'd like, given your 100-record sample. With that, it will be a lot easier to answer your question.

Art, CEO, AnalystFinder.com

Reeza
Super User

There's also the problem of editing an original post. If you add the data or edit the post after I've initially read it, it doesn't occur to me to look at the original post again. FYI, none of this means that this isn't a valid question. It can be, but it needs to be clearly laid out.

Jonate_H
Quartz | Level 8

Thank you all.
