How do I Calculate Percentiles for Non-Numeric data

Posted 05-31-2019 04:35 AM

Good morning all

I have a censored data set, shown below, and would like to calculate the 50th, 93rd, 95th and 99th percentiles. The sample below has an even number of observations, but the count may sometimes be odd.

I am using SAS 9.4 and Enterprise Guide 7.1

Thanking you in advance

The data is as below:

data Aluminium;

input Name $ Result $;

cards;

Al <5

Al <15

Al <5

Al 28

Al <5

Al <3

Al <12

Al <19

Al <2

Al <12

Al <5

Al <15

Al <5

Al 25

Al <5

Al <3

Al <12

Al <19

Al <2

Al <12

Al <5

Al <15

Al <5

Al 28

Al <5

Al <3

Al <12

Al <19

Al <2

Al <12

Al <5

Al <15

Al <5

Al 25

Al <5

Al <3

Al <12

Al <19

Al <2

Al <12

;

run;

5 REPLIES

I think you can do this by using survival analysis, but I am not an expert in that area. You need to create a binary indicator variable that specifies whether the time was observed or censored. You then create a numerical value from the remaining part of the Result string.

```
data Have;
set Aluminium;
censored = (substr(Result, 1,1)='<');
w = scan(Result, -1, "<"); /* scans from the right */
t = input(w, best.);
drop w;
run;
```

After you get the data in this form, look at PROC LIFETEST to analyze the data. For example, the following basic analysis gives the 25th, 50th, and 75th percentiles of the survival time. I do not know the options to get a table of the percentiles that you want, although you can read them off the graph of the survival probability:

```
ods graphics on;
proc lifetest data=Have;   /* Have is the data set created in the DATA step above */
time t*Censored(1);
run;
```
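
As a minimal sketch of one way to get a table instead of reading the graph, the `OUTSURV=` option can save the survival estimates to a data set. The data set name `SurvEst` is hypothetical, and the code assumes the `Have` data set (with variables `t` and `Censored`) from the DATA step above. It keeps the same `time t*Censored(1)` specification, which treats the flagged values as right-censored, so the numbers should be read with that in mind:

```sas
/* Sketch only: save the product-limit estimates for inspection as a table. */
/* SurvEst is a hypothetical data set name; Have comes from the DATA step above. */
proc lifetest data=Have outsurv=SurvEst noprint;
   time t*Censored(1);      /* Censored=1 marks a censored value, as in the thread */
run;

/* Each row holds a value of t, its censoring flag, and the estimated survival. */
proc print data=SurvEst;
   var t _censor_ survival;
run;
```

A percentile that is not a quartile can then be looked up by scanning the `survival` column for the first value of `t` at which the estimate falls to or below the corresponding probability.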

Good morning SAS Super FREQ

The first part of the solution gave me a clue. Thank you for the quick response.

Regards

MMohotsi

The main problem is that you don't really have a defined point in time; it's more of a step-type function. I would start by doing a PROC FREQ to get the counts of each level. The percentiles will be based on the cumulative percents in the frequency table.

You could try a censored approach, but I think in this case the math will work out the same.
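
A minimal sketch of that first step, assuming the `Aluminium` data set from the question. The output data set name `Counts` is hypothetical. The `OUTCUM` option adds the cumulative frequency and cumulative percent to the output data set:

```sas
/* Sketch only: count each level of Result and keep the cumulative percents. */
/* Counts is a hypothetical output data set name.                            */
proc freq data=Aluminium;
   tables Result / outcum out=Counts;   /* adds CUM_FREQ and CUM_PCT to Counts */
run;

proc print data=Counts;
run;
```

Because `Result` is a character variable, the levels are listed in character order here, not in numeric order.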

@Reeza wrote:

The main problem is that you don't really have a defined point in time; it's more of a step-type function. I would start by doing a PROC FREQ to get the counts of each level. The percentiles will be based on the cumulative percents in the frequency table.

You could try a censored approach, but I think in this case the math will work out the same.

But if you want your results in a particular order, you may need to modify the values, because <10 sorts before <5 when the values are compared as character strings. The mix of < values and plain integers is also very problematic.

Consider the following proc freq output from your example data step:

| Result | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|--------|-----------|---------|----------------------|--------------------|
| 25     | 2         | 5.00    | 2                    | 5.00               |
| 28     | 2         | 5.00    | 4                    | 10.00              |
| <12    | 8         | 20.00   | 12                   | 30.00              |
| <15    | 4         | 10.00   | 16                   | 40.00              |
| <19    | 4         | 10.00   | 20                   | 50.00              |
| <2     | 4         | 10.00   | 24                   | 60.00              |
| <3     | 4         | 10.00   | 28                   | 70.00              |
| <5     | 12        | 30.00   | 40                   | 100.00             |
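
One possible way around the ordering problem, sketched under stated assumptions: derive a numeric sort key from `Result`, sort on it, and let PROC FREQ keep the sorted order with `ORDER=DATA`. The names `Keyed`, `Limit`, `Counts`, `Pctl`, `PrevPct` and `P` are hypothetical. Placing `<x` ahead of `x` when the numeric parts are equal, and taking each percentile as the first level whose cumulative percent reaches the target, are both assumptions rather than anything settled in this thread:

```sas
/* Sketch only: order the levels numerically before reading cumulative percents. */
data Keyed;
   set Aluminium;
   Censored = (substr(Result, 1, 1) = '<');            /* 1 when the value is a "<" limit */
   Limit    = input(compress(Result, '<'), best12.);   /* numeric part of the string      */
run;

/* Assumption: for equal numeric parts, "<x" is placed before "x". */
proc sort data=Keyed;
   by Limit descending Censored;
run;

/* ORDER=DATA keeps the levels in the sorted order instead of character order. */
proc freq data=Keyed order=data;
   tables Result / outcum out=Counts noprint;
run;

/* Assumption: a percentile is the first level whose cumulative percent reaches it. */
data Pctl;
   set Counts;
   retain PrevPct 0;
   do P = 50, 93, 95, 99;
      if PrevPct < P <= CUM_PCT then output;
   end;
   PrevPct = CUM_PCT;
   keep P Result;
run;

proc print data=Pctl;
run;
```

The result keeps the original `Result` strings, so a percentile that falls on a censored level is reported as a `<` value, not as a number.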

Good day

I think the challenge with this approach is the order of the field "Result". Combining the response from Rick_SAS with the two latest replies might offer a light at the end of the tunnel.
