Mirisage
Obsidian | Level 7

Hi SAS Forum,

I have the attached data set with a single variable.

I wanted to get the 10%, 20%, ..., 90% and 100% percentile points.

I used proc univariate with the output statement below (zilok is thankfully acknowledged) to get them.

SAS Code:

proc univariate data = have;
    var income;
    output out=anyname pctlpts = 10 to 100 by 10 pctlpre = inc;
run;

Q: The code produces the 10% to 100% percentile points, but the negative values in the data set seem to have been completely ignored when the percentile points were created (please run the code and see the output data set named "anyname"). Is there any way to force SAS to take the negative values into account as well?

Thanks

Mirisage

ACCEPTED SOLUTION
ballardw
Super User

You have 33 nonmissing values, 3 of them negative. The 10th percentile would be the 3.3th element, which I believe SAS rounds up by default to the 4th nonmissing value, and that is 0. With the default definition the percentiles are generally going to be one of the data values, not interpolated (the exception is when the position n*p is a whole number, in which case two adjacent values are averaged).

One clue: the 5th percentile in the univariate output is -9, where the desired position would be the 1.65th, rounded up to the 2nd value = -9. That hints that the 4th value is going to be the 10th percentile.

Upshot: you don't have enough negative values for one to be at or above the 10th percentile.
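
The arithmetic above can be checked by hand. The following is only a sketch, assuming the default percentile definition (PCTLDEF=5, the empirical distribution function with averaging) and using the data posted in this thread. The data set names sorted and by_hand and the array size of 100 are illustrative choices, not anything from the original posts. For each percentile it computes the position n*p, splits it into an integer part j and a fractional part g, and picks the (j+1)th sorted value when g > 0.

```sas
/* Sketch only: hand-compute the default PCTLDEF=5 percentiles for the thread's data */
data have;
    input income @@;
    datalines;
0 15 392 219 95 . 208 -10 12 41 22 65 372 360 0 0 0
393 190 0 0 0 168 93 -8 0 14 43 138 52 125 0 -9 45
;
run;

/* Sort the nonmissing values so that position j holds the j-th smallest value */
proc sort data=have out=sorted;
    where income is not missing;
    by income;
run;

data by_hand(keep=pct np j g value);
    array x[100] _temporary_;      /* assumption: at most 100 nonmissing values */
    do n = 1 by 1 until (last);    /* n ends up as the nonmissing count */
        set sorted end=last;
        x[n] = income;
    end;
    do pct = 10 to 100 by 10;
        np = fuzz(n * pct / 100);  /* position n*p, snapped to an integer if within 1E-12 */
        j  = floor(np);            /* integer part of the position */
        g  = np - j;               /* fractional part of the position */
        if g > 0 then value = x[j + 1];          /* fractional position: take the next data value */
        else if j = n then value = x[n];         /* 100th percentile: the maximum */
        else value = (x[j] + x[j + 1]) / 2;      /* whole-number position: average two neighbours */
        output;
    end;
    stop;
run;

proc print data=by_hand noobs;
run;
```

Comparing by_hand with the PROC UNIVARIATE output data set should show the same ten values. The three negative values occupy sorted positions 1 to 3, and the lowest requested position is 3.3, so none of them can be selected.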

Reeza
Super User

What makes you think they're not being considered?

From my test they are:

data random;
    do i=-1000 to 1000 by 10;
        output;
    end;
run;

proc univariate data = random;
    var i;
    output out=anyname pctlpts = 10 to 100 by 10 pctlpre = inc;
run;

Mirisage
Obsidian | Level 7

Hi Reeza,

Thanks.

I ran the proc univariate code for your data set. Yes, it worked.

However, please try the same proc univariate code on the data set provided below. It returns the following percentile points (inc10 through inc100), none of which is negative:

0, 0, 0, 14, 41, 52, 125, 190, 360, 393

/* this is the data set */
data have;
    input income;
    cards;
0
15
392
219
95
.
208
-10
12
41
22
65
372
360
0
0
0
393
190
0
0
0
168
93
-8
0
14
43
138
52
125
0
-9
45
;
run;

proc univariate data = have;
    var income;
    output out=anyname pctlpts = 10 to 100 by 10 pctlpre = inc;
run;

/* print the most recently created data set (anyname) */
proc print;
run;

Why did proc univariate not consider the negative values?

I would appreciate any clue.

Thanks

Mirisage

jeeth79usa
Calcite | Level 5

Hi,

I know this post is old, but I am in a similar situation. I am getting the following values. There are many 0 values in the data, and I have converted the negative values to 0. I just need to get the correct percentiles.

There are values less than 3.36, and they have been ignored. Please let me know how to fix it.

PaigeMiller
Diamond | Level 26

Why do you think this result is not correct? Unless we can see your data, there's really no way to help (and if you read ballardw's explanation above, it certainly could be correct).

--
Paige Miller
ballardw
Super User

Run Proc Freq on that variable and look at the cumulative percent column. Judging from the values you show, I'm willing to bet that 60 percent of the values are 0.

Define your rule for "correct percentiles". If you want percentiles of the values greater than 0, then either use a data set option to restrict the input data accordingly OR set the 0 values, or the original negative values, to missing.
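
Both suggestions can be sketched as follows. This is an illustration only: it reuses the data set posted earlier in this thread, and the names pctl_where, have_pos and pctl_missing are made up for the example. It assumes the rule "percentiles of the values greater than 0", which may not be the rule you actually want.

```sas
/* Data posted earlier in this thread */
data have;
    input income @@;
    datalines;
0 15 392 219 95 . 208 -10 12 41 22 65 372 360 0 0 0
393 190 0 0 0 168 93 -8 0 14 43 138 52 125 0 -9 45
;
run;

/* The cumulative percent column shows how much of the data lies at or below each value */
proc freq data=have;
    tables income;
run;

/* Option 1: restrict the input with a WHERE= data set option */
proc univariate data=have(where=(income > 0)) noprint;
    var income;
    output out=pctl_where pctlpts=10 to 100 by 10 pctlpre=inc;
run;

/* Option 2: set the unwanted values to missing; missing values are left out of the percentiles */
data have_pos;
    set have;
    if income <= 0 then income = .;
run;

proc univariate data=have_pos noprint;
    var income;
    output out=pctl_missing pctlpts=10 to 100 by 10 pctlpre=inc;
run;
```

Both options compute the percentiles over the same subset of values, so pctl_where and pctl_missing should agree. Option 2 keeps every observation in the data set, which may matter if the rows are needed for later counting.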

jeeth79usa
Calcite | Level 5

Hello PaigeMiller and ballardw,

Thanks a lot for the quick response. You are correct: 69% of the values are 0. I have used the data posted above as an example (modified to resemble my data). I have converted the missing and negative values to zero, and I am trying to get deciles from that. I need to include the missing values when counting the ids and have missing as one of the groups, hence I have converted . to 0. I could use Proc Rank with groups=10 for this, but as it eliminates the missing values, I tried univariate instead.

Please provide an example of how to define the rule.


data have;
    input income;
    cards;
0
0.05
3.92
21.9
9.5
0
20.8
0.06
1.12
4.13
22.90
65.18
37.2
36.07
0
0
0
393.09
190.88
0
0
0
16.8
9.3
0
0
14.35
4.3
1.38
52.23
12.5
0
0
4.5
;
run;

proc univariate data = have;
    var income;
    output out=decile pctlpts = 10 to 100 by 10  pctlpre = pct;
run;


After this I am doing the following.

/* &decilevar is assumed to resolve to the PROC UNIVARIATE output data set (decile) */
data _null_;
    set &decilevar;
    call symput('d0',pct0);    /* pct0 exists only if 0 is included in pctlpts */
    call symput('d1',pct10);
    call symput('d2',pct20);
    call symput('d3',pct30);
    call symput('d4',pct40);
    call symput('d5',pct50);
    call symput('d6',pct60);
    call symput('d7',pct70);
    call symput('d8',pct80);
    call symput('d9',pct90);
    call symput('d10',pct100);
run;

proc format;
    value abc (multilabel)
              0    = '10'
        &d1 - &d2  = '9'
        &d2 - &d3  = '8'
        &d3 - &d4  = '7'
        &d4 - &d5  = '6'
        &d5 - &d6  = '5'
        &d6 - &d7  = '4'
        &d7 - &d8  = '3'
        &d8 - &d9  = '2'
        &d9 - &d10 = '1'
    ;
run;

/* create the decile in the main dataset itself; &dsn and &var are assumed to be defined earlier */
data abc;
    set &dsn;
    format &var abc.;
run;


The output I need should look like this: the sum of the data and the ids, based on the decile group.

Decile    Dollar
1         $0123,456,789
2         $ 123,456,789
3         $  23,456,789
4         23,456,789
5         23,456,789
6         $ 216,456,789
7         $ 23,456,789
8         23,456,789
9         23,456,789
10        $         -

What I am unable to understand is why I am getting the following 11 groups when I don't convert the missing and negative values to 0.


Rick_SAS
SAS Super FREQ

All these "issues" are the same: when the data contain a large number of values that are exactly equal, two or more decile values can be the same. An example and explanation are provided in the article "Binning data by quantiles? Beware of rounded data" on The DO Loop blog. In short, the problem is the data, not the software. At the end of the article are two suggestions for ways to work around the problem.
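
A quick way to see whether tied cut points affect a particular data set is sketched below. It is not taken from the article and is not one of its workarounds. It uses the example data posted above, and the names cuts, cutpoint, percentile, ranked and decile_grp are illustrative. The first part lists the decile cut points one per row so that repeats stand out. The second part shows that PROC RANK assigns tied values to the same group, so some of the 10 groups can end up empty.

```sas
/* Example data posted above */
data have;
    input income @@;
    datalines;
0 0.05 3.92 21.9 9.5 0 20.8 0.06 1.12 4.13 22.90 65.18 37.2 36.07 0 0 0
393.09 190.88 0 0 0 16.8 9.3 0 0 14.35 4.3 1.38 52.23 12.5 0 0 4.5
;
run;

proc univariate data=have noprint;
    var income;
    output out=decile pctlpts=10 to 100 by 10 pctlpre=pct;
run;

/* One row per cut point, so repeated cut-point values are easy to spot */
proc transpose data=decile out=cuts(rename=(col1=cutpoint)) name=percentile;
run;

/* A frequency above 1 means that two or more deciles share that cut point */
proc freq data=cuts;
    tables cutpoint / nocum nopercent;
run;

/* PROC RANK keeps tied values in the same group, so fewer than 10 groups may appear */
proc rank data=have groups=10 out=ranked;
    var income;
    ranks decile_grp;
run;

proc freq data=ranked;
    tables decile_grp / missing;
run;
```

If the first PROC FREQ shows a cut point with a frequency above 1, any format or binning rule built from those cut points will have empty or overlapping ranges.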

jeeth79usa
Calcite | Level 5

Hello Rick,

Thank you very much for your response on this. I agree with you 100%. It's an issue with the data. I will let you know how it goes.

Thanks,

Jeeth.
