Programming the statistical procedures from SAS

How to use proc univariate to identify percentile points when negative values present in dataset?

Super Contributor
Posts: 338

Hi SAS Forum,

I have the attached data set with a single variable.

I wanted to get the 10%, 20%, ..., 90% and 100% percentile points.

I used proc univariate with the OUTPUT statement below (zilok is thankfully acknowledged) to get them.

SAS Code:

proc univariate data = have;
    var income;
    output out=anyname pctlpts = 10 to 100 by 10 pctlpre = inc;
run;

Q: The code produces the 10% to 100% percentile points, but the negative values in the data set seem to have been ignored completely when the percentile points were created (please run the code and look at the output data set named "anyname"). Is there any way to force SAS to take the negative values into account as well?

Thanks

Mirisage

Attachment

All Replies
Super User
Posts: 18,549

Re: How to use proc univariate to identify percentile points when negative values present in dataset?

What makes you think they're not being considered?

From my test they are:

data random;

    do i=-1000 to 1000 by 10;

        output;

    end;

run;

proc univariate data = random;

var i;

output out=anyname pctlpts = 10 to 100 by 10 pctlpre = inc;

run;

Super Contributor
Posts: 338

Re: How to use proc univariate to identify percentile points when negative values present in dataset?

Hi Reeza,

Thanks.

I ran the proc univariate code for your data set. Yes, it worked.

However, please try the proc univariate code on the data set provided below. It returns the following output, which omits the negative values.

inc10  inc20  inc30  inc40  inc50  inc60  inc70  inc80  inc90  inc100
    0      0      0     14     41     52    125    190    360     393

/*this is the data set*/

data have;

input income;

cards;

0

15

392

219

95

.

208

-10

12

41

22

65

372

360

0

0

0

393

190

0

0

0

168

93

-8

0

14

43

138

52

125

0

-9

45

;

run;

proc univariate data = have;

var income;

output out=anyname pctlpts = 10 to 100 by 10 pctlpre = inc;

run;

proc print data=anyname;
run;

Why did proc univariate not consider the negative values?

I would appreciate any clue.

Thanks

Mirisage

Solution
‎09-13-2013 05:54 PM
Super User
Posts: 10,857

Re: How to use proc univariate to identify percentile points when negative values present in dataset?

You have 33 nonmissing values, 3 of them negative. The 10th percentile falls at position 33 * 0.10 = 3.3, which I believe SAS rounds up by default to the 4th nonmissing value, and that value is 0. The percentiles are going to be actual data values, not interpolated ones.

One clue: the 5th percentile in the univariate output is -9. Its position is 33 * 0.05 = 1.65, rounded up to the 2nd value, which is -9. That hints that the 4th value is going to be the 10th percentile.

Upshot, you don't have enough negative values for one to be at or above the 10th percentile.
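The arithmetic above can be checked by hand. What follows is only a minimal sketch, assuming the procedure's default percentile definition (PCTLDEF=5, the empirical distribution function with averaging): with n nonmissing values and percentile p, write n*p = j + g, take the (j+1)th sorted value when g > 0, and average the jth and (j+1)th values when g = 0. The data set names sorted and by_hand and the array bound of 100 are made up for illustration.

```sas
/* Data from the post above (33 nonmissing values, 1 missing) */
data have;
    input income @@;
    datalines;
0 15 392 219 95 . 208 -10 12 41 22 65 372 360 0 0 0
393 190 0 0 0 168 93 -8 0 14 43 138 52 125 0 -9 45
;
run;

/* Sort the nonmissing values so that x[1] is the smallest */
proc sort data=have(where=(income is not missing)) out=sorted;
    by income;
run;

data by_hand(keep=pct np position value);
    array x[100] _temporary_;        /* assumed upper bound on n */
    n = 0;
    do until (last);                 /* load every sorted value into the array */
        set sorted end=last;
        n = n + 1;
        x[n] = income;
    end;
    do pct = 5, 10 to 100 by 10;
        np = fuzz(n * pct / 100);    /* n*p = j + g; fuzz() guards against floating-point noise */
        j  = floor(np);
        g  = np - j;
        if j >= n then do;           /* 100th percentile: the maximum */
            position = n;
            value = x[n];
        end;
        else if g > 0 then do;       /* fractional part: take the next value up */
            position = j + 1;
            value = x[j + 1];
        end;
        else do;                     /* exact integer: average the two neighbours */
            position = j + 0.5;
            value = (x[j] + x[j + 1]) / 2;
        end;
        output;
    end;
    stop;
run;

proc print data=by_hand noobs;
run;
```

For the 5th percentile this lands on position 2 (value -9) and for the 10th on position 4 (value 0), which is the pattern described above: the three negative values occupy positions 1 to 3 only.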

Contributor
Posts: 28

Re: How to use proc univariate to identify percentile points when negative values present in dataset?

Hi,

I know this post is old, but I am in a similar situation. I am getting the following values. There are many 0 values in the data and I have converted the negative values to 0. I just need to get the correct percentiles.

There are values less than 3.36 and they have been ignored. Please let me know how to fix it.

Trusted Advisor
Posts: 1,670

Re: How to use proc univariate to identify percentile points when negative values present in dataset?

Why do you think this result is not correct? Unless we can see your data, there's really no way to help (and if you read ballardw's explanation above, it certainly could be correct).

Super User
Posts: 10,857

Re: How to use proc univariate to identify percentile points when negative values present in dataset?

Run Proc Freq on that variable and look at the cumulative percent column. I'm willing to bet that 60 percent of the values are 0, judging from the displayed values you show.

Define your rule for "correct percentiles". If you want percentiles of the values greater than 0, then either use a data set option (for example WHERE=) to restrict the input data accordingly OR set the 0, or original negative, values to missing.
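Both suggestions might look roughly like the sketch below. It assumes the data set have and the variable income posted earlier in the thread; the output name pos_pctl is made up for illustration.

```sas
/* The cumulative percent column shows how much of the data
   sits at or below each distinct value */
proc freq data=have;
    tables income;
run;

/* Percentiles of the strictly positive values only, using a
   WHERE= data set option to restrict the input */
proc univariate data=have(where=(income > 0)) noprint;
    var income;
    output out=pos_pctl pctlpts=10 to 100 by 10 pctlpre=inc;
run;

proc print data=pos_pctl noobs;
run;
```

Restricting the input this way changes the question being answered: the cut points then describe only the positive values, so the zeros and negatives have to be assigned to a group by a separate rule.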

Contributor
Posts: 28

Re: How to use proc univariate to identify percentile points when negative values present in dataset?

Hello PaigeMiller and Ballardw,


Thanks a lot for the quick response. You are correct: 69% of the values are 0. I have used the data posted above as an example (modified to resemble my data). I have converted the missing and negative values to zero and I am trying to get deciles from that. I need to include the missing values when counting the ids and have missing as one of the groups, hence I have converted . to 0. I could use Proc Rank with groups=10 for the same, but as it eliminates the missing values, I tried univariate instead.


Please provide an example of how to define the rule.


data have;

input income;

cards;

0

0.05

3.92

21.9

9.5

0

20.8

0.06

1.12

4.13

22.90

65.18

37.2

36.07

0

0

0

393.09

190.88

0

0

0

16.8

9.3

0

0

14.35

4.3

1.38

52.23

12.5

0

0

4.5

;

run;

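/* data set and variable names changed to match the sample data above */
proc univariate data = have;
    var income;
    /* percentile list starts at 0 so that pct0 is created as well */
    output out=decile pctlpts = 0 to 100 by 10  pctlpre = pct;
run;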


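After this I am doing the following.


data _null_;
    /* &decilevar is a macro variable defined elsewhere; it holds the name
       of the percentile data set written by proc univariate */
    set &decilevar;
    /* copy each cut point into a macro variable; symputx strips blanks.
       pct0 only exists if the PCTLPTS= list includes 0 */
    call symputx('d0',pct0);
    call symputx('d1',pct10);
    call symputx('d2',pct20);
    call symputx('d3',pct30);
    call symputx('d4',pct40);
    call symputx('d5',pct50);
    call symputx('d6',pct60);
    call symputx('d7',pct70);
    call symputx('d8',pct80);
    call symputx('d9',pct90);
    call symputx('d10',pct100);
run;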

proc format;

    value abc (multilabel)

                           0    = '10'

                     &d1 - &d2  = '9'

                     &d2 - &d3  = '8'

                     &d3 - &d4  = '7'

                     &d4 - &d5  = '6'

                     &d5 - &d6  = '5'

                     &d6 - &d7  = '4'

                     &d7 - &d8  = '3'

                     &d8 - &d9  = '2'

                     &d9 - &d10 = '1'

     ;

run;

/* create the decile in the main dataset itself */

data abc;

  set &dsn;

    format &var abc.;

run;


The output I need should look like this: the dollar amounts and ids summed by decile group.


Decile    Dollar
1         $0123,456,789
2         $ 123,456,789
3         $  23,456,789
4         23,456,789
5         23,456,789
6         $ 216,456,789
7         $ 23,456,789
8         23,456,789
9         23,456,789
10        $ -


What I am unable to understand is why I am getting the following 11 groups when I don't convert the missing and negative values to 0.


SAS Super FREQ
Posts: 3,547

Re: How to use proc univariate to identify percentile points when negative values present in dataset?

All these "issues" are the same: when the data contain a large number of values that are exactly equal, you can get two or more decile values that are the same. An example and explanation are provided in "Binning data by quantiles? Beware of rounded data" on The DO Loop. In short, the problem is the data, not the software. At the end of the article are two suggestions for ways to work around the problem.
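One way to see the effect of ties on the grouping is the Proc Rank route mentioned earlier in the thread. The sketch below is not taken from the linked article. It assumes the data set have and the variable income posted above; the names ranked and decile are made up for illustration.

```sas
/* GROUPS=10 assigns group numbers 0 (lowest) to 9 (highest).
   Tied values receive the same rank and therefore the same group,
   and observations with a missing income get a missing decile. */
proc rank data=have out=ranked groups=10;
    var income;
    ranks decile;
run;

/* The MISSING option keeps the missing decile as its own class level,
   so those rows are still counted as a separate group */
proc means data=ranked n nmiss sum missing;
    class decile;
    var income;
run;
```

Because all the zeros fall into a single group, fewer than ten distinct groups can appear when a large share of the values are tied. That is the same behaviour that makes several percentile cut points coincide.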

Contributor
Posts: 28

Re: How to use proc univariate to identify percentile points when negative values present in dataset?

Hello Rick,

Thank you very much for your response on this. I agree with you 100%. It's an issue with the data. I will let you know how it goes.

Thanks,

Jeeth.

🔒 This topic is solved and locked.
