Csands
Calcite | Level 5

Hi,

I ran PROC UNIVARIATE on a variable and got different values for the maximum and minimum in the Extreme Values table and in the Quantiles table, as shown below.

Extreme Values

          Lowest                          Highest
Order     Value          Freq     Order     Value            Freq
1         -778.998.059   1        .         .                .
.         .              .        .         .                .
.         .              .        .         .                .
.         .              .        .         .                .
.         .              .        .         .                .
.         .              .        270756    3.990.158.000    1
Quantiles

Quantile       Estimate
100% Max       3.990.160.000
99%            1.190.450
95%            210.926
90%            97.597
75% Q3         29.182
50% Median     6.771
25% Q1         -2.523
10%            -17.403
5%             -36.202
1%             -174.758
0% Min         -778.998.000

Why are these values different?

Thanks in advance for the help.

11 REPLIES
PGStats
Opal | Level 21

That's a very good question. How many observations are in your dataset? Did you try playing with UNIVARIATE option PCTLDEF= ?
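
One way to try this suggestion is to rerun the procedure once per percentile definition and compare the Quantiles tables. This is only a sketch: it reuses the dataset and variable names posted later in the thread (table, byvar, var_to_study), and the macro name try_pctldef is made up for illustration. PCTLDEF= accepts the values 1 through 5, with 5 as the default.

```sas
/* Sketch: print only the Quantiles table under each percentile definition. */
/* try_pctldef is a hypothetical helper name; table, byvar and              */
/* var_to_study are the placeholder names used elsewhere in this thread.    */
%macro try_pctldef;
  %do d = 1 %to 5;
    title "PCTLDEF=&d";
    ods select Quantiles;                /* keep only the Quantiles table */
    proc univariate data=table pctldef=&d;
      by byvar;                          /* data must be sorted by byvar  */
      var var_to_study;
    run;
  %end;
  title;                                 /* clear the title afterwards    */
%mend try_pctldef;

%try_pctldef
```

The 0% Min and 100% Max rows should not depend on the definition chosen, so if they stay the same across all five runs, PCTLDEF= is unlikely to explain the discrepancy.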

PG
Reeza
Super User

Given that FREQ=1 for each of them, would it matter?

Which one is correct, by the way?

PGStats
Opal | Level 21

I would bet the extreme values are exact. They are easy to get in a single pass of the dataset. Not the same for quantile estimates (other than Min and Max). They look like rounded values that most likely result from some binning algorithm. - PG

PG
ballardw
Super User

I'm willing to bet that the difference comes from the default table formats of the two tables, probably a difference between best9. or best10. and best12.


PGStats
Opal | Level 21

Well, that hypothesis could be checked by exporting the tables to ODS datasets. - PG

PG
Csands
Calcite | Level 5

So....

The code I use is very simple.

PROC UNIVARIATE DATA = table NEXTRVAL=30;
     BY byvar;
     VAR var_to_study;
RUN;

I run this code for more than 30 continuous variables (since the variables are continuous, it is fine to have Freq=1). For almost all the variables with values of more than 6 digits I have this issue. My variables have no formats, just a length of 8. The dataset has more than 500,000 rows; the results I showed above belong to a BY group of more than 200,000 rows, and the data can contain missing values.

I already calculated the max and min with PROC SQL, and the results were the same as in the Extreme Values table.

Of course I have several ways of getting the max and min, but I was expecting to get the same values from all of the methods.
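
The PROC SQL cross-check mentioned above was not posted, so the following is only a sketch of what such a query might look like. It assumes the same placeholder names (table, byvar, var_to_study); the column aliases min_value and max_value and the best16. display format are added here for illustration, so that the printed width cannot hide any digits.

```sas
/* Sketch: exact min and max per BY group, printed with a wide format. */
/* work.table, byvar and var_to_study are placeholders from the thread. */
proc sql;
  select byvar,
         min(var_to_study) as min_value format=best16.,
         max(var_to_study) as max_value format=best16.
  from work.table
  group by byvar;
quit;
```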

  

Reeza
Super User

What version of SAS are you on? I can't replicate the issue on SAS 9.3

I think I agree with BallardW that it's probably just a format issue with the template.

Add in this line before your code and see if the values in the dataset xvalues are what you expect:

ods table extremevalues=xvalues;
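
A fuller version of this check, as a sketch: the documented statement for capturing output tables is ODS OUTPUT, and both tables can be captured in one run. ExtremeValues (produced when NEXTRVAL= is used) and Quantiles are the ODS table names for PROC UNIVARIATE; the output dataset names xvalues and qvalues are arbitrary, and best16. is chosen here only to make sure no digits are hidden when printing.

```sas
/* Sketch: capture both tables as datasets and print the stored values. */
ods output ExtremeValues=xvalues Quantiles=qvalues;

proc univariate data=table nextrval=30;
  by byvar;
  var var_to_study;
run;

/* Print every numeric column with a wide format so that the values    */
/* shown are the stored ones, not the table template's rounded display. */
proc print data=xvalues;
  format _numeric_ best16.;
run;

proc print data=qvalues;
  format _numeric_ best16.;
run;
```

If the 0% Min and 100% Max rows in qvalues match the extremes in xvalues, the difference is only in how the Quantiles table is displayed.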

PGStats
Opal | Level 21

I can't reproduce the "Format Issue" with this simple test:

data _null_;
do x = -778998059, 3990158000;
     put x :best6.;
     put x :best7.;
     put x :best8.;
     put x :best9.;
     put x :best10.;
     put x :best11.;
     put x :best12.;
     end;
run;

-779E6
-7.79E8
-7.79E8
-7.79E8
-778998059
-778998059
-778998059
3.99E9
3.99E9
3.9902E9
3.99016E9
3990158000
3990158000
3990158000

I'm using version 9.3

PG
Reeza
Super User

I assume there's actually a comma or european comma format on the data, because of the periods rather than commas?
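
To see what a European-style format does to these two values, PGStats' test above can be extended along these lines. This is a sketch under the assumption that the display uses something like COMMAXw.d, which writes periods as thousands separators; the width of 15 is arbitrary, and nothing in the thread confirms which format the table template actually applies.

```sas
/* Sketch: write the two extreme values with a European-style comma format */
/* and with BEST12. for comparison. COMMAX15. is an assumed choice.        */
data _null_;
  do x = -778998059, 3990158000;
    put x= commax15.;
    put x= best12.;
  end;
run;
```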

Reeza
Super User

Time for Tech Support :)

See the very fine print at the bottom of the page, I suggest submitting a ticket.

Csands
Calcite | Level 5

Reeza, I think you are right. It's time for SAS tech support. :)

I posted the question here because I noticed some users with great knowledge of SAS, and thought someone had probably already had the same issue. It turns out that I am the first. :)

Thanks all for your time.

(By the way my version is 9.3)
