Hi,
I ran PROC UNIVARIATE on a variable and got different values for the maximum and minimum in the Extreme Values table and in the Quantiles table, as shown below.
Extreme Values

| Lowest: Order | Lowest: Value | Lowest: Freq | Highest: Order | Highest: Value | Highest: Freq |
|---|---|---|---|---|---|
| 1 | -778.998.059 | 1 | . | . | . |
| … | … | … | … | … | … |
| . | . | . | 270756 | 3.990.158.000 | 1 |
Quantiles

| Quantile | Estimate |
|---|---|
| 100% Max | 3.990.160.000 |
| 99% | 1.190.450 |
| 95% | 210.926 |
| 90% | 97.597 |
| 75% Q3 | 29.182 |
| 50% Median | 6.771 |
| 25% Q1 | -2.523 |
| 10% | -17.403 |
| 5% | -36.202 |
| 1% | -174.758 |
| 0% Min | -778.998.000 |
Why are these values different?
Thanks in advance for the help.
That's a very good question. How many observations are in your dataset? Have you tried experimenting with the UNIVARIATE option PCTLDEF=?
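A sketch of that experiment, assuming a made-up dataset `demo` (its five values are the exact extremes plus Q1, median, and Q3 from the tables above) in place of the real data, and a hypothetical macro name `try_pctldef`. If I read the documentation correctly, every PCTLDEF= definition returns the sample minimum and maximum for the 0% and 100% rows, so this option should mainly change the interior quantiles.

```sas
/* Hypothetical demo data: one BY group with values taken from the post. */
data demo;
   byvar = 1;
   do var_to_study = -778998059, -2523, 6771, 29182, 3990158000;
      output;
   end;
run;

/* Rerun the procedure once per percentile definition (PCTLDEF=1 to 5),
   keeping only the Quantiles table each time. */
%macro try_pctldef;
   %do d = 1 %to 5;
      title "PCTLDEF=&d";
      ods select Quantiles;
      proc univariate data=demo pctldef=&d;
         by byvar;
         var var_to_study;
      run;
   %end;
   title;
%mend try_pctldef;

%try_pctldef
```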
Given that FREQ=1 for each of them, would it matter?
Which ones are correct, by the way?
I would bet the extreme values are exact. They are easy to get in a single pass through the dataset; that is not the case for quantile estimates (other than Min and Max). The quantiles look like rounded values, most likely the result of some binning algorithm. - PG
I'm willing to bet that the difference comes from the default display formats of the two tables, probably BEST9. or BEST10. versus BEST12.
Well, that hypothesis could be checked by exporting the tables to ODS datasets. - PG
So....
The code I use is very simple.
PROC UNIVARIATE DATA = table NEXTRVAL=30;  /* show the 30 lowest and 30 highest values */
   BY byvar;                               /* input must be sorted (or indexed) by byvar */
   VAR var_to_study;
RUN;
I run this code for more than 30 continuous variables (since the variables are continuous, FREQ=1 is expected). I see this issue for almost all variables with values of more than 6 digits. My variables have no formats, just a length of 8. The dataset has more than 500,000 rows; the results I showed above come from a BY group of more than 200,000 rows, and there can be missing values.
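The posted numbers are consistent with the Quantiles table displaying only six significant digits: rounding the exact extremes to six significant digits gives 3990160000 and -778998000, which are exactly the posted 100% Max and 0% Min. The DATA step below sketches that arithmetic; it only illustrates the pattern and does not show which format the table template actually uses.

```sas
/* Round each exact extreme value to 6 significant digits and print both
   versions for comparison with the Quantiles table (illustration only). */
data _null_;
   do x = -778998059, 3990158000;
      ndigits = floor(log10(abs(x))) + 1;     /* digits before the decimal point */
      rounded = round(x, 10**(ndigits - 6));  /* keep 6 significant digits */
      put x= best16. rounded= best16.;
   end;
run;
```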
I already calculated the max and min with PROC SQL, and the results were the same as in the Extreme Values table.
Of course, I have several ways of getting the max and min, but I expected to get the same values from all of them.
What version of SAS are you on? I can't replicate the issue on SAS 9.3.
I think I agree with BallardW that it's probably just a format issue with the template.
Add this line before your code and check whether the values in the dataset xvalues are what you expect:
ods output ExtremeValues=xvalues;
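Extending that idea to both tables, here is a sketch that saves ExtremeValues and Quantiles as datasets and prints them with a wide format. The dataset `demo` and the output names `xvalues` and `qvalues` are made up for illustration, and NEXTRVAL= is reduced to 2 because the demo data has only five rows. If the 100% Max row in `qvalues` prints as 3990158000, the two tables hold the same value and differ only in how they display it.

```sas
/* Hypothetical demo data standing in for the poster's dataset. */
data demo;
   byvar = 1;
   do var_to_study = -778998059, -2523, 6771, 29182, 3990158000;
      output;
   end;
run;

/* Save both tables as datasets instead of only displaying them. */
ods output ExtremeValues=xvalues Quantiles=qvalues;
proc univariate data=demo nextrval=2;
   by byvar;
   var var_to_study;
run;

/* Print the stored values with a wide format so that no digits are
   hidden by the display formats of the table templates. */
proc print data=xvalues;
   format _numeric_ best16.;
run;

proc print data=qvalues;
   format _numeric_ best16.;
run;
```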
I can't reproduce the "Format Issue" with this simple test:
data _null_;
   /* write each exact extreme value with BEST6. through BEST12. */
   do x = -778998059, 3990158000;
      put x :best6.;
      put x :best7.;
      put x :best8.;
      put x :best9.;
      put x :best10.;
      put x :best11.;
      put x :best12.;
   end;
run;
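| Format | x = -778998059 | x = 3990158000 |
|---|---|---|
| best6. | -779E6 | 3.99E9 |
| best7. | -7.79E8 | 3.99E9 |
| best8. | -7.79E8 | 3.9902E9 |
| best9. | -7.79E8 | 3.99016E9 |
| best10. | -778998059 | 3990158000 |
| best11. | -778998059 | 3990158000 |
| best12. | -778998059 | 3990158000 |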
I'm using version 9.3
PG
Given the periods instead of commas, I assume there's actually a COMMA or European-style COMMAX format on the data?
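One way to check is PROC CONTENTS, which lists any format attached to each variable. The DATA step after it shows that the COMMAX14. format prints the two exact extremes with periods as thousands separators, as in the posted tables. The dataset `demo` is made up for illustration.

```sas
/* Hypothetical dataset holding the two exact extreme values. */
data demo;
   do var_to_study = -778998059, 3990158000;
      output;
   end;
run;

/* List variable attributes, including any attached format. */
proc contents data=demo varnum;
run;

/* COMMAX. uses periods to separate thousands and a comma for decimals. */
data _null_;
   set demo;
   put var_to_study commax14.;
run;
```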
Time for Tech Support
See the very fine print at the bottom of the page; I suggest submitting a ticket.
Reeza, I think you are right. It's time for SAS Technical Support.
I posted the question here because I noticed some users with great knowledge of SAS, and I thought someone had probably already had the same issue. It turns out I am the first.
Thanks all for your time.
(By the way my version is 9.3)