BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ADoering
Fluorite | Level 6

Hi all,

 

When calculating quantiles with Tweedie distribution for confidence intervals, I very often get errors in the case of lower boundaries (2.5% quantile) like in the subject line, where upper boundary (97.5% quantile) in fact works without problem.

 

See the following log file snippet to reproduce - first statements results in error, second works fine (same distribution, different prob): 

...

26 %put SAS Version: &sysvlong;
SAS Version: 9.04.01M5P091317
27 %put Tweedie CI Lower: %sysfunc(QUANTILE(TWEEDIE,0.025,1.3080865018,332762.58,245.3143634));
WARNING: An argument to the function QUANTILE referenced by the %SYSFUNC or %QSYSFUNC macro function is out of range.
NOTE: Mathematical operations could not be performed during %SYSFUNC function execution. The result of the operations have been set 
to a missing value.
Tweedie CI Lower: .
28 %put Tweedie CI Upper: %sysfunc(QUANTILE(TWEEDIE,0.975,1.3080865018,332762.58,245.3143634));
Tweedie CI Upper: 465720.907332802
...

 

I see no technical restrictions in the docs, and theoretically, also the first statement should work. Compare with R where the same 2.5% quantile computes just fine:

...

> library("tweedie")

Warning message:

package ‘tweedie’ was built under R version 3.5.1

> qtweedie(p=0.025,xi=1.3080865018,mu=332762.58,phi=245.3143634)

[1] 215116.9

> qtweedie(p=0.975,xi=1.3080865018,mu=332762.58,phi=245.3143634)

[1] 465720.9

...

 

Can anyone maybe suggest a workaround in SAS?

 

Thanks and regards

Andreas

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

I suspect that the large values of mu and phi are leading to numerical instabilities during the quantile computation. I am not an expert in the Tweedie distribution, but you can try rescaling the parameters. The following DATA step gives the same answers as your R program, so empirically it seems to be a good workaround.

 

data Q;
xi = 1.3080865018;
mu = 332762.58;
phi = 245.3143634;
c = 100; /* scaling factor */
do prob = 0.01 to 0.5 by 0.005;
   x = QUANTILE('TWEEDIE', prob, xi, mu, phi);
   xScaled = c * QUANTILE('TWEEDIE', prob, xi, mu/c, phi/c**(2-xi));
   relDiff = (x - xScaled) / xScaled;
   output;
end;
label diff = "x - xScaled";
label relDiff = "(x - xScaled)/xScaled";
run;

/* scaled call produces quantiles for small probabilities */
proc print data=Q;
where x=.;
var prob x xScaled;
run;

/* when both methods produce answers, the relative difference is tiny */
proc means data=Q;
var relDiff;
run;

View solution in original post

6 REPLIES 6
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post your actual code.  Also, why use %sysfunc()?  SAS works on datasets, macro language is there for generating text only.  Don't confuse the two.  WHat happens when you call the function in a datastep, with proper attributes and such like?  E.g:

data want;
  want=quantile("Tweedie",0.25,1.3080865018,332762.58,245.3143634);
run;

This works, not sure what your trying to achieve, but isn't 0.025 a bit small?

ADoering
Fluorite | Level 6

@RW9As I said, I want to calculate Confidence Intervalls, so I need 0.025, not 0.25, even if it seems small to you... 🙂 If you prefer data step, I'll post what happens if I put 0.025 in your data step, makes no difference obviously:

...

27 data want;
28 want=quantile("Tweedie",0.025,1.3080865018,332762.58,245.3143634);
29 run;

NOTE: Compression was disabled for data set WORK.WANT because compression overhead would increase the size of the data set.
NOTE: Invalid argument to function QUANTILE('Tweedie',0.025,1.3080865018,332762.58,245.3143634) at line 28 column 8.
want=. _ERROR_=1 _N_=1
NOTE: Mathematical operations could not be performed at the following places. The results of the operations have been set to
missing values.
Each place is given by: (Number of times) at (Line):(Column).
1 at 28:8

...

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Confidence intervals can mean anything, we normally use 95% for instance which is alpha=0.5.

Anyways, not much more I can do for you, simply put the tweedie model stops working at 0.56 - you can try it, works with 0.56 for instance.  I am not a stato so I can't tell you why.  I will move this into the statistical section, perhaps one of them can explain the model.  

Rick_SAS
SAS Super FREQ

I suspect that the large values of mu and phi are leading to numerical instabilities during the quantile computation. I am not an expert in the Tweedie distribution, but you can try rescaling the parameters. The following DATA step gives the same answers as your R program, so empirically it seems to be a good workaround.

 

data Q;
xi = 1.3080865018;
mu = 332762.58;
phi = 245.3143634;
c = 100; /* scaling factor */
do prob = 0.01 to 0.5 by 0.005;
   x = QUANTILE('TWEEDIE', prob, xi, mu, phi);
   xScaled = c * QUANTILE('TWEEDIE', prob, xi, mu/c, phi/c**(2-xi));
   relDiff = (x - xScaled) / xScaled;
   output;
end;
label diff = "x - xScaled";
label relDiff = "(x - xScaled)/xScaled";
run;

/* scaled call produces quantiles for small probabilities */
proc print data=Q;
where x=.;
var prob x xScaled;
run;

/* when both methods produce answers, the relative difference is tiny */
proc means data=Q;
var relDiff;
run;
ADoering
Fluorite | Level 6

Great work, thanks. That was exactly what I was looking for. Saves my day. 

 

Andreas

 

 

Rick_SAS
SAS Super FREQ

Glad to hear! I also contacted the person that supports the Tweedie distribution, He has made changes so that this workaround will not be necessary in the next release of SAS. Unfortunately, SAS 940M6 shipped last week, but the fix should be in the 2019 release.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1357 views
  • 0 likes
  • 3 in conversation