☑ This topic is solved.
brunoramosmarti
Fluorite | Level 6

Hello, everyone! I hope you're all doing well. I'm currently facing a challenge in SAS to determine the best distribution curve for a specific set of data. My goal is to create a program that tests various curves and calculates the performance of each, choosing the most suitable one. I've done a similar task in Python using the FITTER library, which tests about 80 different curves and returns the top three with the best fit.
Is there an equivalent way to do this in SAS, similar to the FITTER library?

1 ACCEPTED SOLUTION

Ksharp
Super User

Yes. I remember @Rick_SAS answered this kind of question before in this forum.

https://communities.sas.com/t5/Statistical-Procedures/Proc-univariate-severity/m-p/296574

https://blogs.sas.com/content/iml/2020/02/12/which-johnson-distribution-su-sb.html

https://blogs.sas.com/content/iml/2011/10/28/modeling-the-distribution-of-data-create-a-qq-plot.html

You could also use PROC GENMOD or PROC SEVERITY to do the same thing, for example:

proc genmod;
   model y = / dist=lognormal;
run;

proc genmod;
   model y = / dist=normal;
run;

proc severity data=sashelp.heart;
dist _all_;
loss weight;
run;

You could also use the metalog distribution, which is flexible enough to fit data of almost any shape. See @Rick_SAS's blog posts:

https://blogs.sas.com/content/iml/2023/02/22/metalog-distribution.html

https://blogs.sas.com/content/iml/2023/03/13/metalog-sas.html

https://blogs.sas.com/content/iml/2023/03/15/distribution-expert-opinion.html


10 REPLIES
PaigeMiller
Diamond | Level 26

PROC UNIVARIATE can fit lots of different distributions in one run.

For more general distribution fitting, see https://support.sas.com/kb/23/135.html
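
As a rough illustration of the one-run approach, a minimal sketch might look like the following. The data set WORK.TEMP and the variable y are borrowed from later in this thread and are assumptions here. The lognormal, gamma, Weibull, and exponential fits assume strictly positive data, because the threshold parameter defaults to 0.

```sas
/* Capture the goodness-of-fit tests for every fitted distribution */
ods output GoodnessOfFit=work.gof;

proc univariate data=work.temp;
   var y;
   /* One HISTOGRAM statement can request several parametric fits */
   histogram y / normal lognormal gamma weibull exponential;
run;

/* Review the test statistics and p-values side by side */
proc print data=work.gof;
run;
```

PROC UNIVARIATE does not rank the candidates for you, so comparing the statistics in WORK.GOF is a manual step.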

--
Paige Miller
brunoramosmarti
Fluorite | Level 6

Hello everyone, thank you for the quick response.
I tested the PROC SEVERITY code, but it took too long and I ended up canceling it. Checking the documentation, I found an alternative called PROC HPSEVERITY. It worked to a certain extent, but two errors occurred: one about the Gamma distribution failing to converge and another related to Java. Do you have any suggestions to avoid these errors?

 

My code:

proc HPSEVERITY data=WORK.TEMP
                 outest=WORK.myests
                 criteria=KS;
    dist _ALL_;
    loss y;
run;

Log:

72         
73         ods graphics on;
74         
75         proc HPSEVERITY data=WORK.TEMP
76                          outest=WORK.myests
77                          criteria=KS;
78             dist _ALL_;
79             loss comprometido_2;
80         run;

NOTE: The HPSEVERITY procedure is executing in single-machine mode.
NOTE: A finite difference approximation is used for the derivative of the 'LOGPDF' function.
NOTE: A finite difference approximation is used for the derivative of the 'LOGCDF' function.
NOTE: A finite difference approximation is used for the derivative of the 'LOGSDF' function.
3                                                          The SAS System                               10:59 Friday, April 12, 2024

NOTE: A finite difference approximation is used for the derivative of the 'LOGPDF' function.
NOTE: A finite difference approximation is used for the derivative of the 'LOGCDF' function.
NOTE: A finite difference approximation is used for the derivative of the 'LOGSDF' function.
ERROR: Convergence Status for Gamma: Did not converge.
WARNING: Convergence Status for Burr: Exceeded limit on iterations.
WARNING: This graph has too many graphical elements. You may not be able to get any vector graphics output and in that case, you 
         can set your output format to an image type.
ERROR: Java virtual machine exception. java.lang.OutOfMemoryError: Java heap space.

I wasn't familiar with GENMOD, so I will start reading the documentation now and check if it is possible to apply it to my study.

PaigeMiller
Diamond | Level 26

@brunoramosmarti 

If you get errors from running a PROC, you need to show us the log for that PROC. We need to see every line of the log for that PROC, including code, errors, warnings and notes.

--
Paige Miller
brunoramosmarti
Fluorite | Level 6

 

Good morning, everyone! Sorry for not sending the error log. I have updated my previous post to include it.


sbxkoenk
SAS Super FREQ

Hello,

 

What SAS version and what operating system are you using?

Submit this:

%PUT &=sysvlong4;
%PUT &=SYSSCP;
%PUT &=SYSSCPL;

... and tell us what the log shows.

How large is the data table (in particular, how many rows)?

 

With respect to PROC HPSEVERITY not converging when fitting a Gamma distribution ...
Here are some remedial measures you could try :

  • increase the maximum number of iterations via the MAXITER= option in the NLOPTIONS statement
  • try different optimization routines, other than the default trust region (TRUREG) optimizer
  • specify starting values for the distribution parameters via either the INEST= option in the PROC HPSEVERITY statement or the INIT= option in the DIST statement
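
The three measures above might be combined as in the sketch below. The data set and variable names come from the log earlier in this thread. The iteration limit, the choice of the quasi-Newton optimizer (QUANEW), and the starting values for Theta and Alpha are placeholders for illustration, not recommendations.

```sas
proc hpseverity data=work.temp criteria=KS;
   loss comprometido_2;
   /* Hypothetical starting values: Theta is the scale, Alpha is the shape */
   dist gamma(init=(theta=1000 alpha=2));
   /* Allow more iterations and switch away from the default optimizer */
   nloptions maxiter=500 tech=quanew;
run;
```

One common way to pick starting values for the Gamma distribution is the method of moments: Alpha is roughly mean squared divided by variance, and Theta is roughly variance divided by mean.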

Koen

brunoramosmarti
Fluorite | Level 6

Here is the information about my SAS version and operating system.

SAS version:

28         %PUT &=sysvlong4;
SYSVLONG4=9.04.01M4P11092016
29         %PUT &=&SYSSCP;
&=WIN
30         %PUT &=SYSSCPL;
SYSSCPL=X64_SR12R2

I managed to solve the Java error by removing the graphics. The convergence issue is also resolved now, because I selected a few distributions instead of trying them all. Thank you for your help, everyone!
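
For readers who hit the same two problems, the fix described above might look like this sketch. The thread does not say which distributions were kept, so LOGN, WEIBULL, and EXP are assumptions chosen for illustration.

```sas
/* Turn off ODS Graphics so no plots are rendered (avoids the Java heap error) */
ods graphics off;

proc hpseverity data=work.temp
                outest=work.myests
                criteria=KS;
   loss comprometido_2;
   /* Fit a chosen subset of distributions instead of _ALL_ */
   dist logn weibull exp;
run;
```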

Ksharp
Super User
About the Java error, I guess it is because you have a big table.
You could also contact SAS Technical Support to see if they have a better idea about this topic.

And did you check the URL posted by Paige?
https://support.sas.com/kb/23/135.html
brunoramosmarti
Fluorite | Level 6

If the Java error is caused by the graph and the volume of data, then it's not a problem for me, because I am only looking for the best-fitting curve and do not need a plot of the distribution. I left graphics enabled because an example I found during my research turned on the following option:

 

ods graphics on; 

Regarding Paige's post, I also checked it and found it to be very useful.
