Hello, everyone! I hope you're all doing well. I'm currently facing a challenge in SAS to determine the best distribution curve for a specific set of data. My goal is to create a program that tests various curves and calculates the performance of each, choosing the most suitable one. I've done a similar task in Python using the FITTER library, which tests about 80 different curves and returns the top three with the best fit.
Is there an equivalent way to do this in SAS, similar to the FITTER library?
Yes. I remember @Rick_SAS answered this kind of question before in this forum.
https://communities.sas.com/t5/Statistical-Procedures/Proc-univariate-severity/m-p/296574
https://blogs.sas.com/content/iml/2020/02/12/which-johnson-distribution-su-sb.html
https://blogs.sas.com/content/iml/2011/10/28/modeling-the-distribution-of-data-create-a-qq-plot.html
You could also use PROC GENMOD or PROC SEVERITY to do the same thing, like this:
proc genmod;
/* DIST=LOGNORMAL is not a PROC GENMOD keyword; GAMMA is a supported right-skewed alternative */
model y= /dist=gamma;
run;
proc genmod;
model y= /dist=normal;
run;
proc severity data=sashelp.heart;
dist _all_;
loss weight;
run;
You could also use the metalog distribution, a flexible family that can fit a wide range of distribution shapes. Check @Rick_SAS's blog posts:
https://blogs.sas.com/content/iml/2023/02/22/metalog-distribution.html
https://blogs.sas.com/content/iml/2023/03/13/metalog-sas.html
https://blogs.sas.com/content/iml/2023/03/15/distribution-expert-opinion.html
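To get something close to FITTER's "top three" output from the PROC SEVERITY approach above, one option is to write the fit statistics to a data set and sort it. The following is only a sketch: the data set names work.fitstats and work.ranked are made up for illustration, the list of distributions is an arbitrary subset, and the OUTSTAT= variable names (_MODEL_, KS, AD, AICC) are as I recall them from the SAS/ETS documentation, so please verify them on your release.

```sas
/* Fit a handful of predefined distributions and save the fit statistics. */
/* PLOTS=NONE suppresses graphs; only the statistics are needed here.     */
proc severity data=sashelp.heart outstat=work.fitstats plots=none;
   loss weight;
   dist logn gamma weibull exp;
run;

/* Every statistic is "smaller is better", so sort ascending by KS. */
proc sort data=work.fitstats out=work.ranked;
   by KS;
run;

/* Show the three best-fitting candidates. */
proc print data=work.ranked(obs=3);
   var _MODEL_ KS AD AICC;
run;
```

Sorting by a different column (for example AICC) changes the ranking criterion without refitting anything.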
PROC UNIVARIATE can fit lots of different distributions in one run.
For more general distribution fitting, see https://support.sas.com/kb/23/135.html
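As a minimal sketch of the PROC UNIVARIATE suggestion, the HISTOGRAM statement accepts several distribution options at once and prints parameter estimates and goodness-of-fit tests for each. The choice of sashelp.heart and the weight variable simply mirrors the PROC SEVERITY example earlier in the thread, and the four distributions listed are an arbitrary selection.

```sas
/* Fit four candidate distributions to the same variable in one run. */
/* Each option adds a fitted curve plus its own goodness-of-fit tables. */
proc univariate data=sashelp.heart;
   var weight;
   histogram weight / normal lognormal gamma weibull;
run;
```

Note that lognormal, gamma, and Weibull fits use a threshold of zero by default, so the data must be positive unless a threshold is specified or estimated.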
Hello everyone, thank you for the quick response.
I tested the PROC SEVERITY code, but it took too long and I ended up canceling it. Upon checking the documentation, I found an alternative called PROC HPSEVERITY. This code worked to a certain extent, but two errors occurred: one regarding Gamma convergence and another related to Java. Do you have any suggestions to avoid these errors?
My code:
proc HPSEVERITY data=WORK.TEMP
                 outest=WORK.myests
                 criteria=KS;
    dist _ALL_;
    loss y;
run;
LOG:
72         
73         ods graphics on;
74         
75         proc HPSEVERITY data=WORK.TEMP
76                          outest=WORK.myests
77                          criteria=KS;
78             dist _ALL_;
79             loss comprometido_2;
80         run;
NOTE: The HPSEVERITY procedure is executing in single-machine mode.
NOTE: A finite difference approximation is used for the derivative of the 'LOGPDF' function.
NOTE: A finite difference approximation is used for the derivative of the 'LOGCDF' function.
NOTE: A finite difference approximation is used for the derivative of the 'LOGSDF' function.
3                                                          The SAS System                               10:59 Friday, April 12, 2024
NOTE: A finite difference approximation is used for the derivative of the 'LOGPDF' function.
NOTE: A finite difference approximation is used for the derivative of the 'LOGCDF' function.
NOTE: A finite difference approximation is used for the derivative of the 'LOGSDF' function.
ERROR: Convergence Status for Gamma: Did not converge.
WARNING: Convergence Status for Burr: Exceeded limit on iterations.
WARNING: This graph has too many graphical elements. You may not be able to get any vector graphics output and in that case, you 
         can set your output format to an image type.
ERROR: Java virtual machine exception. java.lang.OutOfMemoryError: Java heap space.
I wasn't familiar with GENMOD, so I will start reading the documentation now and check whether it can be applied to my study.
If you get errors from running a PROC, you need to show us the log for that PROC — and we need to see every line in the log of that PROC, including code, errors, warnings and notes.
Good morning, everyone! Sorry for not sending the error log. I have updated my previous post to include the full log for the PROC HPSEVERITY step.
Hello,
What SAS version and what operating system are you using?
Submit this :
%PUT &=sysvlong4;
%PUT &=SYSSCP;
%PUT &=SYSSCPL;
... and tell us what it says in the LOG window.
How large is the data table (in particular, how many rows does it have)?
With respect to PROC HPSEVERITY not converging when fitting a Gamma distribution ...
Here are some remedial measures you could try :
Koen
Here is the information about the SAS version and operating system.
SAS version:
28 %PUT &=sysvlong4;
SYSVLONG4=9.04.01M4P11092016
29 %PUT &=&SYSSCP;
&=WIN
30 %PUT &=SYSSCPL;
SYSSCPL=X64_SR12R2
I managed to solve the Java error by removing the graphics. The convergence issue is also resolved now, because I selected a few distributions instead of trying them all. Thank you for your help, everyone!
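For anyone landing here later, the fix described above might look roughly like the following. This is a sketch under assumptions, not the exact code that was run: the particular distributions on the DIST statement are illustrative (the post does not say which ones were kept), and the data set and variable names are the ones from the earlier code.

```sas
/* Turn off ODS Graphics so no plots are rendered; this avoids the */
/* Java heap space error that came from the oversized graph.       */
ods graphics off;

/* Fit only a chosen subset of distributions instead of _ALL_. */
proc hpseverity data=WORK.TEMP
                outest=WORK.myests
                criteria=KS;
    dist logn weibull exp pareto;
    loss y;
run;
```

Leaving Gamma and Burr out of the list sidesteps the two convergence messages in the log; whether those distributions are worth revisiting depends on the data.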
If the Java error is caused by the graph and the volume of data being plotted, then it's not a problem, because I am only looking for the best-fitting curve and do not need to see the distribution. I left this option enabled because I came across an example in my research that set the following:
ods graphics on;
Regarding Paige's post, I also checked it and found it to be very useful.
