DocMartin
Quartz | Level 8
I've got an algorithm with around 300 parameters that I'm trying to optimize using PROC OPTLSO. When I run the optimization using 1,990 observations, the routine takes 10 seconds to complete. Yet when I add observation #1991, the routine takes 9 minutes to finish.

I tried two things:

1. Using a different data set, and trying it out with 1,991 observations. No problem. So it's not that the optimization became saturated.
2. Looked at observation #1991 to see if it differed substantially from the others. The answer was NO.

Have any of you experienced this kind of problem? If so, how did you fix it?

Thanks!
Andrew
5 REPLIES
RobPratt
SAS Super FREQ

Can you please share your code and data?

DocMartin
Quartz | Level 8
Because the data are hospitalized patients, I can't share it.
ballardw
Super User

@DocMartin wrote:
2. Looked at observation #1991 to see if it differed substantially from the others.

Depending on how you look at this record and compare it with the rest, what you see may not be what the algorithm sees.

If a variable has a format assigned that was defined with the OTHER option in PROC FORMAT, a table view or PROC PRINT output may not show an extreme value. For example:

proc format library=work;
   /* Values in 0-100 display with F5.2; every other value displays as the text 100 */
   value extreme
      0-100 = [f5.2]
      other = '100';
run;

data example;
   do x= 1,10,99.4, 10e18;
   output;
   end;
   format x extreme.;
run;

proc print data=example noobs;
run;
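
One way to check for this, sketched against the `example` data set above: clear the format for a single step so the stored values are displayed, and ask PROC MEANS for the range, which is computed on the stored values rather than the formatted ones. The choice of statistics here is only illustrative.

```sas
/* A FORMAT statement with no format name removes the format for this step only */
proc print data=example noobs;
   format x;
run;

/* MIN and MAX are computed from the stored values, so 10e18 shows up here */
proc means data=example n nmiss min max;
   var x;
run;
```

The same two steps could be pointed at the real input data, with the VAR statement listing the variables that feed the optimization.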

Or perhaps no single value is extreme, but you have added a combination of values that acts as an outlier and requires many more iterations to solve.

I would suggest splitting the data into two pieces with about half the records in each, and seeing whether the reduced set that contains the problem record is still slow. I suspect it may be. Were there any interesting items in the log for the longer run that don't appear in the log for the shorter one?
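
A minimal sketch of that split, assuming the input is a data set called `work.patients` (a hypothetical name, as are `work.half1` and `work.half2`). With 1,991 rows, observation #1991 lands in the second half.

```sas
/* work.patients is a placeholder name for the full input data set */
data work.half1 work.half2;
   /* NOBS= stores the total number of observations in n */
   set work.patients nobs=n;
   if _n_ <= floor(n/2) then output work.half1;
   else output work.half2;
run;
```

Running the same optimization on each half and comparing run times and logs would show whether the slowdown follows the problem record.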

Without seeing the code to know how many variables from your data set are used to create the parameters, it is hard to tell whether a specific minimum-sized data set would be needed.

DocMartin
Quartz | Level 8

Thanks! But I'm not using any formatted variables. I've split my data set numerous ways, and each time there seems to be at least one observation that's causing it to hang. I'll check again to make sure the values that are being used are what I expected them to be.

RobPratt
SAS Super FREQ
Can you anonymize the data? Or at least share the code and logs?
