I have a weird issue and have no clue at all why and how.
My SAS EG program worked well and used to be executed in about 4, 5 minutes (the whole program). The free disc space is also good.
Lately the same program takes age to finish, more than 1 hour... It appears as if the program enters in an endless loop and during this time, the work space increases insanely to finish on a sort failure (no more space).
I believe it's possibly a server problem, maybe following a windows server update.
Does any one have an idea ? A patch for SAS EG Server maybe ? (I am only an end user, not an admin just in case...)
Any ideas, suggestions or feedbacks are welcome.
Thank you
Thank you everyone for the time spent
I asked my IT to download the powershell script from https://support.sas.com/en/technical-support/maintenance/hot-fixes/hot-fix-analysis-download-deploym...
And also the result of the analysis : it seems that the SAS server is not maintened since 2015... And during this time, Windows server keeps getting new hotfix.
Some of the hotfix from SAS report shows that several kernel hotfix are need.
Problem solved now.
Maybe this solution could help if some of you have lazy admins...
Look closely at your SAS log for any evidence of changed behaviour. Focus on steps that take a lot longer. There will be reasons why your program is slower. It is a matter of looking at the evidence to discover why. In particular has the size of your datasets changed? For example a big increase in row counts could point to data joins no longer working correctly, or it could be simply that you have more input data.
Also talk to your SAS administrator. Maybe your SAS server is being heavily used and it is slowing down for everyone. Try running your program outside normal work hours. Is it still slow?
I also tried outside normal work hours, same weird issues/behaviours
The input data is really stable and is the same (because functionaly, this part of data is nearly dead, but we have to keep it just in case)
So the total of rows and data is so stable
I mean by that, all the parameters are the same, same data, same sizes, same query, and instead of 2 seconds of excution, I have more than one hour with errors
Something has changed if the run time changed.
The first thing that comes to mind reading your description is that some merge keys are no longer unique, and the join volumes are therefore increasing. Check that.
Did you keep old logs? Compare the run times and volumes to see where the time increases.
Or maybe there are more users or more programs running now?
A hardware/OS change could also be the reason, but we have no way of knowing from here.
Thank you for your reply,
Yeah I git everything, and have all the SAS log since the dawn of time.
The weird thing is the SQL query does left join with really tiny tables (the most voluminous is about 60 MB)
The (possible) non unique keys, wont lead to this weird behaviour. ASAIK the cartesian merging will produce a bigger table, it will take maybe few minutes longer. But in the end, we still get something...
In my case it keeps growing in term of size until the workspace breaks and then the proc SQL fails.
I ask my IT to send to me the whole list of Windows server hotfix...
Do you know if there is somewhere that I can check the all list of hotfix on SAS please?
Thank you again
So you have already narrowed it down to a single step? If yes, then please post the complete log of that step as mentioned above.
I dit not narrow the query down.
It is only an historic program that I run every trimester
As you can see, the same query 3 months ago works well and takes only 2s and now it is extremely slow with errors
The _CNC_TR1 table size is 60MB, the others is respectly 20MB, 1MB, and 16MB
Either your configuration was severely modified (such as the amount of RAM available), or your data changed drastically.
Run a DQ check on the 5 input tables. You could actually make that validation step a part of the production program.
If some source data gets corrupted, and for example some keys end up missing or trimmed, this is exactly the kind of thing that happens.
Thank you
Sadly I have tried everything and used all levers I know.
And most of time I come here because I have no more ideas, and by sharings infos, maybe... =D
For now, I am running out of ideas to fix/analyse this weird issue.
This is an example for what caused me to write Maxim 34 (Work in Steps).
Steps like this work fine, until a slight change in data causes a disaster, and then you have to spend hours diagnosing it, as you don't have any intermediate result where you can pin down the problem. And the SQL log is notoriously unhelpful, as it only tells you "I crashed" and nothing else. And in the case of success it only tells you the end result, not how many items were read from the contributing tables and other helpful details.
So you need to look at all the relationships between the tables; have a good look if one of the variables used for joining has been set to all missing (or the same value on all or most observations), causing a monstrous cartesian join of everything * everything.
Rebuild the joins one-by-one, starting with the t2 join; once you know how many observations that gives you, add t3, then t4, then t5.
Seeing that you do multiple lookups, you should seriously consider replacing that step with a data step and using hash objects for the lookups, and do the filtering for the min in a separate step.
Thank you I will try it.
I did not try this till now because the deadline is in 3 days and we discoverd this issue yesterday. It's too short to change/test etc...
Now you know why you get paid the big bucks 😉
But the bigger issue here is a hard-to-maintain, quite complicated step; maintainablilty is the hallmark of good code, and this step violates that principle big time.
Working proactively to defuse such disasters-waiting-to-happen has been much of my work in the past; the result are mostly simple codes that make it easy to detect problems and fix them, otherwise I would never have been able to play shepherd for ~1000 SAS batch jobs.
@k4minou wrote:
Do you know if there is somewhere that I can check the all list of hotfix on SAS please?
Yes, there is (in the configuration directory tree of the SAS server, so your SAS admin can check that), but I have yet to experience a SAS hotfix that causes such a massive slowdown (in a very central part of the SAS system, at that).
But I have seen (literally) hundreds, if not thousands of SQL steps where a little design mistake had disastrous consequences in terms of performance. We really need to see that log to be of help here.
I mean on the SAS site, is there a list of windows server hotfix with SAS hotfix to fix the Windows hotfix ?
Just for the context, from what I know, the IT applies Windows server hotfix every 3 months.
They do not check all the possible consequence for the others servers installed and ran on the Windows server.
I am aware of this part lately only, so maybe with all the cumulatives Windows server hotfix, the SAS server could acts weirdly now?
I asked the IT admin to restart the SAS server and it does not help at all : same weird slow query with error in the end...
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.