Hi Experts,
We have a batch process that executes every day. This week, a job that usually does not take more than 18 minutes of execution time (real time, as you can see) is now taking more than 45 minutes to finish.
We already have the FULLSTIMER option active, but we don't know why only the real time has increased, and why only this week.
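For reference, the option is simply set at the top of the batch program, something like this (the rest of the program is omitted):

/* Turn on extended step timing notes in the log (memory and OS statistics,
   depending on the operating system). */
options fullstimer;

/* ... rest of the batch program ... */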
It might be an I/O issue. Does anyone know how we can identify if it is really an I/O problem or if it could be some other issue (network, for example)?
In older documentation there are FULLSTIMER stats that could help identify the problem, but they do not appear in our log. The stats are the ones shown below:
If anyone has faced this problem before or knows what is missing, please let us know. It will be more than welcome.
Thanks in advance.
What does the rest of the log look like for that job? Is the number of observations processed in each step roughly the same as before? If your long run was using four times the data, then expect longer run times.
Look at each procedure's summary. While the basic output does not show all those details, you could still see if one or more steps is taking longer, or compare the number of observations processed.
Any changes in network traffic on your system?
Size of output change?
Hi @ballardw ,
To be more specific, this is one of the queries whose run time has increased dramatically. As you can see, it is reading from a database (SQL Server, VAULT schema) and from WORK, and writing to the WORK directory.
The number of observations is almost the same:
We asked the customer about any change in network traffic, and they said it is still the same.
It's hard to tell from the logs, but the code you showed uses implicit pass-through PROC SQL. That always scares me, because SAS decides whether to let the database do all the work of the query or to pull all the data down to SAS and do the work there. So this might be a case where SAS has decided to pull all the data down, and that can substantially increase job time. From the log it doesn't look like that is happening, but it's just a thought. I generally feel that coding explicit pass-through is safer, and it also makes it easier to run the exact same query from SAS, SSMS, or whatever query tool.
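In case it helps, here is a minimal sketch of what I mean by explicit pass-through. The ODBC connection, DSN name, table, and columns below are placeholders rather than your actual code:

proc sql;
   /* Open a direct connection to SQL Server (assumes an ODBC DSN exists). */
   connect to odbc as sqlsrv (datasrc="MySqlServerDSN");

   /* Everything inside the inner parentheses is sent to SQL Server as-is,
      so the same text can be pasted into SSMS to compare elapsed times. */
   create table work.result as
   select * from connection to sqlsrv (
      select t.id, t.amount, t.load_date
      from VAULT.some_table t
      where t.load_date >= '2021-01-01'
   );

   disconnect from sqlsrv;
quit;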
I would investigate the remote database.
The data being queried could have grown. Data updates could have unbalanced the data.
There might be new users accessing the same database, competing with your query.
Get the DBA for the remote database to investigate. It might be harder to find the information on the previous runs for comparison, but they might be able to make some optimizations to the database that could improve things without any code changes on your end. Or they might suggest a better query you could use.
Hello @Egrodrigues2014
As one can see clearly from the logs, the stats are identical except for real time.
The two probable causes would be (1) network congestion, (2) something happening at the database level.
If this has happened in only one instance, then it was likely a transient issue in one of the above two areas, and it is probably better to ignore it. This is like an accident causing a traffic delay.
If it is happening on a regular basis, then the SAS team should work with the network and DBA teams to identify the cause. Possibly there is a new process running at the same time, for example a backup or ETL process.
The issue brings to the fore the often-reiterated statement: "processing should take place where the data resides".
Agree with @ballardw .
I would start by looking for any individual steps that have increased in time dramatically.
You didn't say much about what this job is doing. Is it reading or writing data from a database? Then you would want to look into potential database changes, or changes to the network between you and the database.
Is it reading / writing a bunch of files stored on a network far away from the SAS job?
My understanding is that if real time increases dramatically without CPU time increasing, it suggests SAS is waiting for something: I/O, the network, the database, etc.
Examine your complete SAS log, compare it with last week's, and locate the step(s) that are much slower. If, for example, you are reading an external database and that is the only step that is slower, then you have found your reason. If all steps are slower, then you may be getting much poorer I/O performance across the board. Typically, the bigger the difference between real time and CPU time, the more I/O constrained your job is.
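If it helps, a small throwaway helper like this can pull the timing notes out of a saved batch log so two runs can be compared side by side (the log path is just a placeholder):

data step_times;
   /* Read the saved batch log line by line and keep only the timing notes. */
   infile "C:\logs\batch_job_previous_run.log" truncover;
   input logline $char200.;
   if index(logline, "real time") or index(logline, "cpu time") then output;
run;

proc print data=step_times;
run;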
Hello @Egrodrigues2014
A general disk I/O problem would impact every run, so the difference in real time between the two runs indicates that this is not the likely cause.
Often, when a program pulls data from an external database, either network congestion or some other change at the database level can cause such a difference in real times between runs. I have observed this happening in practice.
In the present case, I suggest that the entire log be reviewed and the 'rate-limiting step(s)' (the steps with the largest run times) be identified. This will give an idea of what is causing the issue.
Hi @Sajid01 ,
We are aware that it could be due to a network congestion issue. We asked the customer about any change in network traffic, and they said it is still the same.
As it executes as a batch process in production, which runs at night (approx. 3:00 am), we cannot monitor it at that moment using OS tools (Windows Server 2012 R2). Is there a way to save historical information on disk I/O and network usage during the batch execution? It is the customer's server and we do not have permission to install any monitoring software.
Looks like that one step is your main culprit.
I'm not a DBA, but usually my first step would be to run the query via SSMS (ideally with SSMS running somewhere close to the database) and see how long it takes. This should help you get a sense of whether the problem is on the database side or in the network transfer. You can play with running the query from SAS (via explicit pass-through), from SSMS in the same place where your SAS session is running, and then from SSMS close to the database server. Of course, if you have a helpful DBA, they should be able to look at logging on the SQL Server side to see if this query really is taking four times as long as it took last week.
But my money is on network stuff, and I guess for that you would need to have a network engineer involved.
Often I will make a little job out of just the one problematic query. Then I can run it during the day for testing, schedule it at different times at night, etc., just to see if there are any differences.
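A rough sketch of what such a little test job can look like, with the actual query left as a placeholder:

/* Standalone test job: run only the slow query and note the elapsed time,
   so runs at different times of day can be compared. */
options fullstimer;

%let t0 = %sysfunc(datetime());

proc sql;
   /* ... the single problematic query goes here ... */
quit;

%let t1 = %sysfunc(datetime());
%put NOTE: Query elapsed time: %sysevalf(&t1 - &t0) seconds.;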
When real time exceeds CPU time, there are two common reasons: the step is waiting on disk I/O, or it is waiting on an external resource such as the network or a remote database.
A combination of both is of course possible. Multiple SAS jobs making heavy use of WORK can cause latencies there, especially if you still use spinning metal.
So you need to use system monitor tools to see which processes are running concurrently. I used to run topas on our AIX SAS server all the time to watch its performance state.
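On Windows there is no topas, but the built-in Performance Monitor CLI (logman) can record disk and network counters to a file without installing any extra software. A rough sketch, assuming your SAS session is allowed to issue OS commands (XCMD); the collector name, counters, and output path are placeholders only:

/* Start a Windows performance counter log before the heavy steps. */
x 'logman create counter SASBatchPerf -c "\PhysicalDisk(_Total)\Avg. Disk sec/Read" "\Network Interface(*)\Bytes Total/sec" -si 00:00:15 -o "C:\PerfLogs\SASBatchPerf"';
x 'logman start SASBatchPerf';

/* ... run the batch steps to be measured ... */

/* Stop the collector; the resulting .blg file can be opened in Performance Monitor. */
x 'logman stop SASBatchPerf';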
I have encountered a similar problem before. A batch job was delayed so it collided with the daily backup of the database, and that dramatically increased download time. There could be many other reasons for your problem, but it might be a good idea to check if your long-running job was started at the same time as usual, and contact the database admin and ask when the daily backup is running, so you can schedule your job to avoid conflict.