Dear Team,
Some ETL jobs in SAS DI are failing with a communication link failure according to the log, and other jobs are running very slowly: for example, 1 lakh (100,000) records take 10 minutes or more to complete.
We have checked ping and telnet from the SAS environment, and also the DB connection (the database is SQL Server); all connect successfully. We also checked the I/O wait time: it was 2.1 and went down to 0. We are still facing the same communication link failure.
Please suggest how to resolve this problem.
I have attached some screenshots for your reference.
Thanks & Regards
Shakti Sourav Mohapatra
This has nothing to do with DIS and likely not even with SAS, but with the communication between the ODBC driver and SQL Server.
Just use the exact error message to Google possible causes and remedies. Also ensure that the ODBC driver used meets both the SQL Server and SAS requirements.
I asked ChatGPT what the possible root causes of intermittent communication link failures could be; its answer is at the end of this post.
One of the usual questions: Did this ever work in the past without issues? And if yes: What has changed?
I'd be looking first into the network, concurrency/resource contention and timeouts (like: long running process on the DB).
Looking at the times when such errors occur, and at how many SAS and other processes are connected to the database at those times, could help narrow down the root cause.
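As a rough illustration of that idea, here is a minimal Python sketch that buckets failure timestamps by hour so that clusters stand out. It assumes you have already pulled the failure times out of your job logs as ISO-8601 strings; the function name `failures_per_hour` and the sample timestamps are made up for illustration and do not come from the logs in this thread.

```python
from collections import Counter
from datetime import datetime

def failures_per_hour(timestamps):
    """Count failures per (date, hour) bucket from ISO-8601 timestamp strings."""
    buckets = Counter()
    for ts in timestamps:
        t = datetime.fromisoformat(ts)
        buckets[t.strftime("%Y-%m-%d %H:00")] += 1
    return buckets

# Made-up sample timestamps, for illustration only.
sample = [
    "2024-03-01T02:05:11",
    "2024-03-01T02:40:03",
    "2024-03-01T14:12:59",
]

for hour, count in sorted(failures_per_hour(sample).items()):
    print(hour, count)
```

If most failures fall into the same hours, compare those hours with the DBA's view of database load and connection counts.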
I believe that if all available connections to the DB were consumed the error would be different (connection refused), but it may still be worth checking and potentially increasing the number of allowed connections. You will certainly also need to talk to your DBA to find out whether a connection attempt actually reached the DB or whether the communication was interrupted earlier, and how busy the DB was at the time you ran into the error.
Also consider contacting SAS Tech Support directly to get their guidance.
Below is the ChatGPT answer:
If the communication link failure occurs intermittently, it can be more challenging to pinpoint the exact root cause since it suggests that the issue is not consistently reproducible. However, here are some possible root causes for intermittent communication link failures between a client and a database:
Network Fluctuations: Variations in network conditions such as intermittent packet loss, congestion, or bandwidth fluctuations can cause occasional communication failures.
Transient Hardware Issues: Temporary hardware malfunctions such as a loose cable, overheating components, or failing network equipment may intermittently disrupt communication.
Software Glitches: Transient software issues such as memory leaks, race conditions, or intermittent bugs in the client application or the database management system can cause intermittent communication failures.
Load Balancing or Failover Issues: If load balancing or failover mechanisms are in place, occasional misconfigurations or failures in these systems can lead to intermittent communication issues.
Concurrency Problems: In multi-threaded or concurrent environments, occasional race conditions or synchronization issues may lead to intermittent communication failures.
Resource Contentions: Intermittent contention for resources such as CPU, memory, or disk I/O between the client and the database server can cause intermittent communication failures, especially during periods of high load.
Environmental Factors: External factors such as electromagnetic interference, power fluctuations, or environmental conditions (e.g., temperature, humidity) can intermittently disrupt communication.
Intermittent Software Updates: If software updates or patches are applied periodically, intermittent compatibility issues between different versions of software components may cause occasional communication failures.
Transient Security Issues: Occasionally, security mechanisms such as firewalls, intrusion detection systems, or access control lists may intermittently block or interfere with communication between the client and the database server.
Temporary DNS Resolution Problems: Intermittent DNS resolution issues, such as DNS server outages or transient network latency, can occasionally prevent the client from resolving the database server's hostname.
Intermittent Client-Side Issues: Issues specific to the client environment, such as sporadic software conflicts, background processes consuming resources, or intermittent network connectivity on the client side, can cause intermittent communication failures.
Identifying the root cause of intermittent communication link failures may require thorough monitoring, logging, and diagnostic efforts to capture the issue when it occurs and analyze relevant system and network parameters.
Please ALWAYS post logs as text, using the appropriate window; it saves you the hassle of creating and uploading a screenshot, and makes the log much easier for us to read. On top of that, we can easily make annotations.
Now, less than 20 seconds of CPU time took 1026 seconds of real time (a factor of 50!), so there's most probably a network bottleneck involved. During that time, 819175 observations (about 0.8 million, or a little over 8 "lakh") were processed, so roughly 800 per second. What is your observation size, and which variables do you have?
Do you have the same rate of CPU time vs. real time in successful jobs?
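To make that comparison easy to repeat for other steps, here is a small Python sketch of the arithmetic. The numbers are the ones quoted above (20 seconds is only an upper bound for the CPU time, so the real ratio is somewhat higher); the function name `step_stats` is made up for illustration.

```python
def step_stats(cpu_seconds, real_seconds, observations):
    """Return (real/CPU time ratio, observations per second of real time)."""
    if cpu_seconds <= 0 or real_seconds <= 0:
        raise ValueError("times must be positive")
    return real_seconds / cpu_seconds, observations / real_seconds

# Figures quoted above; 20 s is an upper bound for the CPU time.
ratio, rate = step_stats(cpu_seconds=20, real_seconds=1026, observations=819175)
print(f"real/CPU ratio: {ratio:.1f}")
print(f"throughput: {rate:.0f} obs/sec")
```

Plugging in the CPU time, real time and observation count from a successful job shows whether the same ratio of about 50 and the same rate of about 800 observations per second appear there as well.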
Also check with your DB admins if they can find something in their server logs.