caeduspl
Fluorite | Level 6

Hello Dear Community!

I am trying to create an auto-rerun mechanism by adding some code to the sasbatch script that runs after the SAS command finishes. The general idea is to:
1) locate the log of the SAS process and the id of the flow containing the current job,
2) check whether the log contains particular ORA-xxxxx errors,
3) if so, trigger the jrerun command from the LSF Platform Command Line Interface,
4) exit sasbatch, passing $rc to LSF.

The idea was implemented as:

#define used paths
log_dir=/path/to/sas_logs_directory
out_log=/path/to/auto-rerun_log.txt
out_log2=/path/to/lsf_rerun_log.txt

if [ -n "${LSB_JOBNAME}" ]; then
	if [ ! -f "$out_log" ]; then
		touch "$out_log"
	fi
	#get flow runtime attributes by splitting LSB_JOBNAME on ':'
	IFS=: read -r flow_id username flow_name job_name <<< "${LSB_JOBNAME}"
	
	#find log of the current process (newest log that mentions the job and belongs to the flow)
	log_path=$(ls -t "$log_dir"/*.log | xargs grep -li "job:\s*$job_name" | grep -i "/${flow_name}_" | head -1)
	
	#set path to txt file containing lines which represent the ORA errors we look for
	conf_path=/path/to/error_list
	
	#check the process' log against each listed error, one error per line
	while read -r line;
	do
		#if error is found in log then try to rerun flow
		if grep -q "$line" "$log_path"; then
			(nohup /path/to/rerun_script.sh "$flow_id" >"$out_log2" 2>&1) &
			disown
			break
		fi
	done < "$conf_path"
fi
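
The loop over the error list can also be written as a single grep call that reads all patterns from the file. This is only a sketch: the function name `log_has_listed_error` is made up for illustration, and it assumes the error list holds literal strings (one per line, no blank lines), which is why it uses `-F` instead of regular expressions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script).
# Returns 0 if any line of the pattern file occurs as a literal
# substring somewhere in the log file, non-zero otherwise.
# Caveat: a blank line in the pattern file would match every log line.
log_has_listed_error() {
    local log_file=$1 pattern_file=$2
    # refuse to run on a missing log or an empty pattern file
    [ -f "$log_file" ] && [ -s "$pattern_file" ] || return 1
    # -q: quiet, -F: fixed strings, -f: read patterns from a file
    grep -qF -f "$pattern_file" -- "$log_file"
}
```

In the script above, the whole while loop could then be reduced to `if log_has_listed_error "$log_path" "$conf_path"; then ... fi` around the nohup line.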

Here rerun_script is the script that calls jrerun after a sleep command, so that the parent script can exit with $rc in the meantime. It looks like:

sleep 10
/some/lsf/path/jrerun
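
Instead of a fixed sleep, the helper could wait until the parent process has really gone before calling jrerun. The sketch below is an untested idea and not a confirmed fix: `wait_for_exit` is a hypothetical name, the parent PID would have to be passed in as an extra argument (for example `$$` from sasbatch), and it is an assumption that LSF treats the job as finished once that process exits.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original rerun_script).
# Polls once per second until process $1 no longer exists.
# Gives up and returns 1 after $2 seconds (default 300).
wait_for_exit() {
    local pid=$1 max_wait=${2:-300} waited=0
    # kill -0 sends no signal; it only tests whether the PID can be signalled
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$max_wait" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

rerun_script could then run `wait_for_exit "$parent_pid"` in place of `sleep 10` and call jrerun only when it succeeds. Process Manager may still need some time after that to register the job's exit status, so a short extra delay might be required.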

The problem is that the job keeps running the whole time. In the LSF history I can see that jrerun was called before the job exited.
Furthermore, in $out_log2 I can see the message:

<flow_id> has no starting or exit points.

Does anyone have an idea how I can pass the return code to LSF before jrerun is called? Or is there perhaps a simpler way to automatically rerun SAS jobs in Platform LSF?

I am using SAS 9.4 and Platform Process Manager 9.1.

Thank you for all your support,

Max.

3 REPLIES
SASKiwi
PROC Star

What is causing your Oracle errors in the first place? How can you be sure that a re-run won't just give you the same errors?

If the errors are caused by Oracle data not being available when it should be, then a better approach might be to test for this in your SAS program and keep looping, within a certain time limit, until the data is available.

caeduspl
Fluorite | Level 6

Hello, thank you for your response.

This mechanism is designed for particular ORA errors that we know can be resolved simply by a rerun. It is intended only for those errors.

SASKiwi
PROC Star

If that is the case, you could also check for that error in your SAS code and repeat the step. It might be easier than an LSF solution.
