Administration and Deployment

Installing and maintaining your SAS environment
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
drhorg
Fluorite | Level 6

We have installed Viya 4 Kubernetes, and deployed OpenSearch Dashboards for logging . I see the error

Logsource: sas-run-tool 

message : Tool executor failed starting service: exit status 1

Level : Error 
 
then right after
 

Level : Fatal 

Logsource: sas-run-tool 

message : Service executor failed to execute successfully: exit status 1

Level : Fatal 

Did it happen in your environment and you know what this is? I googled but did not find much for this error. 

 

Thanks a lot,

1 ACCEPTED SOLUTION

Accepted Solutions
gwootton
SAS Super FREQ
The sas-backup-purge-job I think runs daily and removes old backups, this message suggests one such execution of a job failed. If you look at your job history do you see these show as completed?
kubectl -n namespace get jobs -l app.kubernetes.io/name=sas-backup-purge-job
--
Greg Wootton | Principal Systems Technical Support Engineer

View solution in original post

19 REPLIES 19
gwootton
SAS Super FREQ
What is the name of the pod that is throwing these messages?
--
Greg Wootton | Principal Systems Technical Support Engineer
drhorg
Fluorite | Level 6

Hi @gwootton , sorry for delay, missed your reply. 

 

kube.pod - sas-backup-purge-job

level -fatal

kube.container - sas-backup-purge-job

kube.annotations.sas_com/component-name
sas-backup-job

 

Please let me know if any other info needed. 

thank you!

gwootton
SAS Super FREQ
The sas-backup-purge-job I think runs daily and removes old backups, this message suggests one such execution of a job failed. If you look at your job history do you see these show as completed?
kubectl -n namespace get jobs -l app.kubernetes.io/name=sas-backup-purge-job
--
Greg Wootton | Principal Systems Technical Support Engineer
drhorg
Fluorite | Level 6

Hi @gwootton , yes the purge failed multiple times. Sorry by mistake I accepted my reply as solution instead of your reply. I fixed this now.

 

thank you for your help!

paterd2
Quartz | Level 8

Greg, 

sometimes my backup gives failed in viya 24.03.

I cannot find the reason, because if I fire this job again, it succeeds.

Maybe you have a solution ?

 

Regards,

 

Dik

 

 

{"level":"info","version":1,"source":"sas-backupjob","messageKey":"Checking if job is allowed to run.","properties":{"logger":"backupjob","caller":"backupjob/main.go:314"},"timeStamp":"2024-12-13T02:02:04.280263+01:00","message":"Checking if job is allowed to run."}
{"level":"info","version":1,"source":"sas-backupjob","messageKey":"backupjob-log-icu.updating.k8s.job.labels.info.log","messageParameters":{"jobName":"sas-scheduled-backup-dpyxjx3i-28900860"},"properties":{"logger":"backupjob/kubeutils","caller":"kubeutils/kube_client.go:126"},"timeStamp":"2024-12-13T02:02:04.334724+01:00","message":"Updating the Kubernetes job sas-scheduled-backup-dpyxjx3i-28900860 with given label."}
{"level":"info","version":1,"source":"sas-backupjob","messageKey":"backupjob-log-icu.updated.k8s.job.info.log","messageParameters":{"jobName":"sas-scheduled-backup-dpyxjx3i-28900860"},"properties":{"logger":"backupjob/kubeutils","caller":"kubeutils/kube_client.go:152"},"timeStamp":"2024-12-13T02:02:04.345526+01:00","message":"Updated the Kubernetes job the sas-scheduled-backup-dpyxjx3i-28900860."}
{"level":"error","version":1,"source":"sas-backupjob","messageKey":"Job is not allowed to run.","properties":{"logger":"backupjob","caller":"backupjob/main.go:476"},"timeStamp":"2024-12-13T02:02:04.345565+01:00","message":"Job is not allowed to run."}
{"level":"error","version":1,"source":"sas-backupjob","messageKey":"backupjob-log-icu.remote.backup.agent.process.grep.pid.error.log","messageParameters":{"error":"exit status 1"},"properties":{"logger":"backupjob","caller":"backupjob/main.go:529"},"timeStamp":"2024-12-13T02:04:14.495204+01:00","message":"backupjob-log-icu.remote.backup.agent.process.grep.pid.error.log [error:exit status 1]"}
{"level":"error","version":1,"source":"sas-run-tool","messageKey":"Tool executor failed starting service: exit status 1","properties":{"logger":"tools/run","caller":"impl/executor.go:69"},"timeStamp":"2024-12-13T02:04:14.497860+01:00","message":"Tool executor failed starting service: exit status 1"}
{"level":"fatal","version":1,"source":"sas-run-tool","messageKey":"Service executor failed to execute successfully: exit status 1","properties":{"logger":"tools/run","caller":"impl/tooling.go:139"},"timeStamp":"2024-12-13T02:04:14.497901+01:00","message":"Service executor failed to execute successfully: exit status 1"}

gwootton
SAS Super FREQ
The failure message here is "Job is not allowed to run." Usually this happens because another backup is already running.
--
Greg Wootton | Principal Systems Technical Support Engineer
paterd2
Quartz | Level 8
Greg, thank you, but I don't run 2 backups in the night, can I simutlate the check that a backup is already running ?
gwootton
SAS Super FREQ
From what I can tell keys are created in the SAS Configuration Server with the backup job id (e.g. backup-job-uuid or backup-job-incr-uuid). It checks if such a key is present and, depending on the type of backup being requested will stop if it finds one. It would also stop if it can't create such a key.
You may want to turn on debug logging of the backup, this should provide more detail on the failure. To do that, you could set the environment variable SAS_LOG_LEVEL_BACKUPJOB to debug on the cronjob, for example:
kubectl -n namespace set env cronjob/sas-scheduled-backup-job SAS_LOG_LEVEL_BACKUPJOB=DEBUG
--
Greg Wootton | Principal Systems Technical Support Engineer
paterd2
Quartz | Level 8

Greg, thank you, now we have more output, we will investigate further..

paterd2
Quartz | Level 8

Greg, we found it.

the sas-backup-purge-job cannot run simultaneous with the sas-scheduled-backup-job.

So this explains why it went ok if we run it by hand in the morning.

 

We will reschedule the full backups.

 

kubectl -n $ns get cronjob | grep backup | grep False
sas-backup-purge-job 15 0 1/1 * ? False 0 8h 127d
sas-scheduled-backup-job 0 1 * * 0 False 0 4d8h 127d
sas-scheduled-backup-kbx7pj33 0 1 * * 1-5 False 0 8h 16d

Adjusted to :
sas-backup-purge-job 15 0 1/1 * ? False 0 8h 127d
sas-scheduled-backup-job 0 4 * * 0 False 0 4d8h 127d
sas-scheduled-backup-kbx7pj33 0 4 * * 1-5 False 0 8h 16d

 

Tomorrow we know it for sure.

 

 

paterd2
Quartz | Level 8

Greg,

 

the reason is that the purge job runs for hours .

The pg_dump must be dumping with -Fd , we get 469000 *.dat.gz files . The purge proces has a lot of work to do with so many files.

Is it possible to adjust this to -Fc just one bigfile, or some other solution ...

Regards

Dik

We are on 2403 lts , maybe this behavior will change in the next upgrade.

 

gwootton
SAS Super FREQ
You are correct that the backup uses the --format directory option for pgdump, this is hard-coded and not customizable.
As far as how long purge is taking to complete though, that seems high. My oldest backup's postgres dump contains 36,544 files and my last purge job took 2 minutes to complete. In the logs for the purge job do you see any indication of why it's taking so long?
--
Greg Wootton | Principal Systems Technical Support Engineer
paterd2
Quartz | Level 8

We see entries in the table pg_catalog.pg_largeobject, for example job history and report transfer data. We export the reports with the sas-viya cli tool.

These entries are exported as blob<loid>.dat.gz 
- How can I delete the job history in viya ?

- For the reports we are investigating.

paterd2
Quartz | Level 8

Greg, more information : 

  • I made a sas studio job ; 

cas;
%put "ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ" ;

run ;

  • Logged into postgres pod : kn exec sas-crunchy-platform-postgres-00-b56g-0 -it -- sh

psql 

\c SharedServices

SharedServices=# select loid from pg_catalog.pg_largeobject where data like '%ZZZZZ%' ;
loid
---------
6612731
6612732
6613030
6613031
6613039
6613038
6765563
(7 rows)
What is in the data column ?

select loid,substr(encode(data, 'escape'),1100) as data from pg_catalog.pg_largeobject where data like '%ZZZZZ%' limit 1;

loid | data
---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
6612731 | kers."},{"version":1,"type":"source","line":"593 %put \\"ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ\\" ;"},{"version":1,"type":"normal","line":"\\"ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ\\""},{"version":1,"type":"source","line":"594 "},{"version":1,"type":"source","line":"595 run ;"},{"version":1,"type":"source","line":"596 %JESEND"},{"version":1,"type":"note","line":""}],"count":674,"limit":674,"links":[]}
(1 row)

 

 

 
  • Login the backup pod OR another pod which has the backup directory mounted  and goto the directory where the pgdump file is and zcat the file blob_<loid>.dat.gz
    /sasviyabackup/20250127-010206F/__default__/postgres/SharedServices_pg_dump.dmp $ zcat blob_6612731.dat.gz | tail -40 | fold -80 | tail -10
    The user is <d.pater> and the active caslib is CASUSER(<d.pater>)
    ."},{"version":1,"type":"note","line":"NOTE: The SAS option SESSREF was updated
    with the value CASAUTO."},{"version":1,"type":"note","line":"NOTE: The SAS macro
    _SESSREF_ was updated with the value CASAUTO."},{"version":1,"type":"note","lin
    e":"NOTE: The session is using 2 workers."},{"version":1,"type":"source","line":
    "593 %put \"ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ\" ;"},{"version":
    1,"type":"normal","line":"\"ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ\""
    },{"version":1,"type":"source","line":"594 "},{"version":1,"type":"source","lin
    e":"595 run ;"},{"version":1,"type":"source","line":"596 %JESEND"},{"version":
    1,"type":"note","line":""}],"count":674,"limit":674,"links":[]}

Questions :

1.is this normal behavior ?  

2. Must i register a case with SAS for this behavior ?

 

Regards 

Dik

 

suga badge.PNGThe SAS Users Group for Administrators (SUGA) is open to all SAS administrators and architects who install, update, manage or maintain a SAS deployment. 

Join SUGA 

Get Started with SAS Information Catalog in SAS Viya

SAS technical trainer Erin Winters shows you how to explore assets, create new data discovery agents, schedule data discovery agents, and much more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 19 replies
  • 4128 views
  • 3 likes
  • 3 in conversation