BookmarkSubscribeRSS Feed
Balda981
Fluorite | Level 6

Good morning,

 

Updating the license, I encountered problems in starting the SAS Viya infrastructure services. I don't know if the problem is related to this anyway.

 

The steps followed to update licenses were followed by the guide found at this reference in SAS Help Center: Licensing: How To Apply Licenses.

In detail, the macro steps were as follows:

 

  1. Apply New Licenses Manually: The new license (both .jwt file and .txt file) was placed inside the /opt/sas/spre/home/SASFoundation/ folder. The command to apply configure the new license was executed obtaining as a message from the server the correct application of the license. Similarly, the license presence on CAS was updated by applying the command to create the file "sas_license.txt"
  2. Apply New SAS Data Agent Licenses: In this file /opt/sas/viya/config/etc/dagentsrv/default/setinit.txt is pasted the new txt license

Once this phase is finished proceeding with the restart of all services we are experiencing multiple problems. Many of these do not turn on or otherwise remain in the not ready state as we envince in these photos:

Balda981_9-1688719669023.png

 

  

Balda981_7-1688719593575.png

Balda981_8-1688719635728.png

 

One of the main services "sas-viya-dagentsrv-default" goes into time-out error.

 

Going in order I write down what I have already tested, it would not appear to be zombie services present within the machine.

Running the vi da_entities.dtd command the entity pointing to the license already appeared to be correct:

 

Balda981_0-1688718928941.png

 

In fact, as you can see, at the path described is the new sas license:

 

Balda981_4-1688719328812.png

 

Starting the service again, which presumably is the cause of the failure to start the remaining ones, we always receive the same "failed".

 

The restart time of the attached service is 13:30. The status of the service is as follows:

 

Balda981_5-1688719377598.png

 

The log via journalctl -xe is put here:

 

 

[sas@ip-172-31-7-96 default]$ sudo journalctl -xe
Jul 06 13:30:13 \*******************\ slapd[2354]: conn=1003 op=690 SEARCH RESULT tag=101 err=0 nentries=0 text=
Jul 06 13:30:13 \*******************\ slapd[2354]: conn=1003 op=691 SRCH base="dc=viya,dc=local" scope=2 deref=0 filter="(&(objectClass=p
Jul 06 13:30:13 \*******************\ slapd[2354]: conn=1003 op=691 SRCH attr=memberUid cn gidNumber member
Jul 06 13:30:13 \*******************\ slapd[2354]: conn=1003 op=691 SEARCH RESULT tag=101 err=0 nentries=0 text=
Jul 06 13:30:13 \*******************\ runuser[2888]: pam_unix(runuser:session): session opened for user root by (uid=0)
Jul 06 13:30:14 \*******************\ runuser[2888]: pam_unix(runuser:session): session closed for user root
Jul 06 13:30:14 \*******************\ slapd[2354]: conn=1003 op=692 SRCH base="dc=viya,dc=local" scope=2 deref=0 filter="(&(objectClass=p
Jul 06 13:30:14 \*******************\ slapd[2354]: conn=1003 op=692 SRCH attr=uid uidNumber
Jul 06 13:30:14 \*******************\ slapd[2354]: <= bdb_equality_candidates: (uid) not indexed
Jul 06 13:30:14 \*******************\ slapd[2354]: conn=1003 op=692 SEARCH RESULT tag=101 err=0 nentries=0 text=
Jul 06 13:30:14 \*******************\ slapd[2354]: conn=1003 op=693 SRCH base="dc=viya,dc=local" scope=2 deref=0 filter="(&(objectClass=p
Jul 06 13:30:14 \*******************\ slapd[2354]: conn=1003 op=693 SRCH attr=memberUid cn gidNumber member
Jul 06 13:30:14 \*******************\ slapd[2354]: conn=1003 op=693 SEARCH RESULT tag=101 err=0 nentries=0 text=
Jul 06 13:30:14 \*******************\ runuser[3273]: pam_unix(runuser:session): session opened for user sas by (uid=0)
Jul 06 13:30:15 \*******************\ sas-viya-dagentsrv-default[2857]: level=warn app=sas-crypto-management timestamp=2023-07-06T13:30:1
Jul 06 13:30:15 \*******************\ sas-viya-dagentsrv-default[2857]: level=info app=sas-crypto-management timestamp=2023-07-06T13:30:1
Jul 06 13:31:49 \*******************\ sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:31:
Jul 06 13:33:23 \*******************\ sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:33:
Jul 06 13:34:59 \*******************\ sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:34:
Jul 06 13:35:13 \*******************\ systemd[1]: sas-viya-dagentsrv-default.service start operation timed out. Terminating.
Jul 06 13:35:13 \*******************\ systemd[1]: Failed to start LSB: SAS Data Agent.
-- Subject: Unit sas-viya-dagentsrv-default.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit sas-viya-dagentsrv-default.service has failed.
--
-- The result is failed.
Jul 06 13:35:13 \*******************\ systemd[1]: Unit sas-viya-dagentsrv-default.service entered failed state.
Jul 06 13:35:13 \*******************\ systemd[1]: sas-viya-dagentsrv-default.service failed.
Jul 06 13:35:13 \*******************\ polkitd[750]: Unregistered Authentication Agent for unix-process:2851:16658408 (system bus name :1.
Jul 06 13:35:13 \*******************\ sudo[2849]: pam_unix(sudo:session): session closed for user root
Jul 06 13:35:59 \*******************\ slapd[2354]: conn=1003 op=694 SRCH base="dc=viya,dc=local" scope=2 deref=0 filter="(&(objectClass=p
Jul 06 13:35:59 \*******************\ slapd[2354]: conn=1003 op=694 SRCH attr=uid uidNumber
Jul 06 13:35:59 \*******************\ slapd[2354]: <= bdb_equality_candidates: (uid) not indexed
Jul 06 13:35:59 \*******************\ slapd[2354]: conn=1003 op=694 SEARCH RESULT tag=101 err=0 nentries=0 text=
Jul 06 13:35:59 \*******************\ slapd[2354]: conn=1003 op=695 SRCH base="dc=viya,dc=local" scope=2 deref=0 filter="(&(objectClass=p
Jul 06 13:35:59 \*******************\ slapd[2354]: conn=1003 op=695 SRCH attr=memberUid cn gidNumber member
Jul 06 13:35:59 \*******************\ slapd[2354]: conn=1003 op=695 SEARCH RESULT tag=101 err=0 nentries=0 text=
Jul 06 13:36:38 \*******************\ sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:36:
Jul 06 13:36:38 \*******************\ sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:36:
Jul 06 13:36:38 \*******************\ sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:36:
Jul 06 13:36:38 \*******************\ sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:36:
Jul 06 13:36:38 \*******************\ sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:36:
Jul 06 13:36:38 \*******************\ sas-viya-dagentsrv-default[2857]: Get https://ip-172-31-17-189.eu-central-1.compute.internal:8200/v
Jul 06 13:36:38 \*******************\ sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:36:
Jul 06 13:36:38 \*******************\ runuser[3273]: pam_unix(runuser:session): session closed for user sas
Jul 06 13:38:50 \*******************\ slapd[2354]: conn=1003 op=696 SRCH base="dc=viya,dc=local" scope=2 deref=0 filter="(&(objectClass=p
Jul 06 13:38:50 \*******************\ slapd[2354]: conn=1003 op=696 SRCH attr=uid uidNumber
Jul 06 13:38:50 \*******************\ slapd[2354]: <= bdb_equality_candidates: (uid) not indexed
Jul 06 13:38:50 \*******************\ slapd[2354]: conn=1003 op=696 SEARCH RESULT tag=101 err=0 nentries=0 text=
Jul 06 13:38:50 \*******************\ slapd[2354]: conn=1003 op=697 SRCH base="dc=viya,dc=local" scope=2 deref=0 filter="(&(objectClass=p
Jul 06 13:38:50 \*******************\ slapd[2354]: conn=1003 op=697 SRCH attr=memberUid cn gidNumber member
Jul 06 13:38:50 \*******************\ slapd[2354]: conn=1003 op=697 SEARCH RESULT tag=101 err=0 nentries=0 text=
Jul 06 13:38:50 \*******************\ sudo[7831]:      sas : TTY=pts/0 ; PWD=/opt/sas/viya/config/etc/dagentsrv/default ; USER=root ; COM
Jul 06 13:38:50 \*******************\ sudo[7831]: pam_unix(sudo:session): session opened for user root by ec2-user(uid=0)

 

 

Within the path "/opt/sas/viya/config/var/log/dagentsrv/default/" nothing appears in the restart time.

 

Hoping for help.
Regards,
Thank you

3 REPLIES 3
gwootton
SAS Super FREQ
Have you looked at the logs for any of these services? Have you tried using ansible to apply the new license files instead of the manual process?
--
Greg Wootton | Principal Systems Technical Support Engineer
Balda981
Fluorite | Level 6

Hi,

 

The ansible the past few years has never been used as a mode and also has been placed in an "old installation" folder for years. I wouldn't know if touching it would cause more damage.

 

There is this new log:

 

-- Logs begin at Tue 2023-07-04 15:13:51 UTC, end at Fri 2023-07-07 09:25:22 UTC. --
Jul 06 13:30:13 ip-\*******************\.eu-central-1.compute.internal systemd[1]: Starting LSB: SAS Data Agent...
Jul 06 13:30:13 ip-\*******************\.eu-central-1.compute.internal runuser[2888]: pam_unix(runuser:session): session opened for user root by (uid=0)
Jul 06 13:30:14 ip-\*******************\.eu-central-1.compute.internal runuser[3273]: pam_unix(runuser:session): session opened for user sas by (uid=0)
Jul 06 13:30:15 ip-\*******************\.eu-central-1.compute.internal sas-viya-dagentsrv-default[2857]: level=warn app=sas-crypto-management timestamp=2023-07-06T13:30:15.418802363Z function=old_commands.ConvertUUIDFlagsAndCallActionGenerateUUID deprecated=true newCommand="token uuid generate"
Jul 06 13:30:15 ip-\*******************\.eu-central-1.compute.internal sas-viya-dagentsrv-default[2857]: level=info app=sas-crypto-management timestamp=2023-07-06T13:30:15.418962202Z function=uuid.ActionGenerateUUID status=success
Jul 06 13:31:49 ip-\*******************\.eu-central-1.compute.internal sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:31:49.308352156Z function=connection.(*vaultApiGetOperation).Perform msg="Failed to perform a GET API call against Vault" endpoint=viya_inter/roles/test_web_server err="Get https://ip-172-31-17-189.eu-central-1.compute.internal:8200/v1/viya_inter/roles/test_web_server: dial tcp \*******************\:8200: i/o timeout"
Jul 06 13:33:23 ip-\*******************\.eu-central-1.compute.internal sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:33:23.859051187Z function=connection.(*vaultApiGetOperation).Perform msg="Failed to perform a GET API call against Vault" endpoint=viya_inter/roles/test_web_server err="Get https://ip-172-31-17-189.eu-central-1.compute.internal:8200/v1/viya_inter/roles/test_web_server: dial tcp \*******************\:8200: i/o timeout"
Jul 06 13:34:59 ip-\*******************\.eu-central-1.compute.internal sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:34:59.891539482Z function=connection.(*vaultApiGetOperation).Perform msg="Failed to perform a GET API call against Vault" endpoint=viya_inter/roles/test_web_server err="Get https://ip-172-31-17-189.eu-central-1.compute.internal:8200/v1/viya_inter/roles/test_web_server: dial tcp \*******************\:8200: i/o timeout"
Jul 06 13:35:13 ip-\*******************\.eu-central-1.compute.internal systemd[1]: sas-viya-dagentsrv-default.service start operation timed out. Terminating.
Jul 06 13:35:13 ip-\*******************\.eu-central-1.compute.internal systemd[1]: Failed to start LSB: SAS Data Agent.
Jul 06 13:35:13 ip-\*******************\.eu-central-1.compute.internal systemd[1]: Unit sas-viya-dagentsrv-default.service entered failed state.
Jul 06 13:35:13 ip-\*******************\.eu-central-1.compute.internal systemd[1]: sas-viya-dagentsrv-default.service failed.
Jul 06 13:36:38 ip-\*******************\.eu-central-1.compute.internal sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:36:38.128223621Z function=connection.(*vaultApiGetOperation).Perform msg="Failed to perform a GET API call against Vault" endpoint=viya_inter/roles/test_web_server err="Get https://ip-172-31-17-189.eu-central-1.compute.internal:8200/v1/viya_inter/roles/test_web_server: dial tcp \*******************\:8200: i/o timeout"
Jul 06 13:36:38 ip-\*******************\.eu-central-1.compute.internal sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:36:38.128294079Z function=utils.(*TimeoutOperationPerformer).Perform msg="failed to perform the requested operation" err="Get https://ip-172-31-17-189.eu-central-1.compute.internal:8200/v1/viya_inter/roles/test_web_server: dial tcp \*******************\:8200: i/o timeout"
Jul 06 13:36:38 ip-\*******************\.eu-central-1.compute.internal sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:36:38.128316293Z function=connection.(*VaultClient).getSecretFromPath msg="Failed to read vault secret path." path=viya_inter/roles/test_web_server err="Get https://ip-172-31-17-189.eu-central-1.compute.internal:8200/v1/viya_inter/roles/test_web_server: dial tcp \*******************\:8200: i/o timeout"
Jul 06 13:36:38 ip-\*******************\.eu-central-1.compute.internal sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:36:38.128330456Z function=connection.(*VaultClient).GetPrivateKeySettings msg="Failed to read private key settings" endpoint=viya_inter/roles/test_web_server err="Get https://ip-172-31-17-189.eu-central-1.compute.internal:8200/v1/viya_inter/roles/test_web_server: dial tcp \*******************\:8200: i/o timeout"
Jul 06 13:36:38 ip-\*******************\.eu-central-1.compute.internal sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:36:38.128347039Z function=certificate.GenerateVaultCertificateWithCSR msg="failed to read private key configuration settings from Vault" err="Get https://ip-172-31-17-189.eu-central-1.compute.internal:8200/v1/viya_inter/roles/test_web_server: dial tcp \*******************\:8200: i/o timeout"
Jul 06 13:36:38 ip-\*******************\.eu-central-1.compute.internal sas-viya-dagentsrv-default[2857]: Get https://ip-172-31-17-189.eu-central-1.compute.internal:8200/v1/viya_inter/roles/test_web_server: dial tcp \*******************\:8200: i/o timeout
Jul 06 13:36:38 ip-\*******************\.eu-central-1.compute.internal sas-viya-dagentsrv-default[2857]: level=error app=sas-crypto-management timestamp=2023-07-06T13:36:38.128380822Z function=main.main err="Get https://ip-172-31-17-189.eu-central-1.compute.internal:8200/v1/viya_inter/roles/test_web_server: dial tcp \*******************\:8200: i/o timeout"


I don't know what level=error app=sas-crypto-management timestamp=2023-07-06T13:31:49.308352156Z function=connection.(*vaultApiGetOperation).Perform msg="Failed to perform a GET API call against Vault"  means but I try to see vault service log and I see this:

 

==> Vault server started! Log data will stream in below:

2023-07-05T08:54:29.983Z [INFO]  core.cluster-listener: starting listener: listener_address=0.0.0.0:8201
2023-07-05T08:54:29.983Z [INFO]  core.cluster-listener: serving cluster requests: cluster_listen_address=[::]:8201
2023-07-05T08:54:29.983Z [INFO]  core: vault is unsealed
2023-07-05T08:54:29.983Z [INFO]  core: entering standby mode
2023/07/05 08:54:30.022017 (INFO) Vault successfully unsealed -- running wait_vault_unsealed
2023/07/05 08:54:30.062143 (INFO) Determing leader status using the following curl command: curl -s -k1 https://127.0.0.1:8200/v1/sys/leader
2023/07/05 08:54:30.297824 (INFO) The following response was returned in response to the request for leader status: {"ha_enabled":true,"is_self":false,"leader_address":"https://ip-\*******************\.eu-central-1.compute.internal:8200/","leader_cluster_address":"https://ip-\*******************\.eu-central-1.compute.internal:8201/","performance_standby":false,"performance_standby_last_remote_wal":0}
2023/07/05 08:54:30.396095 (INFO) Vault reported that this instance is not the leader.
2023/07/05 08:54:30.397396 (INFO) Vault reported leader is at instance with ip: ip-\*******************\.eu-central-1.compute.internal.
2023/07/05 08:54:30.570303 (INFO) Response shows Vault reporting this instance is NOT the leader.
2023/07/05 08:54:30.571723 (INFO) Final leader status of this instance of Vault: FALSE
2023/07/05 08:54:30.573116 (INFO) We do not need to wait for Vault to unseal because we are not the leader.
2023/07/05 08:54:30.574623 (INFO) Enabling auditing to file: /opt/sas/viya/config/var/log/vault/default/sas-vault-audit_2023-07-05_08-54-19.log
2023-07-05T08:54:49.998Z [ERROR] core: error during forwarded RPC request: error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial tcp 172.31.17.189:8201: i/o timeout""
2023-07-05T08:54:49.998Z [ERROR] core: forward request error: error="error during forwarding RPC request"
Error enabling audit device: Error making API request.

URL: PUT https://127.0.0.1:8200/v1/sys/audit/file
Code: 500. Errors:

* local node not active but active cluster node not found
2023/07/05 08:54:50.112219 (INFO) Failed to enable auditing to file: /opt/sas/viya/config/var/log/vault/default/sas-vault-audit_2023-07-05_08-54-19.log
2023/07/05 08:54:50.114886 (INFO) Running check_vault_cert_role.sh
2023/07/05 08:54:50.525002 (INFO) Updating role to match current consul values.
2023/07/05 08:54:50.655772 (ERROR) Failed to update role settings.
2023/07/05 08:54:50.658419 (INFO) Failed to update vault role.
2023-07-05T08:55:04.069Z [INFO]  core: acquired lock, enabling active operation
2023-07-05T08:55:04.099Z [INFO]  core: post-unseal setup starting
2023-07-05T08:55:04.099Z [INFO]  core: loaded wrapping token key
2023-07-05T08:55:04.099Z [INFO]  core: successfully setup plugin catalog: plugin-directory=
2023-07-05T08:55:04.101Z [INFO]  core: successfully mounted backend: type=system path=sys/
2023-07-05T08:55:04.102Z [INFO]  core: successfully mounted backend: type=identity path=identity/
2023-07-05T08:55:04.102Z [INFO]  core: successfully mounted backend: type=kv path=secret/
2023-07-05T08:55:04.102Z [INFO]  core: successfully mounted backend: type=pki path=viya/
2023-07-05T08:55:04.102Z [INFO]  core: successfully mounted backend: type=pki path=viya_inter/
2023-07-05T08:55:04.102Z [INFO]  core: successfully mounted backend: type=cubbyhole path=cubbyhole/
2023-07-05T08:55:04.108Z [INFO]  core: successfully enabled credential backend: type=token path=token/
2023-07-05T08:55:04.108Z [INFO]  core: restoring leases
2023-07-05T08:55:04.108Z [INFO]  rollback: starting rollback manager
2023-07-05T08:55:04.110Z [INFO]  identity: entities restored
2023-07-05T08:55:04.113Z [INFO]  identity: groups restored
2023-07-05T08:55:04.134Z [INFO]  core: post-unseal setup complete
2023-07-05T08:55:04.167Z [INFO]  expiration: lease restore complete
==> Vault shutdown triggered
2023-07-05T13:40:57.278Z [INFO]  core: marked as sealed
2023-07-05T13:40:57.278Z [INFO]  core: pre-seal teardown starting
2023-07-05T13:40:57.278Z [INFO]  storage.consul: shutting down consul backend
2023-07-05T13:40:57.779Z [INFO]  rollback: stopping rollback manager
2023-07-05T13:40:57.779Z [INFO]  core: pre-seal teardown complete
2023-07-05T13:40:57.781Z [WARN]  storage.consul: concurrent state change notify dropped
2023-07-05T13:40:57.781Z [ERROR] core: clearing leader advertisement failed: error="Delete https://localhost:8501/v1/kv/vault/core/leader/7481225e-b4df-6fa3-cee3-c43933ea243d: dial tcp [::1]:8501: connect: connection refused"
2023-07-05T13:40:57.781Z [ERROR] core: unlocking HA lock failed: error="failed to release lock: Put https://localhost:8501/v1/kv/vault/core/lock?flags=3304740253564472344&release=a2de4963-c91f-82f0-ecca-d7e21bfd2236: dial tcp [::1]:8501: connect: connection refused"
2023-07-05T13:40:57.781Z [INFO]  core: stopping cluster listeners
2023-07-05T13:40:57.781Z [INFO]  core.cluster-listener: forwarding rpc listeners stopped
2023-07-05T13:40:57.956Z [INFO]  core.cluster-listener: rpc listeners successfully shut down
2023-07-05T13:40:57.956Z [INFO]  core: cluster listeners successfully shut down
2023-07-05T13:40:57.956Z [WARN]  storage.consul: concurrent sealed state change notify dropped
2023-07-05T13:40:57.956Z [INFO]  core: vault is sealed

There are some error but again I don't know what it means.

 

Thanks for you attention,

Matteo

 

gwootton
SAS Super FREQ
Looks like Vault's issue is that it cannot connect to Consul.
--
Greg Wootton | Principal Systems Technical Support Engineer