RogerHed
Obsidian | Level 7

I have an issue where removing a cube normally takes about 4 seconds but occasionally takes 4 hours on the same volume.

We have 3 worker nodes and, suspiciously, only one is available on these few occasions. Within the same job the number of available nodes goes up and down. I have checked TKGrid to confirm that all-to-all passwordless access is OK (a random server is selected as the starting point, so it would be a problem if one of them could only talk to one other). I also checked the CFG to confirm that GRIDRSHCOMMAND is still used.

 

The log looks like this (approx. 15 minutes between the steps, and the same volume for both):

proc hprisk task=clean cube= "/sas/config/Lev1/AppData/SASModelImplementationPlatform/MIP_WG_Batch/output/run_instances/batch_scenario_run/results/batch_lfb_scenario_run_1";
NOTE: The HPRisk Engine is running on the grid with 1 workers and 32 threads per worker.
real time 4:01:02.15

 

proc hprisk task=clean cube= "/sas/config/Lev1/AppData/SASModelImplementationPlatform/MIP_WG_Batch/output/run_instances/batch_scenario_run/results/batch_lfb_scenario_run_2";
NOTE: The HPRisk Engine is running on the grid with 3 workers and 32 threads per worker.
real time 4.53 seconds

 

Does anyone have a clue about the variation in available nodes, or why a clean-cube step can take such a long real time? (The CPU is constant, and no one admits to locking the cube at 3 o'clock in the morning.)
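
To see how often this happens across many nightly logs, the pattern in the excerpt above could be scanned for automatically. The following is only a rough Python sketch, assuming the log lines look exactly like the ones quoted here; the function names (`parse_steps`, `degraded_steps`) and the `expected_workers=3` default are illustrative choices, not part of any SAS tooling.

```python
import re

# Patterns modelled on the log lines quoted in this post (an assumption:
# other SAS releases may word these lines differently).
CUBE_RE = re.compile(r'proc hprisk\s+task=clean\s+cube=\s*"([^"]+)"', re.IGNORECASE)
WORKERS_RE = re.compile(r"running on the grid with (\d+) workers? and (\d+) threads per worker")
REALTIME_RE = re.compile(r"real time\s+([\d:.]+)")


def to_seconds(value):
    """Convert '4.53' or '4:01:02.15' (h:mm:ss.ss) into seconds."""
    seconds = 0.0
    for part in value.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_steps(log_text):
    """Return one dict per clean step: cube path, workers, threads, seconds."""
    steps = []
    current = None
    for line in log_text.splitlines():
        match = CUBE_RE.search(line)
        if match:
            current = {"cube": match.group(1), "workers": None,
                       "threads": None, "seconds": None}
            steps.append(current)
            continue
        if current is None:
            continue
        match = WORKERS_RE.search(line)
        if match:
            current["workers"] = int(match.group(1))
            current["threads"] = int(match.group(2))
            continue
        match = REALTIME_RE.search(line)
        if match:
            current["seconds"] = to_seconds(match.group(1))
    return steps


def degraded_steps(steps, expected_workers=3):
    """Steps that ran with fewer workers than the grid should provide."""
    return [s for s in steps
            if s["workers"] is not None and s["workers"] < expected_workers]
```

Running `degraded_steps(parse_steps(open(path).read()))` over each batch log (where `path` is whatever log file you keep) would list the steps that started with fewer than 3 workers, together with their real time, which makes it easier to correlate the slow runs with the reduced worker count.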

1 REPLY 1
RogerHed
Obsidian | Level 7
Follow-up: my hypothesis is that the first node the master node calls (chosen at random) may not be responding, due to unscheduled maintenance, a deviating OS, or some other reason not yet found.
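
One way to test this hypothesis would be to probe every node from every other node on a schedule and record which pairs fail. Below is a minimal Python sketch, assuming GRIDRSHCOMMAND points at `ssh` and that key-based login is in place; the host names in the usage note are placeholders, and `probe_commands` / `run_probes` are names made up for this example.

```python
import itertools
import subprocess

# BatchMode=yes makes ssh fail instead of prompting for a password;
# ConnectTimeout=5 gives up on an unresponsive node after 5 seconds.
SSH_OPTS = ["-o", "BatchMode=yes", "-o", "ConnectTimeout=5"]


def probe_commands(hosts):
    """Yield (src, dst, argv) for every ordered pair of distinct hosts.

    Each argv logs in to src and, from there, tries to ssh to dst and
    run `true`, so it exercises the src -> dst hop specifically.
    """
    for src, dst in itertools.permutations(hosts, 2):
        remote = "ssh " + " ".join(SSH_OPTS) + " " + dst + " true"
        yield src, dst, ["ssh", *SSH_OPTS, src, remote]


def run_probes(hosts, timeout=30):
    """Run all probes and return the (src, dst) pairs that failed."""
    failures = []
    for src, dst, argv in probe_commands(hosts):
        try:
            result = subprocess.run(argv, capture_output=True, timeout=timeout)
            ok = result.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            ok = False
        if not ok:
            failures.append((src, dst))
    return failures
```

For example, `run_probes(["master", "worker1", "worker2", "worker3"])` (hypothetical host names) run from cron around 3 o'clock would show whether a particular node stops answering at the times the clean step falls back to 1 worker. A failure where the source is the same for every pair points at that node's outbound side, while failures sharing a destination point at the node that is not responding.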