Solved: Re: Output of scheduled job not same as manually run job

noobs · Posted 08-15-2014 09:00 AM

Hello SAS EG Users,

I have a DI job that was created by exporting code from SAS EG. This job is meant to mine all the data, create one observation, and update six different tables. The code for the DI job is attached.

When this job is run manually, it gives the correct output, which matches the SAS EG output perfectly.

However, when this job is scheduled to run on two dependencies (the source tables being ready, and the time being 5:01 am), it generates slightly different data. The output should match the SAS EG output exactly, but it does not.

Can you explain this behavior? What might be causing this discrepancy?

I am also attaching an Excel workbook that shows the monitoring I have been doing for the last week: on some days the output of the scheduled DI job matches the output of the SAS EG project exactly, and on other days there is a slight variance. Please check the pink worksheets named EM_*_F. Let me also mention that the source tables for this job are snapshots, not live production data that changes from time to time.

Thanks,

Dhanashree

noobs · Posted 10-22-2014 08:59 AM

Thank you all for your input.

The issue was that one of the source tables used in processing was assumed to be static but was actually updated daily at a later point in time. The entire DI job therefore needed to be scheduled later, after all the source tables had been updated and were ready to use.
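For anyone checking for the same cause, here is a minimal sketch of how the last-modified time of each source table could be inspected before the job runs. The libref `src` and its path are hypothetical placeholders, and for tables reached through a database engine the `modate` column may be missing.

```sas
/* Hypothetical libref pointing at the source tables used by the job */
libname src "/path/to/source_tables";

/* List creation and last-modified timestamps for every table in the library, */
/* most recently modified first                                              */
proc sql;
   select memname,
          crdate format=datetime20.,
          modate format=datetime20.,
          nobs
   from dictionary.tables
   where libname = 'SRC' and memtype = 'DATA'
   order by modate desc;
quit;
```

A table whose `modate` is later than the scheduled start time of the job would be a candidate for the kind of late update described above.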

Regards,

Dhanashree

noobs · Posted 08-15-2014 09:02 AM

Here is the DI code of manually run job.

noobs · Posted 08-15-2014 09:04 AM

This is the log for tracking the data discrepancy between the output of the scheduled job and the manually run job:

LinusH · Posted 08-15-2014 09:39 AM

What do these figures stand for?

I don't think the community is the right place to find people who are willing to go through hundreds of lines of code in detail, but we can give some general advice on how to proceed.

What about the number of observations? Carefully compare the logs to see if you can find anything there.

Another idea is to point saswork to a permanent location in the jobs, so that you can run PROC COMPARE on each step's result and find out where it starts to deviate.
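A minimal sketch of that comparison, assuming each run has saved its intermediate tables to its own permanent library. The librefs `manrun` and `schrun`, their paths, the table name `step1_out`, and the key variable `id` are all hypothetical.

```sas
/* Hypothetical permanent locations holding the intermediate tables of each run */
libname manrun "/path/to/manual_run_work";
libname schrun "/path/to/scheduled_run_work";

/* Sort both copies by the same key so observations can be matched */
proc sort data=manrun.step1_out out=work.man_step1; by id; run;
proc sort data=schrun.step1_out out=work.sch_step1; by id; run;

/* Report differences and keep only the non-matching observations in OUT= */
proc compare base=work.man_step1 compare=work.sch_step1
             out=work.diff_step1 outnoequal outbase outcomp outdif
             listall;
   id id;
run;

/* SYSINFO holds the PROC COMPARE return code; 0 means no differences were found */
%put NOTE: PROC COMPARE return code for step1_out = &sysinfo;
```

Repeating this for each step's output in job order shows the first table where the two runs diverge.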

Data never sleeps

Patrick · Posted 08-18-2014 08:05 AM

So from what I understand you're telling us here:

a) You have a DI job and you run it out of DIS and you get a result

b) You copy/paste the DI generated code into SAS EG and run it with EG and get a result

The results from a) and b) are the same.

c) You deploy the DI job and schedule it. It gets executed at the scheduled time and you get a result.

The result from c) differs from a) and b)

So what I would do to test this further:

1. Copy the code from the deployed job into EG and run it. Compare the result with the one from running the DI job out of DIS. Are they the same? If not: Re-deploy the DI job and repeat the test.

2. Batch submit the deployed job during the day. Run as close as possible also the DI job out of DIS. Compare the results. Are they the same?

Possible causes for the differences I can think of:

- Timing: You run the batch job and the "workspace session" jobs at different times when the underlying data is different - or there is some logic in your code using datetime which leads to different selections

- Environment: The batch environment differs from the "workspace session" environment and you're either pointing to different versions of the data or there are some environment variables used in your selections which differ between these environments.

- Version: You're testing the latest version using DIS but you haven't scheduled (deployed) this version so the batch process executes an older version

Let us know how you go with your testing and what the findings were.

noobs · Posted 08-19-2014 08:58 AM

Hello Patrick,

You summarized the issue correctly.

Can you elaborate more on the ENVIRONMENT point? Comparing the source data as the scheduled job sees it against the source data as the DI job sees it is very complex, but since it's my first time attempting to use PROC COMPARE, I will be judicious about any differences that I may see in the source data.

What kind of environment variables may be coming into play during this batch vs workspace session? Where do I look for these settings?

Also, in the VERSION section, can you be specific about what you mean by the latest version? Do you mean the latest version of the code in the DI job? I can assure you that the code in SAS EG, the manually run DI job, and the scheduled job is exactly the same.

So my gut feeling is that it is either a timing issue, because the variance is not much, or environment variables, as the picture may change for batch sessions.

Thanks for all your inputs. I will be digging into this today and will get back to you with my findings.

Regards,

Dhanashree

Patrick · Posted 08-19-2014 06:39 PM

Environment: There is an autoexec and a .cfg file under both the workspace server and the batch server. If anything set there is used in your code and differs between the two server contexts, this might have an influence (e.g. a macro variable pointing to your data).

Version: Yes, I was thinking that you possibly hadn't deployed your latest DIS version, so the batch job would run different code.
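To compare the two server contexts, one option is to run the same short diagnostic in both the workspace session and the batch job and then diff the two logs. This is only a sketch; which options and macro variables actually matter depends on what the job code uses.

```sas
/* Run this in both the workspace session and the batch job, then compare the logs */

/* Show the config file, autoexec, and WORK location this session started with */
proc options option=config value; run;
proc options option=autoexec value; run;
proc options option=work value; run;

/* List all assigned librefs and their physical paths */
libname _all_ list;

/* List all global and automatic macro variables with their values */
%put _global_;
%put _automatic_;

/* Record who ran the session, on which host, and when it started */
%put NOTE: user=&sysuserid host=&syshostname started=&sysdate9 &systime;
```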

If it was me, I would first test whether it's a timing issue.

- run the code in batch

- run the code out of EG using an %include statement pointing to the deployed code
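A minimal sketch of the second step, assuming the deployed job was written to a .sas file on the server. The path and the macro variable name `deployed_job` are hypothetical.

```sas
/* Hypothetical path to the .sas file written when the DI job was deployed */
%let deployed_job = /path/to/deployed_jobs/my_di_job.sas;

/* Record when this run started, for comparison with the batch log */
%put NOTE: Run started %sysfunc(datetime(), datetime20.);

/* Execute exactly the code the scheduler runs;     */
/* SOURCE2 echoes the included lines to the log     */
%include "&deployed_job" / source2;
```

Running this from EG right after a batch run of the same file means both runs use identical code, so any remaining difference points to timing or environment.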

Just by doing a fast scan through your code I've seen the following:

/*---- Start of Pre-Process Code ----*/

%include "D:\SAS\SASSolutionsConfig\Lev1\SASMain\StoredProcessServer\autoexec.sas";

/*---- End of Pre-Process Code ----*/

It's not a good idea to have fully hard-coded paths in your code. You need to write code which can get migrated from one environment to the next (as .spk) without the need to change anything in the target environment. Define the root paths as macro variables in an autoexec and then use these macro variables.
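A minimal sketch of that approach. The macro variable name `sasroot`, the libref `srcdata`, and the `Data\Source` subfolder are hypothetical; only the root path is taken from the code above.

```sas
/* In the autoexec of each environment: define the root path once.  */
/* The value differs per environment; the job code does not change. */
%let sasroot = D:\SAS\SASSolutionsConfig\Lev1\SASMain;

/* In the job code: build paths from the macro variable.           */
/* Double quotes are required so that &sasroot is resolved;        */
/* the trailing dot marks the end of the macro variable name.      */
libname srcdata "&sasroot.\Data\Source";
```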

Not sure why you're including the Stored Process Server autoexec. That feels kind-of wrong. If you need definitions from there in all environments then move the stuff to the autoexec under SASMain.

Scrolling a bit further through your code:

If I understand this right, you've basically copy/pasted EG-exported code into user-written nodes in DIS. Yes, this can work, but it's not at all a clean DI job. If this needs to be production-worthy code, then you should rebuild your process properly in DIS using the appropriate transformations (like the SQL Join) and avoid user-written code as far as possible. Otherwise, yes, you will have things implemented faster, but maintenance could become a nightmare.

You certainly need to get rid of all EG-related code and of all fully specified hard-coded paths.
