Hi everyone,
I have performed several jobs in SAS Data Integration (SAS DI) to load data from different sets of external files into target tables of a database. The first step of each job consists of checking if the corresponding files of the job are present in a directory, in a way that the job is aborted if those files are not located in the directory (and it continues in the opposite case). For this task, I am applying the following code:
%macro search;
%let myfilerf = '/directory/set_of_files_1*';
%if %sysfunc(fileexist(&myfilerf)) %then %do;
	%put The external file &myfilerf exists.;
%end;
%else %abort return 5;
%mend search;
%search
Thus, I have performed two different jobs Job_1 and Job_2, which are related to a set of files set_of_files_1_yyymmmddd.txt and set_of_files_2_yyymmmddd.txt, respectively. Then, the previous code works fine when I execute each job separately (i.e. Job_1 is aborted if the set "set_of_files_1_yyymmmddd.txt" is not present in the directory and Job_2 is aborted if the set "set_of_files_2_yyymmmddd.txt" is not present in the directory). However, I need to execute both jobs in SAS DI as a unique complex job, composed by the Job_1 and the Job_2 (a job of multiple jobs) and my code does not work fine when the set "set_of_files_1_yyymmmddd.txt" is not present in the directory because such a complex job is completely aborted in that case and it does not allow the Job_2 to be processed.
Any idea?
I normally try to keep my DIS jobs as simple as possible with each job doing one thing only. So I'd rather have more jobs that are simple than only a few complex jobs.
In your case: Why can't you keep job1 and job2 separate and then use a scheduler like LSF to execute downstream job3 based on the outcome of the execution. You can use an AND or OR condition to decide whether to execute job3 or not - plus you could execute job1 and job2 in parallel.
If you really want to cramp everything into a single job then the LOOP transformation would allow you to execute job1 and job2 in their own SAS sessions (with Execute in Parallel). Eventually then use an ENDSAS and not an ABORT to terminate the jobs so the error condition doesn't get passed down to the calling parent job (job3).
Thanks for your response. I need to perform a unique "General_Job.sas" job composed by the other single jobs, so I need to apply this task in such a way.
I am trying to apply the LOOP transformation within the "General_Job.sas" job just dragging each job and the loop transformation in the following way:
Job_1 ---> LOOP --> Job_2
and replacing the ABORT statement of the checking step of the two single jobs by the ENDSAS statement, but it does not work.
Any idea?
Are you using external file metadata and the file reader for the .txt or have you "hand coded" this step?
Actually: If you've got DIS jobs then using the FORK transformation is likely what you're after. Something like below:
I personally wouldn't design an ETL this way. I'd have separate jobs, a scheduler managing dependencies and a control table for collecting the status/statistics of each job on top of what a scheduler like lsf already does for me.
Yes, I am using the corresponding external file metadata and the file reader for the set of .txt files in each job and also my checking step with the previous "hand coded" to check the presence of the sets in the directory.
I think that your FORK transformation can solved this topic, but I have considered the same scheme than the one present in your image and the code of the transformation nodes 1,3,4,6,7 is empty by default, so it does not work.
Do you know the exact code to implement these nodes in such a scheme and to address this task?
Thank you again.
There will be a solution and I guess I can help you to get to it if I can fully understand what you're dealing with and what you need.
So first looking at your macro: What are you trying to achieve here? Is this about checking if there is ANY file matching a naming pattern in a specific folder location and if yes you want to proceed processing? Or could you instead use your actual external file metadata and you stop further execution of the job if any of the files for which you have external file metadata doesn't exists? ...or have you defined your external file metadata for multiple external files with the same structure (=you're using a macro variable in the file name)?
....all of the above will make a difference for how to approach things so you need to explain what you have and how you've implemented in detail to take this any further.
Yes, the first step of each job checks if there is any file matching a specific naming pattern in a common folder location and if yes then the process continues to the next steps. However, I need to perform a unique job with these two jobs to execute the whole process, so that my "General_Job.sas" works completely fine when the files of the first job are present in the folder location (i.e. the checking step finds the files and continues to the next steps and to the last job) but it does not work fine when these files of the first job are not present, since the process is aborted and it does not continue to the process of the second job.
Is it clearer now?
Thank you very much again for your help.
If I understand you right then one way to go could be to write your job1 and job2 in a way so that they don't impact on downstream processing.
What you could do is wrap all your processing for job1/job2 into a conditional start/end and only read the files and whatever else you're doing if the condition becomes true (below represented by the "do stuff" node).
I've attached above job as .spk created using DIS4.903
User transformation file_exist_chk
Code:
%macro file_exist_chk(myfilerf);
  %global run_flg;
  %let run_flg=0;
  %if %sysfunc(fileexist(&myfilerf)) %then
    %do;
      %let run_flg=1;
      %put The external file &myfilerf exists.;
    %end;
  %else 
    %do;
      %put The external file &myfilerf does not exist.;
      %let run_flg=0;
    %end;
%mend file_exist_chk;
%file_exist_chk(myfilerf);Prompt:
And then here the conditional start:
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
