I need to automatically generate a sas code that dynamically
generate include statement based on beginning and ending job.
/*pid is the parent id*/
data dependency;
length job $20;
input id pid job $;
cards;
1 0 import_data.sas
2 1 check.sas
3 2 sort.sas
4 2 sum.sas
run;
/*new job is: */
data job_run;
input start end;
cards;
1 4
run;
/*end result:
generate new code that contains:
include "import_data.sas";
include "check.sas";
include "sum.sas";
*/
/* another job */
data job_run;
input start end;
cards;
2 3
run;
/*end result:
generate new code that contains:
include "check.sas";
include "sort.sas";
*/
"...possible to generate a sas code..."
You can write the generated SAS code to an external file and then include this file for execution as shown below.
/*pid is the parent id*/
data dependency;
length job $20;
input id pid job $;
cards;
1 0 import_data.sas
2 1 check.sas
3 2 sort.sas
4 2 sum.sas
run;
/*new job is: */
data job_run;
input start end;
cards;
1 4
;
run;
proc sql;
create view v_jobs as
select d.job
from dependency d, job_run j
where d.pid between j.start and j.end
order by d.pid, d.id
;
quit;
/* create SAS source code */
filename codegen temp;
data _null_;
/* file codegen;*/
file print;
set v_jobs;
put '%include "' job +(-1) '" / source2 ;' ;
run;
/* execute generated source code */
/*%include codegen;*/
This will work if ids are unique in dataset dependency :
data dependency (index=(id/unique));
length job $20;
input id pid job $;
cards;
1 0 import_data.sas
2 1 check.sas
3 2 sort.sas
4 2 sum.sas
;
data job_run;
input start end;
cards;
1 4
;
data _null_;
length cmd $1000;
set job_run;
pid = end;
do until(id=start);
id = pid;
set dependency key=id/unique;
cmd = catt('%include "', job, '";', cmd);
end;
call symputx("job", cmd);
run;
%job;
PG
I do have situation, parent id is not unique, will this still work?
In addition, possible to generate a sas code, instead of a macro variable representing series of jobs?
PG's code won't work as-is for duplicate observations of PID. Try something like:
data _null_;
length cmd $1000;
set job_run;
do id = start to end ;
set dependency key=id/unique;
cmd = catt('%include "', job, '";', cmd);
end;
call symputx("job", cmd);
run;
%put Generated SAS code is: "&job";
&job;
Here is what I come up with, but still have two missing pieces:
First, how do I create jrun dataset based on dependency dataset
Second, how do I finish the following code without error?
I did it using put statement in the data _null_; in the data step inside a macro do loop.
It says I use put wrong. I use %put, but it outputs to the log.
I don't know how to output a sas file using do loop macro.
data jrun;
length job_nm $20;
input jid job_nm $;
cards;
1 import_data.sas
2 check.sas
4 sum.sas
run;
%macro p_run(dsn);
DATA _null_;
FILE "~/p_run_&dsn.sas";
proc sql noprint;
select count(distinct jid) into: jnm_nbr from &dsn ;
select job_nm into: jnm1 - :jnm%trim(%left(&jnm_nbr))
from &dsn ;
%put _user_;
quit;
%do n=1 %to &jnm_nbr;
PUT "%runsasfile (spawning,&&jnm&n);";
put "";
%end;
RUN;
%mend;
%p_run(jrun);
/*result should look like this:
in the code, p_run_jrun.sas, it has
these lines:
%runsasfile (spawning,import_data.sas);
%runsasfile (spawning, check.sas);
%runsasfile (spawning, sum.sas);
*/
"...possible to generate a sas code..."
You can write the generated SAS code to an external file and then include this file for execution as shown below.
/*pid is the parent id*/
data dependency;
length job $20;
input id pid job $;
cards;
1 0 import_data.sas
2 1 check.sas
3 2 sort.sas
4 2 sum.sas
run;
/*new job is: */
data job_run;
input start end;
cards;
1 4
;
run;
proc sql;
create view v_jobs as
select d.job
from dependency d, job_run j
where d.pid between j.start and j.end
order by d.pid, d.id
;
quit;
/* create SAS source code */
filename codegen temp;
data _null_;
/* file codegen;*/
file print;
set v_jobs;
put '%include "' job +(-1) '" / source2 ;' ;
run;
/* execute generated source code */
/*%include codegen;*/
Patrick,
Great logic, I like it.
One thing I don't understand:
put '%include "' job +(-1) '" / source2 ;' ;
what is job + (-1)?
Why do you use single quote and double quote?
I want the code written to the temporary file to be
%include "<jobname>" / source2;
In order to write the double quotes to the file I need to quote them with single quotes.
The +(-1) moves the pointer backwards one column. Else the resulting string would have a blank between the jobname and the double quote.
By the way:
If you really want to go for parallel execution if the parent is the same the you would need to generate code using statements like "SYSTASK" together with "WAITFOR" instead of a couple of "%include".
Using "%include" for scheduling has also the disadvantage that all included .sas programs run as a single program sharing the same workspace. This can result in some unexpected issues, eg. if a previous job creates work tables or macro variables which influence the behaviour of a subsequent job. Using systask (or x command, or rsubmit,...) to execute each job in its own environment could be beneficial.
I don't understand in what circumstance a job dependency can have duplicate IDs. What sequence would you want when an ID has more than one PID. For instance :
Dependency
ID PID job
1 0 A
2 1 B
3 1 C
3 2 D
4 3 E
Job_run
Start End
1 4
Would you want A-B-D-E or A-C-E ?
PG
A-B-D-E and A-C-E both are possible, especially when one parent can have multiple child in the process, which represents parallel execution.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.