- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I need to automatically generate a sas code that dynamically
generate include statement based on beginning and ending job.
/*pid is the parent id*/
data dependency;
length job $20;
input id pid job $;
cards;
1 0 import_data.sas
2 1 check.sas
3 2 sort.sas
4 2 sum.sas
run;
/*new job is: */
data job_run;
input start end;
cards;
1 4
run;
/*end result:
generate new code that contains:
include "import_data.sas";
include "check.sas";
include "sum.sas";
*/
/* another job */
data job_run;
input start end;
cards;
2 3
run;
/*end result:
generate new code that contains:
include "check.sas";
include "sort.sas";
*/
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
"...possible to generate a sas code..."
You can write the generated SAS code to an external file and then include this file for execution as shown below.
/*pid is the parent id*/
data dependency;
length job $20;
input id pid job $;
cards;
1 0 import_data.sas
2 1 check.sas
3 2 sort.sas
4 2 sum.sas
run;
/*new job is: */
data job_run;
input start end;
cards;
1 4
;
run;
proc sql;
create view v_jobs as
select d.job
from dependency d, job_run j
where d.pid between j.start and j.end
order by d.pid, d.id
;
quit;
/* create SAS source code */
filename codegen temp;
data _null_;
/* file codegen;*/
file print;
set v_jobs;
put '%include "' job +(-1) '" / source2 ;' ;
run;
/* execute generated source code */
/*%include codegen;*/
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This will work if ids are unique in dataset dependency :
data dependency (index=(id/unique));
length job $20;
input id pid job $;
cards;
1 0 import_data.sas
2 1 check.sas
3 2 sort.sas
4 2 sum.sas
;
data job_run;
input start end;
cards;
1 4
;
data _null_;
length cmd $1000;
set job_run;
pid = end;
do until(id=start);
id = pid;
set dependency key=id/unique;
cmd = catt('%include "', job, '";', cmd);
end;
call symputx("job", cmd);
run;
%job;
PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I do have situation, parent id is not unique, will this still work?
In addition, possible to generate a sas code, instead of a macro variable representing series of jobs?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
PG's code won't work as-is for duplicate observations of PID. Try something like:
data _null_;
length cmd $1000;
set job_run;
do id = start to end ;
set dependency key=id/unique;
cmd = catt('%include "', job, '";', cmd);
end;
call symputx("job", cmd);
run;
%put Generated SAS code is: "&job";
&job;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here is what I come up with, but still have two missing pieces:
First, how do I create jrun dataset based on dependency dataset
Second, how do I finish the following code without error?
I did it using put statement in the data _null_; in the data step inside a macro do loop.
It says I use put wrong. I use %put, but it outputs to the log.
I don't know how to output a sas file using do loop macro.
data jrun;
length job_nm $20;
input jid job_nm $;
cards;
1 import_data.sas
2 check.sas
4 sum.sas
run;
%macro p_run(dsn);
DATA _null_;
FILE "~/p_run_&dsn.sas";
proc sql noprint;
select count(distinct jid) into: jnm_nbr from &dsn ;
select job_nm into: jnm1 - :jnm%trim(%left(&jnm_nbr))
from &dsn ;
%put _user_;
quit;
%do n=1 %to &jnm_nbr;
PUT "%runsasfile (spawning,&&jnm&n);";
put "";
%end;
RUN;
%mend;
%p_run(jrun);
/*result should look like this:
in the code, p_run_jrun.sas, it has
these lines:
%runsasfile (spawning,import_data.sas);
%runsasfile (spawning, check.sas);
%runsasfile (spawning, sum.sas);
*/
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
"...possible to generate a sas code..."
You can write the generated SAS code to an external file and then include this file for execution as shown below.
/*pid is the parent id*/
data dependency;
length job $20;
input id pid job $;
cards;
1 0 import_data.sas
2 1 check.sas
3 2 sort.sas
4 2 sum.sas
run;
/*new job is: */
data job_run;
input start end;
cards;
1 4
;
run;
proc sql;
create view v_jobs as
select d.job
from dependency d, job_run j
where d.pid between j.start and j.end
order by d.pid, d.id
;
quit;
/* create SAS source code */
filename codegen temp;
data _null_;
/* file codegen;*/
file print;
set v_jobs;
put '%include "' job +(-1) '" / source2 ;' ;
run;
/* execute generated source code */
/*%include codegen;*/
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Patrick,
Great logic, I like it.
One thing I don't understand:
put '%include "' job +(-1) '" / source2 ;' ;
what is job + (-1)?
Why do you use single quote and double quote?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I want the code written to the temporary file to be
%include "<jobname>" / source2;
In order to write the double quotes to the file I need to quote them with single quotes.
The +(-1) moves the pointer backwards one column. Else the resulting string would have a blank between the jobname and the double quote.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
By the way:
If you really want to go for parallel execution if the parent is the same the you would need to generate code using statements like "SYSTASK" together with "WAITFOR" instead of a couple of "%include".
Using "%include" for scheduling has also the disadvantage that all included .sas programs run as a single program sharing the same workspace. This can result in some unexpected issues, eg. if a previous job creates work tables or macro variables which influence the behaviour of a subsequent job. Using systask (or x command, or rsubmit,...) to execute each job in its own environment could be beneficial.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I don't understand in what circumstance a job dependency can have duplicate IDs. What sequence would you want when an ID has more than one PID. For instance :
Dependency
ID PID job
1 0 A
2 1 B
3 1 C
3 2 D
4 3 E
Job_run
Start End
1 4
Would you want A-B-D-E or A-C-E ?
PG
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
A-B-D-E and A-C-E both are possible, especially when one parent can have multiple child in the process, which represents parallel execution.