Solved: Re: how to generate sas code using recursive join

ZRick · Posted 08-12-2013 01:50 PM

I need to automatically generate a sas code that dynamically
generate include statement based on beginning and ending job.

/*pid is the parent id*/

data dependency;

length job $20;

input id pid job $;

cards;

1 0 import_data.sas

2 1 check.sas

3 2 sort.sas

4 2 sum.sas

run;

/*new job is: */

data job_run;

input start end;

cards;

1 4

run;

/*end result:

generate new code that contains:

include "import_data.sas";

include "check.sas";

include "sum.sas";

*/

/* another job */

data job_run;

input start end;

cards;

2 3

run;

/*end result:

generate new code that contains:

include "check.sas";

include "sort.sas";

*/

Patrick · Posted 08-12-2013 09:18 PM

"...possible to generate a sas code..."

You can write the generated SAS code to an external file and then include this file for execution as shown below.

/*pid is the parent id*/
data dependency;
length job $20;
input id pid job $;
cards;
1 0 import_data.sas
2 1 check.sas
3 2 sort.sas
4 2 sum.sas
run;

/*new job is: */
data job_run;
input start end;
cards;
1 4
;
run;

proc sql;
create view v_jobs as
    select d.job
    from dependency d, job_run j
    where d.pid between j.start and j.end
    order by d.pid, d.id
    ;
quit;

/* create SAS source code */
filename codegen temp;
data _null_;
/* file codegen;*/
file print;
set v_jobs;
put '%include "' job +(-1) '" / source2 ;' ;
run;

/* execute generated source code */
/*%include codegen;*/

View solution in original post

PGStats · Posted 08-12-2013 02:54 PM

This will work if ids are unique in dataset dependency :

data dependency (index=(id/unique));
length job $20;
input id pid job $;
cards;
1 0 import_data.sas
2 1 check.sas
3 2 sort.sas
4 2 sum.sas
;

data job_run;
input start end;
cards;
1 4
;

data _null_;
length cmd $1000;
set job_run;
pid = end;
do until(id=start);
     id = pid;
     set dependency key=id/unique;
     cmd = catt('%include "', job, '";', cmd);
     end;
call symputx("job", cmd);
run;

%job;

PG

ZRick · Posted 08-12-2013 04:13 PM

I do have situation, parent id is not unique, will this still work?

In addition, possible to generate a sas code, instead of a macro variable representing series of jobs?

Fugue · Posted 08-12-2013 06:20 PM

PG's code won't work as-is for duplicate observations of PID. Try something like:

data _null_;
length cmd $1000;
set job_run;
do id = start to end ;
set dependency key=id/unique;
cmd = catt('%include "', job, '";', cmd);
end;
call symputx("job", cmd);
run;

%put Generated SAS code is: "&job";

&job;

ZRick · Posted 08-12-2013 08:42 PM

Here is what I come up with, but still have two missing pieces:

First, how do I create jrun dataset based on dependency dataset

Second, how do I finish the following code without error?

I did it using put statement in the data _null_; in the data step inside a macro do loop.

It says I use put wrong. I use %put, but it outputs to the log.

I don't know how to output a sas file using do loop macro.

data jrun;

length job_nm $20;

input jid job_nm $;

cards;

1 import_data.sas

2 check.sas

4 sum.sas

run;

%macro p_run(dsn);

DATA _null_;

FILE "~/p_run_&dsn.sas";
proc sql noprint;

select count(distinct jid) into: jnm_nbr from &dsn ;

select job_nm into: jnm1 - :jnm%trim(%left(&jnm_nbr))
from &dsn ;

%put _user_;
quit;

%do n=1 %to &jnm_nbr;
PUT "%runsasfile (spawning,&&jnm&n);";
put "";

%end;

RUN;

%mend;

%p_run(jrun);

/*result should look like this:

in the code, p_run_jrun.sas, it has
these lines:

%runsasfile (spawning,import_data.sas);

%runsasfile (spawning, check.sas);

%runsasfile (spawning, sum.sas);

*/

Patrick · Posted 08-12-2013 09:18 PM

"...possible to generate a sas code..."

You can write the generated SAS code to an external file and then include this file for execution as shown below.

/*pid is the parent id*/
data dependency;
length job $20;
input id pid job $;
cards;
1 0 import_data.sas
2 1 check.sas
3 2 sort.sas
4 2 sum.sas
run;

/*new job is: */
data job_run;
input start end;
cards;
1 4
;
run;

proc sql;
create view v_jobs as
    select d.job
    from dependency d, job_run j
    where d.pid between j.start and j.end
    order by d.pid, d.id
    ;
quit;

/* create SAS source code */
filename codegen temp;
data _null_;
/* file codegen;*/
file print;
set v_jobs;
put '%include "' job +(-1) '" / source2 ;' ;
run;

/* execute generated source code */
/*%include codegen;*/

ZRick · Posted 08-13-2013 11:42 AM

Patrick,

Great logic, I like it.

One thing I don't understand:

put '%include "' job +(-1) '" / source2 ;' ;

what is job + (-1)?

Why do you use single quote and double quote?

Patrick · Posted 08-13-2013 09:09 PM

I want the code written to the temporary file to be

%include "<jobname>" / source2;

In order to write the double quotes to the file I need to quote them with single quotes.

The +(-1) moves the pointer backwards one column. Else the resulting string would have a blank between the jobname and the double quote.

Patrick · Posted 08-13-2013 09:18 PM

By the way:

If you really want to go for parallel execution if the parent is the same the you would need to generate code using statements like "SYSTASK" together with "WAITFOR" instead of a couple of "%include".

Using "%include" for scheduling has also the disadvantage that all included .sas programs run as a single program sharing the same workspace. This can result in some unexpected issues, eg. if a previous job creates work tables or macro variables which influence the behaviour of a subsequent job. Using systask (or x command, or rsubmit,...) to execute each job in its own environment could be beneficial.

PGStats · Posted 08-12-2013 10:53 PM

I don't understand in what circumstance a job dependency can have duplicate IDs. What sequence would you want when an ID has more than one PID. For instance :

Dependency
ID PID job
1 0 A
2 1 B
3 1 C
3 2 D
4 3 E

Job_run
Start End
1 4

Would you want A-B-D-E or A-C-E ?

PG

ZRick · Posted 08-13-2013 11:21 AM

A-B-D-E and A-C-E both are possible, especially when one parent can have multiple child in the process, which represents parallel execution.

Registration is open

SAS Training: Just a Click Away