Need help in introducing parallel processing.
Occasional Contributor
Posts: 13

I have a main macro that validates input files and creates a permanent dataset for each one. The datasets are independent of each other. Currently the main macro has a dataset that holds all the file names, and CALL EXECUTE passes each file name to an inner macro that creates the dataset for that file. I want to change this process so that all files are read and their datasets created in parallel. Does anyone have an idea how to go about it? Can SYSTASK be used to call a macro with the file name as a parameter? I am not sure how many CPUs are available.

Respected Advisor
Posts: 3,887

Re: Need help in introducing parallel processing.

If you have SAS/Connect you could use rsubmit blocks as documented here: http://support.sas.com/rnd/scalability/tricks/connect.html. There are also other papers with examples on the Internet, e.g.: http://www2.sas.com/proceedings/forum2008/017-2008.pdf

The SYSTASK statement would also work. You then pass the parameter (the file name) via "-sysparm", e.g.:

SYSTASK command '....\sas.bat -sysin <program> -log <log name> .... -sysparm <filename>'

  -ICON -NOSPLASH <more commands as found in docu>;
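To make the SYSTASK route concrete, here is a minimal sketch of a parent session that starts two child sessions and a child program that picks up its file name. The `sas` command, all paths, the program name `load_one.sas`, and the task and status names are hypothetical placeholders, not taken from the original poster's setup.

```sas
/* Parent session: start one child SAS session per file.            */
/* The command, paths and task names below are hypothetical.        */
systask command "sas -sysin /hypothetical/load_one.sas -log /hypothetical/logs/file1.log -sysparm file1.txt"
        taskname=load1 status=load1_rc;
systask command "sas -sysin /hypothetical/load_one.sas -log /hypothetical/logs/file2.log -sysparm file2.txt"
        taskname=load2 status=load2_rc;

/* Suspend the parent until both children have finished */
waitfor _all_ load1 load2;

/* STATUS= names a macro variable that receives each child's return code */
%put load1_rc=&load1_rc load2_rc=&load2_rc;

/* ---- Child program (load_one.sas) ---- */
/* The value given to -sysparm arrives in the automatic macro variable SYSPARM */
%let file_name = &sysparm;
%put NOTE: processing &file_name;
```

SYSTASK COMMAND does not wait by default, so both children run at the same time and the WAITFOR statement is what brings the parent back in step.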

Frequent Contributor
Posts: 95

Re: Need help in introducing parallel processing.

Here is a very simple example of parallel processing using SAS/Connect.

You can pass parameter values from parent to child sessions using %syslput.

Replace %macro p definition with your macro definition or have your macro available to your child sessions in sasautos folders so that they can be searched automatically.

options threads autosignon sascmd="!sascmd -nonews -asynchio -noterminal";

signon task1;

%syslput parm = 1;

rsubmit task1 wait=no;

  %macro p(p);

  %put parm = &p;

  %mend;

  %p(&parm);

endrsubmit;

signon task2;

%syslput parm = 2;

rsubmit task2 wait=no;

  %macro p(p);

  %put parm = &p;

  %mend;

  %p(&parm);

endrsubmit;

waitfor _all_ task1 task2;

signoff task1;

signoff task2;

Occasional Contributor
Posts: 13

Re: Need help in introducing parallel processing.

Hi, my problem is as follows.

There is a general SAS program that calls many macros in order using CALL EXECUTE. I have to replace one of those macros so that parallel processing can be introduced. The problem is that, because my macro is called using CALL EXECUTE, the macro statements are executed first and the generated SAS code is executed last. Please let me know how to overcome this. My code is below:

%macro kill_job;

/* Kill the tasks that failed or timed out */

systask kill &kill;

%mend;

%macro sub_ods_process;

%let loop=1;

%do i = 1 %to &nobs;

%let files_names=%scan(&file_name,&i);

%let order_key=%scan(&order,&i);

%global &files_names&i._status;

%global &files_names&i._name;

/* SASEXE is an environment variable that holds the path to the SAS executable */
systask command "%SYSGET(SASEXE)

-noterminal

              -logparm 'rollover=session'

                -sysin d:\temp\trial.sas

                -log d:\temp\&files_names._#y.#m.#d_#H.#M.#s.log

                -sysparm &files_names."

                taskname= "&files_names&i._name"

                status="&files_names&i._status";

%if &i= &order_key %then %do;

/* Wait for the tasks in this group to complete; TIMEOUT= is in seconds (2500 s is about 42 minutes) */

WAITFOR  _ALL_ 

%do j= &loop %to &order_key;

%scan(&file_name,&j)&j._name  %str(  )

%end;

timeout=2500;

%if &sysrc ne 0  %then %do;

/* task timed out */

%put batch sas step timed out;

proc sql noprint;

select tranwrd(name,"_STATUS","_NAME")

  into: kill separated by ' '

  from sashelp.vmacro

  where scope ='GLOBAL' and name like upcase('%_status') and value ne '0'

  and input(substr(reverse(scan(name,1,'_')),1,1),8.) between &loop. and &order_key.;

quit;

data _null_;

call execute('%kill_job');

run;

 

%end;

%let loop=%eval(&order_key +1);

%end;

/* End of Batch */

%end;

%mend;

%macro ods_creation;

%global file_name;

%global nobs;

/* Sort the audit table and create a temp dataset with the file names that loaded successfully */

    proc sort data=&mac.stg_load_audit (where=( main_job_id=input(strip(resolve('&mac_main_job_id')),8.) and  job_status=1))

              out=stg_load_audit(keep= file_surrogate_key file_name);

        by file_surrogate_key;

    run;

/* Sorting the mst file and creating temp dataset  */

    proc sort data=&mac.mst_file (keep=file_surrogate_key file_load_order) out=mst_file_metadata;

      by file_surrogate_key;

    run;

/* Merge the temp tables above and get the file_load_order for each successful AP04 file */

    data audit_order;

      merge stg_load_audit(in=a) mst_file_metadata(in=b);

      by file_surrogate_key;

      if a;

    run;

    proc sort data=audit_order;

      by file_load_order;

    run;

/* Counting Number of Records present in AP04*/

    data _null_;

      dsid=(open("work.audit_order","in"));

      nobs=(attrn(dsid,"nobs"));

      call symputx('nobs',nobs);

      rc=close(dsid);

   run;

/* Creating Macro variable for each file_name */

proc sql noprint;

   Select file_name, count(file_load_order) + min(monotonic())-1

   into: file_name separated by  ' ',

       : order separated by ' '

   from audit_order

   group by file_load_order;

quit;

data _null_;

call execute('%sub_ods_process');

run;

/* Deleting temp datasets */

proc datasets library=work kill nolist;

quit;

%mend;     

Frequent Contributor
Posts: 95

Re: Need help in introducing parallel processing.

It looks like you will be starting the process with a macro call to '%ods_creation' macro:

%ods_creation;

It will create driver tables for later steps.

Then, it will call '%sub_ods_process' macro. We are still in the same session.

Inside this macro SAS starts the child sessions and, when &i = &order_key, waits for all child sessions to finish or for the 2500-second timeout to expire.

How many child sessions are there running when &i = &order_key (1, 2, ...)?

SYSTASK COMMAND will start each child SAS session with NOWAIT option in effect by default.

Can you run the following macros and post the log to see how &i and &order_key values change?

Need to see how many &i values there have been when &i = &order_key since it is the criteria to wait for all child sessions to finish or time out.

%macro sub_ods_process_test;

%let loop=1;

%put order = &order nobs = &nobs;

%do i = 1 %to &nobs;

    %let files_names=%scan(&file_name,&i);

    %let order_key=%scan(&order,&i);

    %put i=&i order_key=&order_key;

    %if &i= &order_key %then %do;

        %let loop=%eval(&order_key +1);

    %end;

%end;

%mend;

%macro ods_creation_test;

%global file_name;

%global nobs;

/* Sort the audit table and create a temp dataset with the file names that loaded successfully */

    proc sort data=&mac.stg_load_audit (where=( main_job_id=input(strip(resolve('&mac_main_job_id')),8.) and  job_status=1))

              out=stg_load_audit(keep= file_surrogate_key file_name);

        by file_surrogate_key;

    run;

/* Sorting the mst file and creating temp dataset  */

    proc sort data=&mac.mst_file (keep=file_surrogate_key file_load_order) out=mst_file_metadata;

      by file_surrogate_key;

    run;

/* Merge the temp tables above and get the file_load_order for each successful AP04 file */

    data audit_order;

      merge stg_load_audit(in=a) mst_file_metadata(in=b);

      by file_surrogate_key;

      if a;

    run;

    proc sort data=audit_order;

      by file_load_order;

    run;

/* Counting Number of Records present in AP04*/

    data _null_;

      dsid=(open("work.audit_order","in"));

      nobs=(attrn(dsid,"nobs"));

      call symputx('nobs',nobs);

      rc=close(dsid);

   run;

/* Creating Macro variable for each file_name */

proc sql noprint;

   Select file_name, count(file_load_order) + min(monotonic())-1

   into: file_name separated by  ' ',

       : order separated by ' '

   from audit_order

   group by file_load_order;

quit;

%sub_ods_process_test;

/* Deleting temp datasets */

proc datasets library=work kill nolist;

quit;

%mend;

%ods_creation_test;

Occasional Contributor
Posts: 13

Re: Need help in introducing parallel processing.

Hi Alpay,

There will be roughly 90 files in total, grouped by the source system they originate from, so there are at most 10-15 files from a particular source system. One group of files is sent using SYSTASK and waited on until it finishes, and then the next set of files from a different source system is executed.

Now the big problem is that the %ODS_CREATION macro is called from another SAS program, the MAIN JOB, and it is called there using CALL EXECUTE. So when I execute the code, all the macro statements are executed first and only then are the SYSTASK commands executed. It is a hard lesson I learned about CALL EXECUTE.
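The timing issue described here is commonly handled by wrapping the macro call in %NRSTR, so that CALL EXECUTE stacks the macro call itself instead of running the macro logic immediately. The following is a minimal sketch with a hypothetical macro `%demo`; it only illustrates the timing difference and is not the poster's actual job.

```sas
/* Hypothetical macro: a generated DATA step sets X, then macro code reads X */
%macro demo;
  data _null_;
    call symputx('x', 'set by generated step');
  run;
  %put x resolves to: &x;
%mend;

%let x = initial;
data _null_;
  /* The macro runs right away: %PUT executes before the generated */
  /* DATA step has run, so it still sees x=initial                 */
  call execute('%demo');
run;

%let x = initial;
data _null_;
  /* %NRSTR defers the macro call until this step has ended, so the */
  /* generated DATA step runs first and %PUT sees the new value     */
  call execute('%nrstr(%demo)');
run;
```

The same wrapping can be applied to a call such as `call execute('%nrstr(%ods_creation)')`, so that the SYSTASK and WAITFOR statements and the `%if &sysrc` checks run in the order they are written.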

Frequent Contributor
Posts: 95

Re: Need help in introducing parallel processing.

Does %ods_creation macro get called inside the same SAS session? If a SAS session is executing '%ods_creation' macro via call execute as follows I think we are still in the same SAS session:

data _null_;

  call execute('%ods_creation');

run;

If one group of files from a source system consists of 10-15 files, does that mean the SYSTASK command will start up 10-15 child SAS sessions (one session per file) one after another and wait for all of them to finish or time out in 2500 sec?

I am trying to envision how these jobs run. How many child sessions can you run in parallel? How long does a job run on average?
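As a rough way to answer the "how many in parallel" question, the CPUCOUNT system option can be read in the parent session, and the launches can be capped per batch. This is only a sketch: the macro name `launch_batches`, its parameters, the `sas` command, and the program path are all hypothetical.

```sas
/* CPUCOUNT is the number of CPUs SAS assumes it may use */
%let ncpu = %sysfunc(getoption(cpucount));
%put NOTE: cpucount=&ncpu;

/* Hypothetical helper: start at most &max_parallel children, */
/* wait for that batch to finish, then start the next batch   */
%macro launch_batches(files=, max_parallel=4);
  %local i j n batch_start;
  %let n = %sysfunc(countw(&files, %str( )));
  %let batch_start = 1;
  %do i = 1 %to &n;
    systask command "sas -sysin /hypothetical/load_one.sas -sysparm %scan(&files, &i, %str( ))"
            taskname=t&i;
    /* Close the batch when it is full or when the last file was launched */
    %if %sysfunc(mod(&i, &max_parallel)) = 0 or &i = &n %then %do;
      waitfor _all_ %do j = &batch_start %to &i; t&j %end; ;
      %let batch_start = %eval(&i + 1);
    %end;
  %end;
%mend;

/* Example call with hypothetical file names */
/* %launch_batches(files=a.txt b.txt c.txt d.txt e.txt, max_parallel=2); */
```

A batch only ends when its slowest child finishes, so this is simpler but less efficient than refilling a slot as soon as any one task completes.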

Occasional Contributor
Posts: 13

Re: Need help in introducing parallel processing.

Yes, you are right. It will be a load on the server. Each file runs for about 4 minutes on average. I want to test the CPU capacity, and going forward I will place a limit on how many SYSTASK commands run in parallel. All of this is in the trial phase. By the way, I modified the code a little. Now the code executes, but it is not calling SYSTASK. I am just getting this message in the log:

"

 

NOTE: The task "BH_11_CRC_CRBDAT01_TXT" is not an active task.

NOTE: "BH_11_CRC_CRBDAT01_TXT" is not an active task/transaction.

" where BH_11_CRC_CRBDAT01_TXT is the task name I gave for one of the SYSTASK commands.

my code is as below:

%global mac_main_job_id ;

%macro ods_creation;


proc sort data=ap04_stg_load_audit (where=( main_job_id=input(strip(resolve('&mac_main_job_id')),8.) and  job_status=1))
   out=ap04_stg(keep= file_surrogate_key file_name);
   by file_surrogate_key;
run;

proc sort data=&mac_stg.mst_file(keep=file_surrogate_key file_load_order) out=mst_file;
    by file_surrogate_key;
run;

/* Merge the temp tables above and get the file_load_order for each successful AP04 file */

data ap04;
    merge ap04_stg(in=a) mst_file(in=b);
    by file_surrogate_key;
         if a;
                file=translate(file_name,'_','.');
run;

proc sort data=ap04;
    by file_load_order;
run;

data _null_;
    set ap04;
    by file_load_order;
    length mappings $30000 all_tasks $ 3000;
    retain mappings all_tasks;
    mappings=cats("SYSTASK COMMAND ""'C:\program files\sas\sas.exe' -noterminal -logparm 'rollover=session' -sysin 'c:\ods main\ods_load\ods_load.sas'
                -sysparm '"||strip(file_name)||"^&mac_stg^&mac_ent^&mac_ods"||"' -log '%SYSGET(ODS_LOG_LOC)\"||"&MAC_ent"||"\"||STRIP(file)||"_#Y.#m.#d_#H.#M.#s.LOG'  TASKNAME="||STRIP(file)||'"'||" ",';',mappings);
    all_tasks=cats(file,' ',all_tasks);
    if last.file_load_order then do;
  /*suspend sas session till jobs completed or 16 minutes*/
    call execute(strip(mappings)||' waitfor _all_ '||strip(all_tasks)||' timeout=1000;');
  /*kill any running job*/
    call execute('systask  kill '||strip(all_tasks)||' ; ');
    mappings='';all_tasks='';
    end;
run;
   
/* update sub job tracker */
data _null_;
    call execute('%upd_sub_job1(par_audit_name=ap05_ods_load_audit)');
run;

%mend;

Do you have any idea why that note is generated? My %ods_creation code completes, as does the program that calls the %ods_creation macro, but the SAS program launched by SYSTASK is not getting executed.

Frequent Contributor
Posts: 95

Re: Need help in introducing parallel processing.

Is it possible that the above-mentioned job had already finished? Do you see the log file created for this job? If so, please look inside the log file to see whether the job ran successfully.

If the log file is not there I would think the job did not start for some reason.
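Another possibility, which I cannot confirm from the thread alone, lies in how the statement text is assembled in the posted DATA step. The closing double quote of the command string appears to come after TASKNAME=, which would put TASKNAME= inside the operating system command rather than leaving it as a SYSTASK option, and CATS strips the blank meant to separate the task names. Below is a small sketch that builds the text with the quote placed before TASKNAME= and uses CATX for the list. The dataset `work.files`, the `sas` command, and the path are hypothetical, and the statements are only written to the log so the result can be inspected before handing it to CALL EXECUTE.

```sas
/* Hypothetical input: one row per file, TASK holds a valid SAS name */
data work.files;
  length file_name $40 task $32;
  input file_name $ task $;
  datalines;
a.txt A_TXT
b.txt B_TXT
;
run;

data _null_;
  set work.files end=last;
  length cmd $1000 all_tasks $3000;
  retain all_tasks;
  /* The closing double quote ends the command BEFORE the TASKNAME= option */
  cmd = 'systask command "' ||
        'sas -noterminal -sysin /hypothetical/ods_load.sas -sysparm ' ||
        strip(file_name) ||
        '" taskname=' || strip(task) || ';';
  put cmd=;
  /* CATX inserts the blank separator that CATS would remove */
  all_tasks = catx(' ', all_tasks, task);
  if last then put all_tasks=;
run;
```

Once the text in the log looks right, the PUT statements can be swapped for CALL EXECUTE, followed by a WAITFOR built from `all_tasks`.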

Occasional Contributor
Posts: 13

Re: Need help in introducing parallel processing.

Finally I am getting the output. I went back to my first code and made one modification: the CALL EXECUTE in the main program now calls a macro that starts a new session, and in that session I call my ODS creation macro. So far I have not tried it on large input; I am working with one to three files at most.
