DATA Step, Macro, Functions and more

Slow performance macro

Reply
Contributor ven
Contributor
Posts: 37

Slow performance macro


I am having a big performance issue with the macro I 've written, Extremely slow (no actual output) when runs with large input data sets., Appreciate any help in re writing the code.

My code is as below.

%macro getHistAllPlanteam(PPID,s_start_date);

data pt_all_planteam_temp (keep=ParentPartyID All_Planners g_snapshot_start_date g_snapshot_end_date );
set &_INPUT1 (where=(Contact_planner ne "n/a" and ParentPartyID="&PPID" and g_snapshot_start_date="&s_start_date"dt));
by ParentPartyID Plan_mgmt_team_member;

length All_Planners $255;

if first.ParentPartyID then do;
  retain All_Planners;
  All_Planners=cat(trim(Plan_mgmt_team_member)," (",trim(Participant_Management_Role_Desc),")");
end;
  else do All_Planners=catx("; ",trim(All_Planners),cat(trim(Plan_mgmt_team_member)," (",trim(Participant_Management_Role_Desc),")"));
end;
  if last.ParentPartyID then output;

run;


proc append base=&_OUTPUT1 data=pt_all_planteam_temp force;
run;


%mend getHistAllPlanteam;

proc sql;
delete from &_OUTPUT1;
run;

Data _Null_;
set &_INPUT2;
st_dtmval=put(g_snapshot_start_date,DATETIME.);
call execute('%getHistAllPlanteam('||ParentPartyID||','||st_dtmval||')');
run;

Below is a sample input;

&_INPUT2
ParentPartyID  g_snapshot_start_date      g_snapshot_end_date
1-104O5G        26FEB15:00:00:00          01JAN99:00:00:00

&_INPUT1
ParentPartyID      g_snapshot_start_date      g_snapshot_end_date                          Plan_mgmt_team_member  Participant_Management_Role_Desc 
1-104O5G            26FEB15:00:00:00            1JAN99:00:00:00                                    David Addison                  Primary Planner                                                                                                 
1-104O5G            6FEB15:00:00:00              01JAN99:00:00:00                                  Rachael Mayne                    Plan Support Coordinator

Expected output:

ParentPartyID      All_plan_mgmt_team                                                                                                  g_snapshot_start_date          g_snapshot_end_date

1-104O5G            David Addison (Primary Planner); Rachael Mayne (Plan Support Coordinator)      26FEB15:00:00:00                  01JAN99:00:00:00

Attachment
Attachment
Attachment
Attachment
Attachment
Trusted Advisor
Posts: 3,212

Re: Slow performance macro

There is no slow performance of a macro. Your design for the data-processing is one only acceptable for very small datasets (Mb's).
Problems:

1/ you are generating separate datasteps/appends using the call execute

2/ There is a delete from * with proc sql, This could be the only one working with an external DBMS (no DDL allowed only DML).

3/ not seeing any thoughts for big data tuning.

Going back to the basics.

- what is your environment (OS SAS )

- limitations (sizing dasd - memory)    

- Is the data in an external DBMS or only local

- What is your logical problem/issue.

Bulk/batch processing van be tuned for that purpose as there isn online/transactional processing.  The latter one always is having some performance penalties.   

---->-- ja karman --<-----
Contributor ven
Contributor
Posts: 37

Re: Slow performance macro

HI Jaap,

I'm using the code in SAS DI custom made transformation.

I am using Windows 7

I am having about 4GB of memory but my dataset set is having only

&_INPUT1=16,000 rows and &_INPUT2=13196 rows.

Data is only as SAS Datasets, no RDBS involved.

Logically,what I need to achieve is for each parentpartyID concatenate the values of Participant_Management_Role_Desc across Plan_mgmt_team_member and put into another distinct variable in a SAS DI Transformation.

I have initially discuss first issue of this problem in: https://communities.sas.com/thread/75832

Thank You!

Trusted Advisor
Posts: 3,212

Re: Slow performance macro

Small..... You decided to write own code (macro) not DI transforms.  Wating for the hash approach of Xia ( he is experienced-fast with that) and executing will be fasssst

---->-- ja karman --<-----
Super User
Posts: 10,020

Re: Slow performance macro

Thanks your compliment .Jaap Karman

Since OP didn't give enough data , It is hard to give some advice.

Contributor ven
Contributor
Posts: 37

Re: Slow performance macro

Hi Xia,

Thanks for your concern. I have attached 2 sample input data file and sas code file which will create output1 as I required.

This is not working when input files are large. Quite new to hash tables appreciate your help.

Cheers

Super User
Posts: 10,020

Re: Slow performance macro

It seems that you are doing some QUERY thing . Hash Table would be good for that.

Trusted Advisor
Posts: 3,212

Re: Slow performance macro

Could be Xia could be...   Let us wait the op's coming with all information.
- May be a hash is good when it is not Iimited by memory constraints,
- May be some sql or indatabase processing when data is in an external dbms.
- May be some advanced buffer tuning or some other smart analytics solution
- May be...

---->-- ja karman --<-----
Super User
Posts: 10,020

Re: Slow performance macro

OK. Hope you have enough memory to handle this.

proc import datafile='C:\temp\Input1.csv' out=input1 dbms=csv replace;
delimiter='|';
run;
proc import datafile='C:\temp\Input2.csv' out=input2 dbms=csv replace;
delimiter='|';
run;

data temp;
 if _n_ eq 1 then do;
  if 0 then set input2;
  declare hash h(dataset:'input2',hashexp:20);
  h.definekey('ParentPartyID','g_snapshot_start_date');
  h.definedone();
 end;
set input1;
if h.check()=0 and Contact_planner ne "n/a" ;
run;
proc sort data=temp;by ParentPartyID g_snapshot_start_date;run;
data want;
 set temp;
 by ParentPartyID g_snapshot_start_date;
 length All_Planners $ 255;
 retain All_Planners;
 All_Planners=catx(" ",All_Planners,Plan_mgmt_team_member,cats("(",Participant_Management_Role_Desc,")"));
 if last.g_snapshot_start_date then do;output; call missing(All_Planners);end;
run;

Xia Keshan

Contributor ven
Contributor
Posts: 37

Re: Slow performance macro

Thanks for that, yet in my code my By variables are  ParentPartyID and Plan_mgmt_team_member  , I will try to add it and check it out;

Super User
Super User
Posts: 7,039

Re: Slow performance macro

So it does not look like you need a macro at all.  You just want to merge the two dataset and collapse the rows.

data INPUT1 ;

  infile cards dsd dlm='|' ;

  length ParentPartyId $10

         g_snapshot_start_date g_snapshot_end_date 8

         Plan_mgmt_team_member

         Participant_Management_Role_Desc

         Contact_planner

         Contact_RSO_or_RAO

         Contact_LAC  $50

  ;

  informat g_snapshot_start_date g_snapshot_end_date datetime.;

  format  g_snapshot_start_date g_snapshot_end_date datetime.;

  input ParentPartyId -- Contact_LAC;

cards;

1-100O7B|16MAR15:00:00:00|01JAN99:00:00:00|Christine Godden|Primary Planner|Christine Godden|n/a|n/a

1-100O7B|13MAR15:00:00:00|15MAR15:23:59:59|David Addison|Primary Planner|David Addison|n/a|n/a                                                                                               

1-100O7B|13MAR15:00:00:00|15MAR15:23:59:59|Rachael Mayne|Plan Support Coordinator|Rachael Mayne|n/a|n/a                                                                                               

1-101HPT|16DEC14:00:00:00|01JAN99:00:00:00|Annalise Korsch|Primary Planner|Annalise Korsch|n/a|n/a

run;

data INPUT2 ;

  infile cards dsd dlm='|' ;

  length ParentPartyId $10

         g_snapshot_start_date g_snapshot_end_date 8

  ;

  informat g_snapshot_start_date g_snapshot_end_date datetime.;

  format  g_snapshot_start_date g_snapshot_end_date datetime.;

  input ParentPartyId -- g_snapshot_end_date;

cards;

1-100O7B|13MAR15:00:00:00|15MAR15:23:59:59

1-100O7B|16MAR15:00:00:00|01JAN99:00:00:00

1-101HPT|16DEC14:00:00:00|01JAN99:00:00:00

run;

proc sort data=input1;

    by ParentPartyID g_snapshot_start_date g_snapshot_end_date;

run;

proc sort data=input2;

    by ParentPartyID g_snapshot_start_date g_snapshot_end_date;

run;

data pt_all_planteam_temp  ;

    merge INPUT1 (in=in1) INPUT2 (in=in2);

    by ParentPartyID g_snapshot_start_date g_snapshot_end_date;

    if in1 and in2;

    length All_Planners $255;

    retain All_Planners;

    if first.g_snapshot_end_date then

        All_Planners=cat(trim(Plan_mgmt_team_member)," (",trim(Participant_Management_Role_Desc),")")

    ;

    else

        All_Planners=catx("; ",trim(All_Planners),cat(trim(Plan_mgmt_team_member)," (",trim(Participant_Management_Role_Desc),")"))

    ;


    if last.g_snapshot_end_date;

    keep ParentPartyID g_snapshot_start_date g_snapshot_end_date All_Planners ;

run;

Contributor ven
Contributor
Posts: 37

Re: Slow performance macro

Thanks for the reply, Yet by variable  Plan_mgmt_team_member excluded and it has to be in the code. Will check will this approach by including the missing by variable.

Super User
Super User
Posts: 7,039

Re: Slow performance macro

Why do you think that variable was excluded? 

Did you mean that you want to create more output records based on changes to that variable also?  if so then you cannot do the merge and the roll up in one step (without more complicated code) since it is not on the second table you posted.

Contributor ven
Contributor
Posts: 37

Re: Slow performance macro

Hi Tom,

Firstly with Set By A, B variables last.A not necessarily means last.B as well. I have added input.txt file (BY ParentPartyID g_snapshot_start_date) with those variable values. Actually this was my initial problem as well and prevent me from merging in the first instance.

I have added the expected result in the second file called Expected_output.txt  where my expected output is the dataset OUTPUT1 in TestCode.sas.


Hope this explains the issue more clearly.

Thanks!

Super User
Super User
Posts: 7,039

Re: Slow performance macro

Did you even run the code I posted?  It produces the results you requested.

Here is a link that explains how FIRST. and LAST. variables work.

http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000761931.htm

Ask a Question
Discussion stats
  • 21 replies
  • 781 views
  • 6 likes
  • 4 in conversation