topic Re: Tips on increasing processing efficiency in SAS Enterprise Guide

Tips on increasing processing efficiency

Sean_OConnor — Wed, 01 May 2019 08:34:48 GMT

Each month I will have to process raw payroll information and create an analysis dataset out of it for people within my compnay.

Here is my current thinking.

When I get the raw March 2020 data, I will break this out into smaller datasets based on the paymonth. So all January payments will be put into a SAS dataset called Jan_2020M03, all February payments into Feb_2020M03 etc. The M03 allows me to identify which month the raw data has came in on.

However, I will need to recreate my analysis dataset each month. To create the Jan_2020Analysis dataset I will need to.

First read in the Jan_2020M01 dataset. I will then need to read in Jan_2020M02 dataset. From here I will need to overwrite information in the Jan_2020M01 information with information Jan_2020M02. I can do this with a unique payslip id. However, I will need to read in the Jan_2020M03 and potentially overwrite information in the Jan_2020M01 and Jan_2020M02. This will need to keep occurring until I have read in all the Jan_2020 datasets. Once this is done, I can then create my analysis dataset.

I need some dynamic way to this. My folder structure is as follows;

\\location\2020\2020M01

So within the above location I have 12 sas datasets Jan_2020M01...Feb_2020M01....Dec_2020M01

However, I have 12 other folders such as

\\location\2020\2020M02

So within the above location I have 12 sas datasets Jan_2020M02...Feb_2020M02....Dec_2020M02

So I need a way for SAS to only pick out the datasets I want. So when I want to recreate the January file pick out all the January datasets.

I would appreciate any advise.

Re: Tips on increasing processing efficiency

Kurt_Bremser — Wed, 01 May 2019 13:08:40 GMT

Since you have a unique id, you can try the following:

data jan_2020analysis;
merge
  libm01.jan_2020m01
  libm02.jan_2020m02
  .....
  libm12.jan_2020m12
;
by payslip_id;
run;

Once you can verify that this works (the datasets are read in sequence, so any newer information for a particular payslip will automatically replace the older in the PDV), you can start to macrotize it. First of all, replace jan and 2020 with macro variables. Next, wrap the step in a macro and loop from 1 to 12 to create the month sequence:

merge
%do i = 1 %to 12;
  libm%sysfunc(putn(&i.,z2.)).&month._&year.m%sysfunc(putn(&i.,z2.))
%end;
;

by using month and year as macro parameters, you can then call the macro repeatedly for months and years (eg from a dataset with call execute()).

Re: Tips on increasing processing efficiency

Astounding — Wed, 01 May 2019 13:12:27 GMT

Two comments to get things rolling ...

Why do you need to split the data into months (only to recombine them at the end)? Why not just keep one analysis data set, and use the next month's batch of data to update this master data set?
Data set names should be different if you need to support an automated process. For example, Jan_2020M03 should be named _2020_01_M03 to more easily support the sorting and looping that an automated approach might require.

Re: Tips on increasing processing efficiency

Kurt_Bremser — Wed, 01 May 2019 13:19:00 GMT

Agree with @Astounding on naming the months. On top of that, I see no resons to split the libraries, as the dataset names are unique anyway. Just one RAW library should suffice.