BookmarkSubscribeRSS Feed
Sean_OConnor
Obsidian | Level 7

Each month I will have to process raw payroll information and create an analysis dataset out of it for people within my compnay.

 

Here is my current thinking.

 

When I get the raw March 2020 data, I will break this out into smaller datasets based on the paymonth. So all January payments will be put into a SAS dataset called Jan_2020M03, all  February payments into Feb_2020M03 etc. The M03 allows me to identify which month the raw data has came in on.

 

However, I will need to recreate my analysis dataset each month. To create the Jan_2020Analysis dataset I will need to.

 

First read in the Jan_2020M01 dataset. I will then need to read in Jan_2020M02 dataset. From here I will need to overwrite information in the Jan_2020M01 information with information Jan_2020M02. I can do this with a unique payslip id. However, I will need to read in the Jan_2020M03 and potentially overwrite information in the Jan_2020M01 and Jan_2020M02. This will need to keep occurring until I have read in all the Jan_2020 datasets. Once this is done, I can then create my analysis dataset.

 

I need some dynamic way to this. My folder structure is as follows;

 

\\location\2020\2020M01

 

So within the above location I have 12 sas datasets Jan_2020M01...Feb_2020M01....Dec_2020M01

 

However, I have 12 other folders such as

 

\\location\2020\2020M02

 

So within the above location I have 12 sas datasets Jan_2020M02...Feb_2020M02....Dec_2020M02

 

So I need a way for SAS to only pick out the datasets I want. So when I want to recreate the January file pick out all the January datasets.

 

I would appreciate any advise.

3 REPLIES 3
Kurt_Bremser
Super User

Since you have a unique id, you can try the following:

data jan_2020analysis;
merge
  libm01.jan_2020m01
  libm02.jan_2020m02
  .....
  libm12.jan_2020m12
;
by payslip_id;
run;

Once you can verify that this works (the datasets are read in sequence, so any newer information for a particular payslip will automatically replace the older in the PDV), you can start to macrotize it. First of all, replace jan and 2020 with macro variables. Next, wrap the step in a macro and loop from 1 to 12 to create the month sequence:

merge
%do i = 1 %to 12;
  libm%sysfunc(putn(&i.,z2.)).&month._&year.m%sysfunc(putn(&i.,z2.))
%end;
;

by using month and year as macro parameters, you can then call the macro repeatedly for months and years (eg from a dataset with call execute()).

 

Astounding
PROC Star

Two comments to get things rolling ...

 

  • Why do you need to split the data into months (only to recombine them at the end)?  Why not just keep one analysis data set, and use the next month's batch of data to update this master data set?
  • Data set names should be different if you need to support an automated process.  For example, Jan_2020M03 should be named _2020_01_M03 to more easily support the sorting and looping that an automated approach might require.
Kurt_Bremser
Super User

Agree with @Astounding on naming the months. On top of that, I see no resons to split the libraries, as the dataset names are unique anyway. Just one RAW library should suffice.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1070 views
  • 0 likes
  • 3 in conversation