BookmarkSubscribeRSS Feed
JJP1
Pyrite | Level 9

Hi,

I was trying to create a SAS DI job where we have create below jobs :

1.Parameterised Job

2.Loop Job

 

In loop job where we are getting distinct year and month and using loop functionality to load it to final SAS datasets

 

Actually are loading the data based on below values to loop over it

 

first time 

month year

NOV  2018

Jan  2018

DEC 2017

 

Actually i was looping over NOV 2018  and load into SAS dataset

and then Jan 2018 load,and then DEC 2017 dataset like wise.

 

so is there any way to loop not one month in year,but rather 2 or 3months to loop over it.

 

please help

 

 

In Theory i need to below steps,please help

 

1.Extract the minimum date value from existing column(date) from datasets and get the data from all data sets for example May month's data from year 2016

2.After extracting May month's  data from all data sets then need to do join operation(take required columns)

3.Load it into target table for only may month's data

 

Proceed above steps same for  month june till ... max(month)(19Apr2019) and load it 

 

9 REPLIES 9
Patrick
Opal | Level 21

@JJP1 

I don't fully get it what you have and what you need so let me make a few assumptions and then throw some ideas.

 

1. You have source tables with monthly data - something like month end account balances (1 row per account and month, but multiple months in the table).

2. You want to load some SCD type target table - so you need to load month by month.

3. This is a one off task. Going forward you'll just have a monthly process where you just need to pick the latest month.

 

If above is true:

I'd start with implementing the inner job. I'd define a job parameter (a prompt) with a default value of a date which I know exists in the source tables. Then implement all the code required to load this one month (using the parameter macro variable in your source data selections). 

Once that works I'd change the default parameter to the next higher month's date. Then execute my job again. If this works and the data in target is as expected then my first round of unit testing for this inner job is done.

 

So now to the outer (loop) job: What you need to do here is implement logic which builds up the control table for the loop transformation (creating 1 row per month). I guess you just need to create lists of distinct month begin dates from your source tables (use intnx() to align source dates if you're dealing with a mixture of month begin and month end dates) to work out for which dates you've got all the data, then sort the dates an populate the loop control table. Define the loop to NOT execute in parallel so that the inner job then gets called in sequence month by month.

 

...and for ongoing processing: You should be able to use the inner job on its own with little to no change. You basically would just need to add a bit of logic at the start of the job (i.e. in the job pre-code section) which checks if the inner job has been called via the loop transformation or on its own. To do so you could look for DIS generated macro variables which you can use as indicators; else: once the first round of inner job unit testing is done set the default of the date parameter to missing and use this as indicator  =set the date parameter to current month -1 if missing.

JJP1
Pyrite | Level 9

Hi @Patrick ,

 

Iam using below code in outer loop job in user written node 

 

PROC SQL;
create table &_output. as
SELECT distinct year(Date) as year, month(Date) as month
FROM &SYSLAST. ;
QUIT;

 

from above code i am getting output as below

 

year month

2018 May

2018 NOV

2018 OCT

2019 JAN

 

based on the above output every time loop transformation will do looping on 2018 May and load the data to final target table.and next on 2018 Nov ....like this

This is currently happening in job.

but i want to do the looping over 2018 year ,May month and Nov month,Oct month(only 3 months to be taken in each loop operation and load the data to final table please ).

 

Would you please suggest is there any option we can do this please.Thanks

Patrick
Opal | Level 21

@JJP1 

The loop transformation will execute the inner job exactly once per row in the loop control table (your &_output table I assume). For the example you've provided the loop will execute the inner job exactly 4 times.

 

How you define and populate the control table and what the inner job then does with the parameters you pass-in is totally up to you.

 

"but i want to do the looping over 2018 year ,May month and Nov month,Oct month(only 3 months to be taken in each loop operation and load the data to final table please )."

Sorry, but I don't understand your explanation. Where do these 3 months come from? Do you want a control table with start months (i.e. quarter) and then the inner loop to execute data quarter by quarter (=you would need to fix how you populate your control table)?

Happy to help but I'd need you to try and formulate what you have and what you want once more in different words to help me understand the problem.

JJP1
Pyrite | Level 9

Hi @Patrick ,

 

Basically we are  creating one-off job where finally  SAS dataset will be going to have some millions of records.

 

  • To do this we have chosen loop approach as it can take chunks of data,so that we will not be run out of WORK space 
  • created inner loop job (Parameter job) : as part of this we are creating  2  prompts (year,month) as numeric values and doing some joining finally using (table loader transformation :append to existing option(we need the data to be appended ) while loading into SAS final dataset)
  • created one-off job(outer loop job) by following below steps

            1. dragged and dropped the &SYSLAST table node into job area

             2.written below mentioned code in user-written node transformation where we will get distinct year and month

 

PROC SQL;
create table &_output. as
SELECT distinct year(Created_Date) as year, month(Created_Date) as month
FROM &SYSLAST. ;
QUIT;

and above output node will have only two columns (year and month )

 

and output will look similar as below 

year month

2018 11

2018 10

2019 01

2.next Loop transformation is used (done the parameter mapping tab as year and month as mapped source column)

3.loop job added

4.loop end transformation added.please find the below screen shot and kindly ignore the error in screen shot please.

 

Screen Shot :

error.PNG

Patrik,currently this job will work as loop on year -2018 and month-Nov and load the final dataset and then take another  record 2018 Oct and load to final SAS table and it goes till ...

actually only one month per each year we are looping over  loading the data to dataset right now,

is it any other option that we can load instead of one month per each year,2 or 3 months like -loop over 2018 year under october and november months data to be loaded to final SAS table please to increase the efficiency.

 

please let me know if you need any clarification or any thing which i am not clear please.kindly suggest if you have any better approach which you feel please

 

 

Patrick
Opal | Level 21

@JJP1 

Thanks for the details. This makes things much clearer. 

 

If you implement logic for your inner job so that it can pick up 3 months per iteration then sure you can do it. You then just would need to generate a control table for the loop which provides the start year/date and you then have logic in your inner job which picks up the month/year you pass in as parameter plus the two following months - and of course you populate your control table with dates in 3 month steps.

...or you amend your selection logic in the inner job to work with date ranges - i.e. date between &startDT and &endDT - and you then amend your control table and the prompts for the inner job so that you can populate and pass such values.

 

Having said that:

If there is any chance that you can use the exactly same inner job for on-going monthly processing with almost no extra logic for the one off situation then I would prioritize re-usability over one-off job performance. So in such a case I'd go for monthly loops even if this takes a bit longer for the initial load. Using the same inner job also for ongoing processing would reduce implementation and testing effort and - sort of - add an additional round of "testing" before moving into BAU mode as the inner job will already get executed many times as part of the one off initial load. 

JJP1
Pyrite | Level 9

Hi @Patrick ,

 

Thanks for the reply,would you please clarify on below points.

 

 

1.You mean to say you would chose rather that using loop option for  3 months under each year it is better to use only  1 month per each year please ?

 

2.Also sorry to trouble you,but i am not really understood where and how we can modify the existing loop job so that it will take 3months at a time while loading

Patrick
Opal | Level 21

@JJP1 

About 1) 

This is about cost and effort. If you can use a job for both initial load and ongoing BAU processing then that's certainly more cost effective as you don't have to design, develop and test two different jobs

 

About 2)

I don't know what your inner job does. All I'm saying is: Your inner job uses some SAS macro variables which get populated via the loop transformation and what's in the loop control table. Because you've got everything under control you can of course write code in your inner job which can process 3 months worth of data at a time - and you then can create and populate a control table which passes in values to the inner job as it requires it for such 3 months at a time processing.

JJP1
Pyrite | Level 9

Hi @Patrick ,

 

Would you please help on code to write in job which can process 3 months of data in each year if possible please 

Patrick
Opal | Level 21

@JJP1 

How would I be able to do that? I don't know anything about your source and target data.

 

On a general level: That's just about formulating your SQL where clause and/or join condition in a way that it selects 3 months.

 

It's getting late where I live so putting my computer to sleep for now.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1786 views
  • 0 likes
  • 2 in conversation