10-05-2017 12:12 PM
I have hit a wall trying to implement something i do all day long in EG.
We are using DI studio 4.901. I am attempting to build a job that requires the data to be max effectie dated before loading to its final destination.
I am lost on how to make this happen in the subquery transformation. I have attempted multiple ways and am unable to make it happen sucessfully.
Any assitance on simply where i need to put the Subquery code and build the job links would be greatly apreciated.
10-05-2017 02:59 PM
The SQL JOIN transformation allows you to define sub-queries.
What are you trying to do? Is this about loading into a table using SCD2?
And alternative to a sub-query can be to define first a SQL view as a "green table" (using the SQL JOIN or SQL EXTRACT transformation) and then join this view back to your source table using a 2nd SQL JOIN transformation.
10-05-2017 03:53 PM - edited 10-05-2017 03:53 PM
I was able to Sucsessfuly do wat i wanted to do in the SQL Execute node via user written code. But i was trying to be more savy and learn how to do it properly in the Join step as many have advised to do.
I am simply missing something with regards to how to build the join with a Subquery to do this.
To bad i cant upload a screen grab of my job.
My currently working code is as follows:
SOURCETABLE -> EXTRACT -> SQL Execute(max effecive date) -> Other Transformations until Table loader.
What i was trying to do:
SOURCETABLE -> EXTRACT -> SUBQUERY (with max effective date) -> JOIN -> Other Transformations until Table loader.
I could not figure out how to build the the subquery correctly to interface with the Join. Im missing some key element here.
10-06-2017 04:37 AM
Share your user written code. This will explain to us the logic you want to implement.
You can add screen shots via:
1. Take screenshot/snip and store it
2. Use the "Photos" icon here and upload the screenshot to the forum.
You can also add attachments via the icon next to "Photos".
I haven't seen your code yet but from what you describe a simple group by statement using the SQL JOIN transformation could eventually already do the job.
ata source; do group='A','B','C'; do id=1 to 5; format effective_date date9.; effective_date=today()-ceil(ranuni(1)*100); output; end; end; run; proc sql; create table want as select *, max(effective_date) as max_grouup_effective_date format=date9. from source group by group order by group, id ; quit;
10-06-2017 04:54 AM - edited 10-08-2017 07:23 PM
To generate code in DIS as below:
proc sql; create table work.Target as select source_for_subselect.Id length = 8, source_for_subselect.Group length = 8, source_for_subselect.effective_date length = 8 format = date9., max_group_effective_date length = 8 format = date9. from tests.source_for_subselect as source_for_subselect, ( select source_for_subselect_i.Group length = 8, source_for_subselect_i.effective_date as max_group_effective_date length = 8 format = date9. from tests.source_for_subselect as source_for_subselect_i group by source_for_subselect_i.Group ) ; quit;
Your flow needs to look like:
I personally find sub-selects in DIS too labor intensive to implement and even more importantly I don't like how deep the logic gets hidden in the transformation and that I can't see "everything" visually at once.
For this reason I normally don't implement using sub-selects but I'm using two SQL joins and my job flow would look like:
The code generated by the Join node looks then like:
proc sql; create table work.Target as select source_for_subselect.Id length = 8, source_for_subselect.Group length = 8, source_for_subselect.effective_date length = 8 format = date9., max_eff_dt.max_group_effective_date length = 8 format = date9. from tests.source_for_subselect as source_for_subselect, work.max_eff_dt as max_eff_dt where source_for_subselect.Group = max_eff_dt.Group ; quit;
I've created source work.max_eff_dt as a view with a definition of:
proc sql; create view work.max_eff_dt as select Group, (max(effective_date)) as max_group_effective_date length = 8 format = date9. from &SYSLAST order by Group ; quit;
Given that a view can be seen as encapsulated SQL code which only gets executed when used.... if you compare what the flow using a sub-select generates as code with what my 2nd approach creates then you'll see that this is basically the same code when it comes to execution time.
I personally find the 2nd option easier to implement and much easier to "read" and maintain.