Hello Community,
I have hit a wall trying to implement something i do all day long in EG.
We are using DI studio 4.901. I am attempting to build a job that requires the data to be max effectie dated before loading to its final destination.
I am lost on how to make this happen in the subquery transformation. I have attempted multiple ways and am unable to make it happen sucessfully.
Any assitance on simply where i need to put the Subquery code and build the job links would be greatly apreciated.
The SQL JOIN transformation allows you to define sub-queries.
What are you trying to do? Is this about loading into a table using SCD2?
And alternative to a sub-query can be to define first a SQL view as a "green table" (using the SQL JOIN or SQL EXTRACT transformation) and then join this view back to your source table using a 2nd SQL JOIN transformation.
Hi Patrick,
I was able to Sucsessfuly do wat i wanted to do in the SQL Execute node via user written code. But i was trying to be more savy and learn how to do it properly in the Join step as many have advised to do.
I am simply missing something with regards to how to build the join with a Subquery to do this.
To bad i cant upload a screen grab of my job.
My currently working code is as follows:
SOURCETABLE -> EXTRACT -> SQL Execute(max effecive date) -> Other Transformations until Table loader.
What i was trying to do:
SOURCETABLE -> EXTRACT -> SUBQUERY (with max effective date) -> JOIN -> Other Transformations until Table loader.
I could not figure out how to build the the subquery correctly to interface with the Join. Im missing some key element here.
Share your user written code. This will explain to us the logic you want to implement.
You can add screen shots via:
1. Take screenshot/snip and store it
2. Use the "Photos" icon here and upload the screenshot to the forum.
You can also add attachments via the icon next to "Photos".
I haven't seen your code yet but from what you describe a simple group by statement using the SQL JOIN transformation could eventually already do the job.
ata source;
do group='A','B','C';
do id=1 to 5;
format effective_date date9.;
effective_date=today()-ceil(ranuni(1)*100);
output;
end;
end;
run;
proc sql;
create table want as
select
*,
max(effective_date) as max_grouup_effective_date format=date9.
from source
group by group
order by group, id
;
quit;
To generate code in DIS as below:
proc sql;
create table work.Target as
select
source_for_subselect.Id length = 8,
source_for_subselect.Group length = 8,
source_for_subselect.effective_date length = 8
format = date9.,
max_group_effective_date length = 8
format = date9.
from
tests.source_for_subselect as source_for_subselect,
(
select
source_for_subselect_i.Group length = 8,
source_for_subselect_i.effective_date as max_group_effective_date length = 8
format = date9.
from
tests.source_for_subselect as source_for_subselect_i
group by
source_for_subselect_i.Group
)
;
quit;
Your flow needs to look like:
I personally find sub-selects in DIS too labor intensive to implement and even more importantly I don't like how deep the logic gets hidden in the transformation and that I can't see "everything" visually at once.
For this reason I normally don't implement using sub-selects but I'm using two SQL joins and my job flow would look like:
The code generated by the Join node looks then like:
proc sql;
create table work.Target as
select
source_for_subselect.Id length = 8,
source_for_subselect.Group length = 8,
source_for_subselect.effective_date length = 8
format = date9.,
max_eff_dt.max_group_effective_date length = 8
format = date9.
from
tests.source_for_subselect as source_for_subselect,
work.max_eff_dt as max_eff_dt
where
source_for_subselect.Group = max_eff_dt.Group
;
quit;
I've created source work.max_eff_dt as a view with a definition of:
proc sql;
create view work.max_eff_dt as
select
Group,
(max(effective_date)) as max_group_effective_date length = 8
format = date9.
from &SYSLAST
order by
Group
;
quit;
Given that a view can be seen as encapsulated SQL code which only gets executed when used.... if you compare what the flow using a sub-select generates as code with what my 2nd approach creates then you'll see that this is basically the same code when it comes to execution time.
I personally find the 2nd option easier to implement and much easier to "read" and maintain.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.