SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Need Asistance DI Studio: Max Effective Date Subquery in Join Transformation.

Reply
Occasional Contributor
Posts: 7

Need Asistance DI Studio: Max Effective Date Subquery in Join Transformation.

Hello Community, 

I have hit a wall trying to implement something i do all day long in EG. 

We are using DI studio 4.901. I am attempting to build a job that requires the data to be max effectie dated before loading to its final destination.

 

I am lost on how to make this happen in the subquery transformation. I have attempted multiple ways and am unable to make it happen sucessfully.

 

Any assitance on simply where i need to put the Subquery code and build the job links would be greatly apreciated.

Respected Advisor
Posts: 4,130

Re: Need Asistance DI Studio: Max Effective Date Subquery in Join Transformation.

@smilbuta

The SQL JOIN transformation allows you to define sub-queries.

 

What are you trying to do? Is this about loading into a table using SCD2? 

 

And alternative to a sub-query can be to define first a SQL view as a "green table" (using the SQL JOIN or SQL EXTRACT transformation) and then join this view back to your source table using a 2nd SQL JOIN transformation.

Occasional Contributor
Posts: 7

Re: Need Asistance DI Studio: Max Effective Date Subquery in Join Transformation.

[ Edited ]

Hi Patrick, 

 

I was able to Sucsessfuly do wat i wanted to do in the SQL Execute node via user written code. But i was trying to be more savy and learn how to do it properly in the Join step as many have advised to do.

 

I am simply missing something with regards to how to build the join with a Subquery to do this.

 

To bad i cant upload a screen grab of my job. 

 

My currently working code is as follows:

SOURCETABLE -> EXTRACT -> SQL Execute(max effecive date) -> Other Transformations until Table loader.

 

What i was trying to do:

SOURCETABLE -> EXTRACT -> SUBQUERY (with max effective date) -> JOIN -> Other Transformations until Table loader.

 

I could not figure out how to build the the subquery correctly to interface with the Join. Im missing some key element here.

 

 

 

Respected Advisor
Posts: 4,130

Re: Need Asistance DI Studio: Max Effective Date Subquery in Join Transformation.

@smilbuta

Share your user written code. This will explain to us the logic you want to implement.

 

You can add screen shots via: 

1. Take screenshot/snip and store it

2. Use the "Photos" icon here and upload the screenshot to the forum.

 

You can also add attachments via the icon next to "Photos". 

 

I haven't seen your code yet but from what you describe a simple group by statement using the SQL JOIN transformation could eventually already do the job.

ata source;
  do group='A','B','C';
    do id=1 to 5;
      format effective_date date9.;
      effective_date=today()-ceil(ranuni(1)*100);
      output;
    end;
  end;
run;

proc sql;
  create table want as
    select 
      *,
      max(effective_date) as max_grouup_effective_date format=date9.
    from source
    group by group
    order by group, id
    ;
quit;
Highlighted
Respected Advisor
Posts: 4,130

Re: Need Asistance DI Studio: Max Effective Date Subquery in Join Transformation.

[ Edited ]

@smilbuta

To generate code in DIS as below:

proc sql;
   create table work.Target as
   select
      source_for_subselect.Id length = 8,
      source_for_subselect.Group length = 8,
      source_for_subselect.effective_date length = 8   
         format = date9.,
      max_group_effective_date length = 8   
         format = date9.
   from
      tests.source_for_subselect as source_for_subselect, 
      (
         select
            source_for_subselect_i.Group length = 8,
            source_for_subselect_i.effective_date as max_group_effective_date length = 8   
               format = date9.
         from
            tests.source_for_subselect as source_for_subselect_i
         group by
            source_for_subselect_i.Group
      )
   ;
quit;

Your flow needs to look like:

Capture.JPG

Capture.JPG

Capture.JPG

 

 

I personally find sub-selects in DIS too labor intensive to implement and even more importantly I don't like how deep the logic gets hidden in the transformation and that I can't see "everything" visually at once.

For this reason I normally don't implement using sub-selects but I'm using two SQL joins and my job flow would look like:

Capture.JPG

The code generated by the Join node looks then like:

proc sql;
   create table work.Target as
   select
      source_for_subselect.Id length = 8,
      source_for_subselect.Group length = 8,
      source_for_subselect.effective_date length = 8   
         format = date9.,
      max_eff_dt.max_group_effective_date length = 8   
         format = date9.
   from
      tests.source_for_subselect as source_for_subselect, 
      work.max_eff_dt as max_eff_dt
   where
      source_for_subselect.Group = max_eff_dt.Group
   ;
quit;

I've created source work.max_eff_dt as a view with a definition of:

proc sql;
   create view work.max_eff_dt as
      select
         Group,
         (max(effective_date)) as max_group_effective_date length = 8
            format = date9.
   from &SYSLAST
   order by
      Group
   ;
quit;

Given that a view can be seen as encapsulated SQL code which only gets executed when used.... if you compare what the flow using a sub-select generates as code with what my 2nd approach creates then you'll see that this is basically the same code when it comes to execution time. 

I personally find the 2nd option easier to implement and much easier to "read" and maintain.

 

 

Ask a Question
Discussion stats
  • 4 replies
  • 137 views
  • 2 likes
  • 2 in conversation