BookmarkSubscribeRSS Feed
smilbuta
Fluorite | Level 6

Hello Community, 

I have hit a wall trying to implement something i do all day long in EG. 

We are using DI studio 4.901. I am attempting to build a job that requires the data to be max effectie dated before loading to its final destination.

 

I am lost on how to make this happen in the subquery transformation. I have attempted multiple ways and am unable to make it happen sucessfully.

 

Any assitance on simply where i need to put the Subquery code and build the job links would be greatly apreciated.

4 REPLIES 4
Patrick
Opal | Level 21

@smilbuta

The SQL JOIN transformation allows you to define sub-queries.

 

What are you trying to do? Is this about loading into a table using SCD2? 

 

And alternative to a sub-query can be to define first a SQL view as a "green table" (using the SQL JOIN or SQL EXTRACT transformation) and then join this view back to your source table using a 2nd SQL JOIN transformation.

smilbuta
Fluorite | Level 6

Hi Patrick, 

 

I was able to Sucsessfuly do wat i wanted to do in the SQL Execute node via user written code. But i was trying to be more savy and learn how to do it properly in the Join step as many have advised to do.

 

I am simply missing something with regards to how to build the join with a Subquery to do this.

 

To bad i cant upload a screen grab of my job. 

 

My currently working code is as follows:

SOURCETABLE -> EXTRACT -> SQL Execute(max effecive date) -> Other Transformations until Table loader.

 

What i was trying to do:

SOURCETABLE -> EXTRACT -> SUBQUERY (with max effective date) -> JOIN -> Other Transformations until Table loader.

 

I could not figure out how to build the the subquery correctly to interface with the Join. Im missing some key element here.

 

 

 

Patrick
Opal | Level 21

@smilbuta

Share your user written code. This will explain to us the logic you want to implement.

 

You can add screen shots via: 

1. Take screenshot/snip and store it

2. Use the "Photos" icon here and upload the screenshot to the forum.

 

You can also add attachments via the icon next to "Photos". 

 

I haven't seen your code yet but from what you describe a simple group by statement using the SQL JOIN transformation could eventually already do the job.

ata source;
  do group='A','B','C';
    do id=1 to 5;
      format effective_date date9.;
      effective_date=today()-ceil(ranuni(1)*100);
      output;
    end;
  end;
run;

proc sql;
  create table want as
    select 
      *,
      max(effective_date) as max_grouup_effective_date format=date9.
    from source
    group by group
    order by group, id
    ;
quit;
Patrick
Opal | Level 21

@smilbuta

To generate code in DIS as below:

proc sql;
   create table work.Target as
   select
      source_for_subselect.Id length = 8,
      source_for_subselect.Group length = 8,
      source_for_subselect.effective_date length = 8   
         format = date9.,
      max_group_effective_date length = 8   
         format = date9.
   from
      tests.source_for_subselect as source_for_subselect, 
      (
         select
            source_for_subselect_i.Group length = 8,
            source_for_subselect_i.effective_date as max_group_effective_date length = 8   
               format = date9.
         from
            tests.source_for_subselect as source_for_subselect_i
         group by
            source_for_subselect_i.Group
      )
   ;
quit;

Your flow needs to look like:

Capture.JPG

Capture.JPG

Capture.JPG

 

 

I personally find sub-selects in DIS too labor intensive to implement and even more importantly I don't like how deep the logic gets hidden in the transformation and that I can't see "everything" visually at once.

For this reason I normally don't implement using sub-selects but I'm using two SQL joins and my job flow would look like:

Capture.JPG

The code generated by the Join node looks then like:

proc sql;
   create table work.Target as
   select
      source_for_subselect.Id length = 8,
      source_for_subselect.Group length = 8,
      source_for_subselect.effective_date length = 8   
         format = date9.,
      max_eff_dt.max_group_effective_date length = 8   
         format = date9.
   from
      tests.source_for_subselect as source_for_subselect, 
      work.max_eff_dt as max_eff_dt
   where
      source_for_subselect.Group = max_eff_dt.Group
   ;
quit;

I've created source work.max_eff_dt as a view with a definition of:

proc sql;
   create view work.max_eff_dt as
      select
         Group,
         (max(effective_date)) as max_group_effective_date length = 8
            format = date9.
   from &SYSLAST
   order by
      Group
   ;
quit;

Given that a view can be seen as encapsulated SQL code which only gets executed when used.... if you compare what the flow using a sub-select generates as code with what my 2nd approach creates then you'll see that this is basically the same code when it comes to execution time. 

I personally find the 2nd option easier to implement and much easier to "read" and maintain.

 

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1243 views
  • 2 likes
  • 2 in conversation