06-19-2018 06:32 AM
I have a use-case where table contains 1 or more rows for the same email address column. There is a columd "date".
From all rows with same email address, only the one with newest date should be taken in a result table, regardless of values of other columns (20-ish columns are in the table all together).
Is there a way to handle it in DI Studio without using custom written code?
06-19-2018 07:55 AM
I found this article that deals with a quite similar problem by using the Query Builder in Enterprise Guide. Maybe you can apply the method to DI Studio as well.
06-20-2018 12:36 AM
Below two coding options should both be quite simple to implement in DIS using standard transformations.
data have; set sashelp.class; email@example.com'; dt=today()+_n_; format dt date9.; run; /* option 1 */ proc sort data=have out=want1; by email_addr DESCENDING dt; run; proc sort data=want1 nodupkey; by email_addr; run; /* option 2 */ proc sql; create table want2 as select * from have group by email_addr having max(dt)=dt ; quit;
06-21-2018 07:21 PM
Personally, this is just a bit of "SQL maths", easily done with Extract nodes.
extract node to
select email, max(date) as maxdate from table group by email into work.interim
join that back onto the original table, joining on email = email and date = maxdate
All done in extract nodes.
Remember, DI Studio isn't a supposed to be programming tool.
06-22-2018 04:24 AM
Makes sense. Thanks!
In the meantime, I checked RANK transformation and that does the trick - gives me ranking within parameters I need. Then I just take out rows with rank=1 for example.
I completelly agree about DI Studio not being programming tool. It is possible to run all in user-written code but we are avoiding it by all means.
06-22-2018 10:59 PM - edited 06-22-2018 10:59 PM
Just to clarify: The two coding option I've posted weren't meant to be implemented as user written code but as logic using standard DIS transformations.
Option 1 can get implemented using two SORT transformations.
Option 2 can get implemented using a SQL Join transformation.
Option 1 would should also allow to easily collect the rejected records in a second table.