Solved: Setting flag=1 when a row exists in target but no longer in source

EinarRoed · Posted 02-07-2018 02:54 AM

Hello,

We have a table that's updated daily. It's updated by means of SCD2 without 'close out records'. This means that when a row is deleted from the source table, it remains as-is in the target table.

Now we'd like to add a new column named DELETED_IN_SOURCE_FLG. It should be set to '1' if the row no longer exists in the source table.

Example: A source table named SRC_EMAIL, and a target table named EDS_EMAIL, with the primary key EMAIL_ID.

What would be the most efficient way to say "if an EMAIL_ID exists in EDS_EMAIL, but no longer exists in SRC_EMAIL, then set DELETED_IN_SOURCE_FLG = 1"?

The only way I can think of at the moment sounds a bit cumbersome. Select all distinct EMAIL_ID from the source table, and from the target table, then join them and flag the ones that don't have a match, and write that to the target table. Is there a better way?

ballardw · Posted 02-07-2018 04:17 PM

This should identify the primary_keys involved if I understand.

proc sql;
   select email_id from  EDS_email
   except
   select email_id from SRC_email
   ;
run;

View solution in original post

EinarRoed · Posted 02-07-2018 03:54 AM

I almost never write SAS base, but here's what I've got so far. That gives me the EPOST_ID which exists in the target but no longer in the source. Is there a way to optimize it or improve the standard of the code?

/* ESTABLISH DATASETS*/
data SRC;
	set &_input;
run;
data EDS;
	set EDS.EMAIL;
run;

/*SELECT DISTINCT VALUES*/
proc sort data = SRC nodupkey;
	by EMAIL_ID;
run;
proc sort data = EDS nodupkey;
	by EMAIL_ID;
run;

/*DETERMINE EPOST_ID TO BE DELETED*/
proc sql;
	create table &_output as
	select EDS.EMAIL_ID from EDS
	left join SRC on EDS.EMAIL_ID = SRC.EMAIL_ID
	where SRC.EMAIL_ID is missing;
quit;

Patrick · Posted 02-07-2018 04:14 AM

@EinarRoed

You don't need to pre-sort data when using SQL.

Below how you would do this normally when using SQL (not only in SAS)

data source;
  set sashelp.class;
  if _n_=2 or _n_=5 then delete;
run;

data target;
  set sashelp.class;
run;

proc sql;
  create table deleted as
    select *
    from target t
    where not exists
      (select * from source s where s.name=t.name)
    ;
quit;

Patrick · Posted 02-07-2018 04:05 AM

@EinarRoed

You shouldn't need a deleted flag column in a proper SCD2 table. You simply expire the record.
From what you write (the 'close out records' bit) I assume that you're using DI Studio. So here IF you're doing a full load then just start using the 'close out records', if you're doing a delta load then you also must somewhere have/derive the information which columns got deleted since the last SCD2 load. If so then run a post process which uses a SQL update and expire all records which got deleted in source.
Not using a deleted flag column but just expiring records then also allows to have the SCD2 loader do all the work when it comes to re-opening records (=records which had been expired but which now re-appear in the source table)

EinarRoed · Posted 02-07-2018 04:16 AM

Thanks for the feedback @Patrick. You're correct that I'm using DI Studio. However we don't want to use 'close out rows', because that would set another flag (CURRENT_FLG) to '0' for those rows. For reporting purposes, those rows are still considered current. One of the analysts wants us to mark the rows which no longer exists in the source table, but it shouldn't interfere with our current reporting of all rows that have CURRENT_FLG = 1.

Patrick · Posted 02-07-2018 04:18 AM

@EinarRoed

Are you sure? I was in the believe the current_ind column (or however you name it) is always set to 1 for the most current record whether expired or not.

EinarRoed · Posted 02-07-2018 04:21 AM

That's how it works when not using 'close out records'. But if I enable 'close out records' then all the records that no longer exist in the source are closed (current_ind=0).

Patrick · Posted 02-07-2018 04:47 AM

@EinarRoed

Hmmm... when I'm choosing "close out records not in source table" then DIS generates the following code bit:

      /* Close out any records that are in the xref table but not in the source.  */ 
      else if inXref and not inSort then
         do;
         
            ETLS_CLSDATE = DATETIME();
            
            output work.etls_close;
            
         end;

There is nothing in it that changes the change current indicator so it should remain 'Y' for expired records. Are you really sure that your DIS job behaves differently? If so then please share the generated code.

EinarRoed · Posted 02-07-2018 05:04 AM

The purpose of using "close out records" is to close the row though, i.e. to set it as no longer being current.

Our table which uses "close out records" doesn't contain any closed rows with CURRENT_IND=1.

The DI Studio help file confirms the behaviour:

Specify Yes to close out target rows that are not included in the source table. To close out a target row when datetime change tracking is specified, the expression in the Change Tracking tab generates a new value for the end Date/Time column. To close out a row when the current row indicator change tracking method is specified, a value of zero is written into the current row indicator column.

Patrick · Posted 02-07-2018 05:10 AM

@EinarRoed

You're right. I've just tested it. The change current ind gets always set to 0 or N if a record gets expired. That's disappointing because that's this way simply redundant information.

So what you could do is implement a post process which updates the change current indicator to 1 / Y for the latest change end date per business key.

ballardw · Posted 02-07-2018 04:17 PM

This should identify the primary_keys involved if I understand.

proc sql;
   select email_id from  EDS_email
   except
   select email_id from SRC_email
   ;
run;

Setting flag=1 when a row exists in target but no longer in source

Re: Setting flag=1 when a row exists in target but no longer in source

Re: Setting flag=1 when a row exists in target but no longer in source

Re: Setting flag=1 when a row exists in target but no longer in source

Re: Setting flag=1 when a row exists in target but no longer in source

Re: Setting flag=1 when a row exists in target but no longer in source

Re: Setting flag=1 when a row exists in target but no longer in source

Re: Setting flag=1 when a row exists in target but no longer in source

Re: Setting flag=1 when a row exists in target but no longer in source

Re: Setting flag=1 when a row exists in target but no longer in source

Re: Setting flag=1 when a row exists in target but no longer in source

Re: Setting flag=1 when a row exists in target but no longer in source

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away