Solved: Re: Is there a way to loop and flag it? or compare and flag it? - Page 2

Patrick · Posted 06-02-2018 12:35 AM

Having more than 1 record for the same staff on the same date with the same expense type is a new use case.

The purpose of i.date ne o.date is to not match a record with itself as else all records will get flagged with 'Y' as you've observed.

If we can't use the date for such an exclusion then we need another column (or columns) which uniquely identify a record. I can's see anything suitable in the data you've posted but: Do you have any other column in source which you could map for this purpose - for example a transaction key or a record id?

If yes then map this column and replace i.date ne o.date with i.<key column> ne o.<key column>

IF you can take above approach then you still can implement the process in DIS without any user written code - which is the best practice as then data lineage and impact analysis is fully maintained.

imdickson · Posted 06-02-2018 02:52 AM

Hi @Patrick

Fully understood. Here are the 4 columns that we can include together to identify the distinct rows:

LocCurrAmt + Doc + DocumentNumber + Description

I have modifed the SQL code to these (But I need your advice to make it work)

proc sql;
  create table &_OUTPUT1 as
  select 
    o.*,
    ( select distinct 'Y' 
      from &_INPUT1 as i 
      where 
        i.StaffNo=o.Staffno
        and i.ExpTy=o.ExpTy
        and i.date ne o.date
        and i.Date between o.Date+30 and o.Date-30
	and i.ExpTy IN (6039,6081,6070)
	and (i.LocCurrAmt ne o.LocCurrAmt
	and i.DocumentNumber ne o.DocumentNumber
	and i.Description ne o.Description
	and i.Doc ne o.Doc)
    ) 
    as flag length=1
  from &_INPUT1 as o
  ;
quit;

Currently I am still getting the same StaffID + ExpTy + Date record that doesnt flag as Y by using command above.

Seeking for your kind advice again Patrick. I think we are very close to there.

EDIT V2:

I have also tried with below SQL Command:

proc sql;
  create table &_OUTPUT1 as
  select 
    o.*,
    ( select distinct 'Y' 
      from &_INPUT1 as i 
      where 
        i.StaffNo=o.Staffno
        and i.ExpTy=o.ExpTy
        and i.Date between o.Date+30 and o.Date-30
			and i.ExpTy IN (6039,6081,6070)
			and (i.LocCurrAmt ne o.LocCurrAmt
                      and i.date ne o.date
			and i.DocumentNumber ne o.DocumentNumber
			and i.Description ne o.Description
			and i.Doc ne o.Doc)
    ) 
    as flag length=1
  from &_INPUT1 as o
  ;
quit;

It gives me this result set 😞

Patrick · Posted 06-02-2018 03:06 AM

@imdickson

Remove: and i.date ne o.date

Not sure about condition: and i.ExpTy IN (6039,6081,6070)

In case you want the whole logic for the Flag column only to be applied on above condition then eventually you will have to extend the condition to: and i.ExpTy IN (6039,6081,6070) and o.ExpTy IN (6039,6081,6070)

...and as long as you've got only AND in your where clause you don't need to bother about brackets.

imdickson · Posted 06-02-2018 03:22 AM

Hi Patrick.

Thanks for the excellent advice. I removed the date = date and now the result sets are MORE ACCURATE.

However, i still can see data that are in same Date are still not getting flagged as Y.

You may have a look at result below:

As you can see, row 88 to 91 are accurate. So is 93 & 94. However, for Row 95 to 99, it is not flagged as Y.

Please have a look at my SQL below again:

proc sql;
  create table &_OUTPUT1 as
  select 
    o.*,
    ( select distinct 'Y' 
      from &_INPUT1 as i 
      where 
        i.StaffNo=o.Staffno
        and i.ExpTy=o.ExpTy
        and i.Date between o.Date+30 and o.Date-30
			and i.ExpTy IN (6039,6081,6070)
			and o.ExpTy IN (6039,6081,6070)
			and (i.LocCurrAmt ne o.LocCurrAmt
			and i.DocumentNumber ne o.DocumentNumber
			and i.Description ne o.Description
			and i.Doc ne o.Doc)
    ) 
    as flag length=1
  from &_INPUT1 as o
  ;
quit;

Do you have any clue on why row 95 to 99 are not flagged as Y based on the SQL command above?

And i just want these logic apply to ExpenseTy(ExpTy) to that 3 category as per those in SQL command.

Patrick · Posted 06-02-2018 03:54 AM

@imdickson

You've defined the record identifier as: LocCurrAmt + Doc + DocumentNumber + Description

That means we only exclude a record from matching if ALL columns are the same.

In your SQL you've got:

and (i.LocCurrAmt ne o.LocCurrAmt OR ... OR ... i.Doc ne o.Doc)
and i.DocumentNumber ne o.DocumentNumber
and i.Description ne o.Description
and i.Doc ne o.Doc)

That means we already exclude a record from matching as soon as at least ONE column has the same value.

In your case DOC has always a value of 1 and that's why your logic believes that all these records are the same record.

For what you want to achieve the logic needs to be:

Option 1:
and (i.LocCurrAmt ne o.LocCurrAmt OR ... OR ... i.Doc ne o.Doc)

Option 2:
and NOT
  (
    i.LocCurrAmt = o.LocCurrAmt
    and i.DocumentNumber = o.DocumentNumber
    and i.Description = o.Description
    and i.Doc = o.Doc
  )

Peoples brains work differently. For me option 2 was always easier to understand.

And last but not least: Looking into your data I believe columns {DOC, DocumentNumber} would be sufficient as record identifiers. If so the your SQL could look like:

proc sql;
  create table &_OUTPUT1 as
  select 
    o.*,
    ( select distinct 'Y' 
      from &_INPUT1 as i 
      where 
        i.StaffNo=o.Staffno
        and i.ExpTy=o.ExpTy
        and i.Date between o.Date+30 and o.Date-30
        and i.ExpTy IN (6039,6081,6070)
        and NOT
        (
         i.DocumentNumber = o.DocumentNumber
         and i.Doc = o.Doc
        )
    ) 
    as flag length=1
  from &_INPUT1 as o
  ;
quit;

BTW: I've also removed and o.ExpTy IN (6039,6081,6070) because looking into the where clause again I've realized that the combination of below already covers this filter condition.

i.ExpTy=o.ExpTy
and i.ExpTy IN (6039,6081,6070)

imdickson · Posted 06-02-2018 04:12 AM

Awesome!

That works 100% perfectly fine @Patrick.

I really appreciate your effort in helping me all this while. I shall buy you a coffee or something if i ever met you.

Once again thanks a lot Patrick!

And i thank everyone that gave me so much feedback and advice with sample codes to help. You guys are awesome.!

Patrick · Posted 06-02-2018 05:17 AM

@imdickson

And now you can implement all of this using the SQL Join Transformation so it's fully SAS Metadata driven.

mkeintz · Posted 06-01-2018 11:36 PM

This program outputs both dates for any interval <=7 days for the same staffid/expensetype:

proc sort data=have out=need;
  by staffno expensetype date;
run;

data want (drop=nxt_date);
  set need (drop=date);
  by staffno expensetype;

  merge need (keep=date)
        need (firstobs=2 keep=date rename=(date=nxt_date));
  if (last.expensetype=0 and nxt_date-date <=7) or
     (first.expensetype=0 and dif(date)<=7);
run;

The downside is if you have 3 dates each within 7 days of its neighbor, the middle record is output twice. That can be fixed by following the above by a proc sort with a nodupkey option, and it puts the data into chronological order:

proc sort data=want nodupkey;
  by staffno date expensetype;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away