BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Patrick
Opal | Level 21

@imdickson

Having more than 1 record for the same staff on the same date with the same expense type is a new use case.

 

The purpose of i.date ne o.date is to not match a record with itself as else all records will get flagged with 'Y' as you've observed.

 

If we can't use the date for such an exclusion then we need another column (or columns) which uniquely identify a record. I can's see anything suitable in the data you've posted but: Do you have any other column in source which you could map for this purpose - for example a transaction key or a record id?

 

If yes then map this column and replace i.date ne o.date with i.<key column> ne o.<key column>

 

IF you can take above approach then you still can implement the process in DIS without any user written code - which is the best practice as then data lineage and impact analysis is fully maintained.

imdickson
Quartz | Level 8

Hi @Patrick

 

Fully understood. Here are the 4 columns that we can include together to identify the distinct rows:

LocCurrAmt + Doc + DocumentNumber + Description

 

I have modifed the SQL code to these (But I need your advice to make it work)

proc sql;
  create table &_OUTPUT1 as
  select 
    o.*,
    ( select distinct 'Y' 
      from &_INPUT1 as i 
      where 
        i.StaffNo=o.Staffno
        and i.ExpTy=o.ExpTy
        and i.date ne o.date
        and i.Date between o.Date+30 and o.Date-30
	and i.ExpTy IN (6039,6081,6070)
	and (i.LocCurrAmt ne o.LocCurrAmt
	and i.DocumentNumber ne o.DocumentNumber
	and i.Description ne o.Description
	and i.Doc ne o.Doc)
    ) 
    as flag length=1
  from &_INPUT1 as o
  ;
quit;

Currently I am still getting the same StaffID + ExpTy + Date record that doesnt flag as Y by using command above.

 

Seeking for your kind advice again Patrick. I think we are very close to there.

 

 

EDIT V2:

 

I have also tried with below SQL Command:

proc sql;
  create table &_OUTPUT1 as
  select 
    o.*,
    ( select distinct 'Y' 
      from &_INPUT1 as i 
      where 
        i.StaffNo=o.Staffno
        and i.ExpTy=o.ExpTy
        and i.Date between o.Date+30 and o.Date-30
			and i.ExpTy IN (6039,6081,6070)
			and (i.LocCurrAmt ne o.LocCurrAmt
                      and i.date ne o.date
			and i.DocumentNumber ne o.DocumentNumber
			and i.Description ne o.Description
			and i.Doc ne o.Doc)
    ) 
    as flag length=1
  from &_INPUT1 as o
  ;
quit;

 

It gives me this result set 😞

ExpTy question 3.PNG

Patrick
Opal | Level 21

@imdickson

Remove:  and i.date ne o.date

 

Not sure about condition:  and i.ExpTy IN (6039,6081,6070)

In case you want the whole logic for the Flag column only to be applied on above condition then eventually you will have to extend the condition to:  and i.ExpTy IN (6039,6081,6070) and o.ExpTy IN (6039,6081,6070)

 

...and as long as you've got only AND in your where clause you don't need to bother about brackets.

 

imdickson
Quartz | Level 8

Hi Patrick.

 

Thanks for the excellent advice. I removed the date = date and now the result sets are MORE ACCURATE.

However, i still can see data that are in same Date are still not getting flagged as Y.


You may have a look at result below:

ExpTy question 4.PNG

As you can see, row 88 to 91 are accurate. So is 93 & 94. However, for Row 95 to 99, it is not flagged as Y.

 

Please have a look at my SQL below again:

proc sql;
  create table &_OUTPUT1 as
  select 
    o.*,
    ( select distinct 'Y' 
      from &_INPUT1 as i 
      where 
        i.StaffNo=o.Staffno
        and i.ExpTy=o.ExpTy
        and i.Date between o.Date+30 and o.Date-30
			and i.ExpTy IN (6039,6081,6070)
			and o.ExpTy IN (6039,6081,6070)
			and (i.LocCurrAmt ne o.LocCurrAmt
			and i.DocumentNumber ne o.DocumentNumber
			and i.Description ne o.Description
			and i.Doc ne o.Doc)
    ) 
    as flag length=1
  from &_INPUT1 as o
  ;
quit;

Do you have any clue on why row 95 to 99 are not flagged as Y based on the SQL command above?

 

 

 

And i just want these logic apply to ExpenseTy(ExpTy) to that 3 category as per those in SQL command.

Patrick
Opal | Level 21

@imdickson

You've defined the record identifier as: LocCurrAmt + Doc + DocumentNumber + Description

 

That means we only exclude a record from matching if ALL columns are the same.

 

In your SQL you've got:

and (i.LocCurrAmt ne o.LocCurrAmt OR ... OR ... i.Doc ne o.Doc)
and i.DocumentNumber ne o.DocumentNumber
and i.Description ne o.Description
and i.Doc ne o.Doc)

That means we already exclude a record from matching as soon as at least ONE column has the same value.

In your case DOC has always a value of 1 and that's why your logic believes that all these records are the same record.

 

For what you want to achieve the logic needs to be:

Option 1:
and (i.LocCurrAmt ne o.LocCurrAmt OR ... OR ... i.Doc ne o.Doc)

Option 2:
and NOT
  (
    i.LocCurrAmt = o.LocCurrAmt
    and i.DocumentNumber = o.DocumentNumber
    and i.Description = o.Description
    and i.Doc = o.Doc
  )

Peoples brains work differently. For me option 2 was always easier to understand.

 

And last but not least: Looking into your data I believe columns {DOC, DocumentNumber} would be sufficient as record identifiers. If so the your SQL could look like:

proc sql;
  create table &_OUTPUT1 as
  select 
    o.*,
    ( select distinct 'Y' 
      from &_INPUT1 as i 
      where 
        i.StaffNo=o.Staffno
        and i.ExpTy=o.ExpTy
        and i.Date between o.Date+30 and o.Date-30
        and i.ExpTy IN (6039,6081,6070)
        and NOT
        (
i.DocumentNumber = o.DocumentNumber
and i.Doc = o.Doc ) ) as flag length=1 from &_INPUT1 as o ; quit;

BTW: I've also removed and o.ExpTy IN (6039,6081,6070) because looking into the where clause again I've realized that the combination of below already covers this filter condition.

i.ExpTy=o.ExpTy
and i.ExpTy IN (6039,6081,6070)

 

imdickson
Quartz | Level 8

Awesome!

 

That works 100% perfectly fine @Patrick.

 

I really appreciate your effort in helping me all this while. I shall buy you a coffee or something if i ever met you.

 

Once again thanks a lot Patrick!

 

 

And i thank everyone that gave me so much feedback and advice with sample codes to help. You guys are awesome.!

Patrick
Opal | Level 21

@imdickson

Smiley Happy  And now you can implement all of this using the SQL Join Transformation so it's fully SAS Metadata driven.

mkeintz
PROC Star

This program outputs both dates for any interval <=7 days for the same staffid/expensetype:

 

proc sort data=have out=need;
  by staffno expensetype date;
run;

data want (drop=nxt_date);
  set need (drop=date);
  by staffno expensetype;

  merge need (keep=date)
        need (firstobs=2 keep=date rename=(date=nxt_date));
  if (last.expensetype=0 and nxt_date-date <=7) or
     (first.expensetype=0 and dif(date)<=7);
run;

The downside is if you have 3 dates each within 7 days of its neighbor, the middle record is output twice.  That can be fixed by following the above by a proc sort with a nodupkey option, and it puts the data into chronological order:

 

proc sort data=want nodupkey;
  by staffno date expensetype;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 1043 views
  • 0 likes
  • 7 in conversation