Identify same day duplicates & perform sum function

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 95
Accepted Solution

Identify same day duplicates & perform sum function

hi there,

I have a dataset where daily values indicate card credit line changes (increase or decrease) and account id. I want to eliminate instances where for the same account ID, the amount of the increase is equivalent to amount of decrease (or vice versa), on the same day. (Example: I request my credit line to be decreased by $1000, but then an hour later change my mind, and ask that it be brought back/increased by $1000.).

Could someone recommend base sas/sql code along the lines of: for each distinct day (group function?), identify duplicate values. For each pair/set of duplicates, sum their line change total, and if 0, then mark as 'false positive'; otherwise, leave as is. 

 

Appreciate any tips you may have.


Accepted Solutions
Solution
‎05-06-2016 10:29 AM
Super User
Posts: 11,343

Re: Identify same day duplicates & perform sum function

You don't mention what to do if there are multiple requests for increases.and decreases ie. add then change mind, then change mind and add the increase again (continue until the data base is full...) or request 1000 addition and then add another 1000 ...

 

Proc summary data=have nway;

   class accountid date;

    var Amountchange;

    output out = summary (drop=_type_) sum =;

run;

 

Willl have the account, date, a variable _freq_ that says how many values were there for the day and the sum of the changes as Amountchange.

You could then filter out records you want based on your criteria to assign flag values with a data step.

View solution in original post


All Replies
Solution
‎05-06-2016 10:29 AM
Super User
Posts: 11,343

Re: Identify same day duplicates & perform sum function

You don't mention what to do if there are multiple requests for increases.and decreases ie. add then change mind, then change mind and add the increase again (continue until the data base is full...) or request 1000 addition and then add another 1000 ...

 

Proc summary data=have nway;

   class accountid date;

    var Amountchange;

    output out = summary (drop=_type_) sum =;

run;

 

Willl have the account, date, a variable _freq_ that says how many values were there for the day and the sum of the changes as Amountchange.

You could then filter out records you want based on your criteria to assign flag values with a data step.

Frequent Contributor
Posts: 95

Re: Identify same day duplicates

Thanks ballardw for your review. That's a good question. Irrespective of volume of duplicates, same day, add the total, and if zero, then mark as false_positive (or delete); else, leave as is.

Example:

Account ID

date

Line Change Amount

001

01/01/2010

500

001

01/01/2010

-500

002

01/01/2010

100

002

01/01/2010

100

002

01/01/2010

-100

003

01/01/2010

200

004

01/02/2010

300



Result:
Row

Account ID

date

Line Change Amount

Outcome

1

001

01/01/2010

500

False Positive

2

001

01/01/2010

-500

False Positive

3

002

01/01/2010

100



4

002

01/01/2010

100



5

002

01/01/2010

-100



6

003

01/01/2010

200



7

004

01/02/2010

300




In the results table, no changes to row 3 to 7. Row 1 and 2 sum is zero, so cancel out.
Thanks

This communication is for informational purposes only. It is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction. All market prices, data and other information are not warranted as to completeness or accuracy and are subject to change without notice. Any comments or statements made herein do not necessarily reflect those of JPMorgan Chase & Co., its subsidiaries and affiliates (collectively, "JPMC").

This transmission may contain information that is proprietary, privileged, confidential and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Although this transmission and any attachments are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by JPMC for any loss or damage arising in any way from its use. Please note that any electronic communication that is conducted within or through JPMC's systems is subject to interception, monitoring, review, retention and external production in accordance with JPMC's policy and local laws, rules and regulations; may be stored or otherwise processed in countries other than the country in which you are located; and will be treated in accordance with JPMC policies and applicable laws and regulations.

Please refer to http://www.jpmorgan.com/pages/disclosures for disclosures relating to European legal entities.
Frequent Contributor
Posts: 95

Re: Identify same day duplicates & perform sum function

hi ballardw, your solution works, thanks a lot
Respected Advisor
Posts: 4,925

Re: Identify same day duplicates & perform sum function

Please provide key input and output variable descriptions. 

PG
Super User
Posts: 5,509

Re: Identify same day duplicates & perform sum function

Assuming your data set is sorted by ACCOUNT DATE, this program would run:

 

data want;

changed=0;

do until (last.date);

   set have;

   by account date;

   changed + increase; 

end;

do until (last.date);

   set have;

   by account date;

   if changed then output;

end;

drop changed;

run;

 

A few notes about the results ...

 

This does not use pairs of records,  Rather, it uses all changes for the entire day, and sees if the total change comes out to zero.  That's a little bit different than what you asked for, but might be acceptable.

 

The logic assumes that a single variable INCREASE indicates the changes, and would hold a negative value for a decrease.

 

The condition IF CHANGED will take all non-zero values to be true.

 

The top loop sums up the changes, and the bottom loop runs through the same observations a second time (outputting if appropriate).

Frequent Contributor
Posts: 95

Re: Identify same day duplicates

Posted in reply to Astounding
Thanks Astounding for your proposed code, and will try it out and get back to you ... thanks again

This communication is for informational purposes only. It is not intended as an offer or solicitation for the purchase or sale of any financial instrument or as an official confirmation of any transaction. All market prices, data and other information are not warranted as to completeness or accuracy and are subject to change without notice. Any comments or statements made herein do not necessarily reflect those of JPMorgan Chase & Co., its subsidiaries and affiliates (collectively, "JPMC").

This transmission may contain information that is proprietary, privileged, confidential and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Although this transmission and any attachments are believed to be free of any virus or other defect that might affect any computer system into which it is received and opened, it is the responsibility of the recipient to ensure that it is virus free and no responsibility is accepted by JPMC for any loss or damage arising in any way from its use. Please note that any electronic communication that is conducted within or through JPMC's systems is subject to interception, monitoring, review, retention and external production in accordance with JPMC's policy and local laws, rules and regulations; may be stored or otherwise processed in countries other than the country in which you are located; and will be treated in accordance with JPMC policies and applicable laws and regulations.

Please refer to http://www.jpmorgan.com/pages/disclosures for disclosures relating to European legal entities.
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 388 views
  • 0 likes
  • 4 in conversation