- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Update: selection solution actually doesn't work well in full data. See thread. Sorry for being hasty to accept a response!
I am working with some payment data that has several variables in it (ID is the customer ID, date is the date of service, and payment is how much the customer paid). However, some of the payment data are refunds of prior purchases, that match the date of the original purchase but have a negative value of the purchase price. I need to remove both data rows of the purchase and the refund because it does not represent what was actually sold off by the end of the month. Below is an example of what I have and what I want. Note the "seqeunce_num" is just the row of the observation to make it easier to show what is deleted in the Want dataset:
Have:
sequence_num | ID | Date | payment |
1 | 1 | 1/1/2012 | 5 |
2 | 1 | 1/2/2012 | 5 |
3 | 1 | 1/2/2012 | -5 |
4 | 2 | 1/6/2012 | 7 |
5 | 2 | 1/6/2012 | 7 |
6 | 3 | 1/6/2012 | 13 |
7 | 4 | 1/12/2012 | 10 |
8 | 4 | 1/12/2012 | -10 |
Want:
sequence_num | ID | Date | payment |
1 | 1 | 1/1/2012 | 5 |
4 | 2 | 1/6/2012 | 7 |
5 | 2 | 1/6/2012 | 7 |
6 | 3 | 1/6/2012 | 13 |
I'm not sure what to do because using the nodupkey in proc sort only removes one of the claims (such as the refund) but not both (the original purchase and refund).
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input sequence_num ID Date :ddmmyy10. payment;
format Date ddmmyy10.;
datalines;
1 1 1/1/2012 8
2 1 1/2/2012 5
3 1 1/2/2012 -5
4 2 1/6/2012 7
5 2 1/6/2012 7
6 3 1/6/2012 13
7 4 1/12/2012 10
8 4 1/12/2012 -10
;
proc sql;
create table want as
select * from have
where payment not in
(select payment * -1 from have)
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
input sequence_num ID Date :ddmmyy10. payment;
format Date ddmmyy10.;
datalines;
1 1 1/1/2012 8
2 1 1/2/2012 5
3 1 1/2/2012 -5
4 2 1/6/2012 7
5 2 1/6/2012 7
6 3 1/6/2012 13
7 4 1/12/2012 10
8 4 1/12/2012 -10
;
proc sql;
create table want as
select * from have
where payment not in
(select payment * -1 from have)
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Updated: I thought this did work but when testing it on the larger data set, if two patients had matching sales, it would delete all of them.
I need it to look within each patient and only delete the original purchase and refund if the date matches and the payment/refund amount are identical (given that one is positive and one is negative)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You sure?
Something like this seems to break it:
data have;
input sequence_num ID Date :ddmmyy10. payment;
format Date ddmmyy10.;
datalines;
1 1 1/1/2012 8
2 1 1/2/2012 5
3 1 1/2/2012 -5
3 2 1/2/2012 5
4 2 1/6/2012 7
5 2 1/6/2012 7
6 3 1/6/2012 13
7 4 1/12/2012 10
8 4 1/12/2012 -10
;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I admit I got hasty in my response seeing it work on the smaller data sample I had pulled and when I tried it out on the larger data set it messed up. I need it to be within customers and be based on matching dates for them to make sure that they are the same sale and it's not just deleting any other sale.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
One easy method to achieve that add a variable, 1/-1 for each row. If the row is positive it's 1, if it's negative it's -1. To get the number of transactions sum the variable. To get the amounts, summing the amounts will be fine. Average? Use the variable as weights.
If the transactions are always on the same date this works fine.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content