Update: selection solution actually doesn't work well in full data. See thread. Sorry for being hasty to accept a response!
I am working with some payment data that has several variables in it (ID is the customer ID, date is the date of service, and payment is how much the customer paid). However, some of the payment data are refunds of prior purchases, that match the date of the original purchase but have a negative value of the purchase price. I need to remove both data rows of the purchase and the refund because it does not represent what was actually sold off by the end of the month. Below is an example of what I have and what I want. Note the "seqeunce_num" is just the row of the observation to make it easier to show what is deleted in the Want dataset:
Have:
sequence_num | ID | Date | payment |
1 | 1 | 1/1/2012 | 5 |
2 | 1 | 1/2/2012 | 5 |
3 | 1 | 1/2/2012 | -5 |
4 | 2 | 1/6/2012 | 7 |
5 | 2 | 1/6/2012 | 7 |
6 | 3 | 1/6/2012 | 13 |
7 | 4 | 1/12/2012 | 10 |
8 | 4 | 1/12/2012 | -10 |
Want:
sequence_num | ID | Date | payment |
1 | 1 | 1/1/2012 | 5 |
4 | 2 | 1/6/2012 | 7 |
5 | 2 | 1/6/2012 | 7 |
6 | 3 | 1/6/2012 | 13 |
I'm not sure what to do because using the nodupkey in proc sort only removes one of the claims (such as the refund) but not both (the original purchase and refund).
data have;
input sequence_num ID Date :ddmmyy10. payment;
format Date ddmmyy10.;
datalines;
1 1 1/1/2012 8
2 1 1/2/2012 5
3 1 1/2/2012 -5
4 2 1/6/2012 7
5 2 1/6/2012 7
6 3 1/6/2012 13
7 4 1/12/2012 10
8 4 1/12/2012 -10
;
proc sql;
create table want as
select * from have
where payment not in
(select payment * -1 from have)
;
quit;
data have;
input sequence_num ID Date :ddmmyy10. payment;
format Date ddmmyy10.;
datalines;
1 1 1/1/2012 8
2 1 1/2/2012 5
3 1 1/2/2012 -5
4 2 1/6/2012 7
5 2 1/6/2012 7
6 3 1/6/2012 13
7 4 1/12/2012 10
8 4 1/12/2012 -10
;
proc sql;
create table want as
select * from have
where payment not in
(select payment * -1 from have)
;
quit;
Updated: I thought this did work but when testing it on the larger data set, if two patients had matching sales, it would delete all of them.
I need it to look within each patient and only delete the original purchase and refund if the date matches and the payment/refund amount are identical (given that one is positive and one is negative)
You sure?
Something like this seems to break it:
data have;
input sequence_num ID Date :ddmmyy10. payment;
format Date ddmmyy10.;
datalines;
1 1 1/1/2012 8
2 1 1/2/2012 5
3 1 1/2/2012 -5
3 2 1/2/2012 5
4 2 1/6/2012 7
5 2 1/6/2012 7
6 3 1/6/2012 13
7 4 1/12/2012 10
8 4 1/12/2012 -10
;
I admit I got hasty in my response seeing it work on the smaller data sample I had pulled and when I tried it out on the larger data set it messed up. I need it to be within customers and be based on matching dates for them to make sure that they are the same sale and it's not just deleting any other sale.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.