Hello,
I've a dataset which has customer purchases and returns information. When a customer returns an item the purchased record is duplicated with negative cost and quantitity value (Please see the image below). Even though it's a duplicate it's given an unique sequence number and record number. I want to remove the records which have the purchase and it's returns by looking at the records which have similar entrier across all variables (except for Sequence and Record No).
Dataset (have):
Dataset (want):
Thank you!
You can do this like the following, I have no time to make dataset. data positive negative; set have; if cost <0 then output negative; else output positive; run; data negative; set negative; keep custome_id quality cost; quality=-quality; cost=-cost; run; data want; merge positive(in=ina) negative(in=inb); by customer_id quality cost; if not (ina and inb); run;
I have thought about this, and I don't think there's a unique solution the way you have described it. Perhaps it doesn't matter to you that the solution is not unique, but I don't know if it matters or not.
If there is a second purchase for $200 on 5/4/2012, which of the $200 purchases do you delete? Or doesn't it matter?
There also seems to be no logical way to pair sequence #6 with sequence #11, unless additional assumptions are made. Please clarify.
Hello,
It doesn't matter if $200 purchase is of the same service and all other variables (Category1, Category2, Date and Quantity) are same.
Every purchase and it's returns are in pairs, If I drop "Sequence No" and "Record No" then you can see every variable is similar, only cost and quantity are in compliment values (Purchase +ve and Return -ve).
I'm new to the SAS tool, don't know if it needs an unique anwser or not. It would be great help if you come up with one.
Thanks
Removing Duplicate rows with one based on multiple columns
this post
https://goo.gl/0x7a4v
https://communities.sas.com/t5/General-SAS-Programming/Removing-Duplicate-rows-with-one-based-on-multiple-columns/m-p/347150
Keep the first record where num1-num7 are in a dup group, even though seq and cid are unique
HAVE
====
Up to 40 obs WORK.HAVE total obs=271
Obs SEQ CID NUM1 NUM2 NUM3 NUM4 NUM5 NUM6 NUM7
1 000 0000 578 393 613 704 928 813 125 DUPS
2 X00 X000 578 393 613 704 928 813 125
3 Z00 Z000 578 393 613 704 928 813 125
4 731 5731 514 301 802 34 703 218 338
5 462 1462 839 114 283 549 73 89 748
6 193 7193 457 379 774 663 240 344 785
7 924 2924 343 384 64 491 362 314 361
8 655 8655 362 431 598 698 946 56 362 DUPS
9 X55 X655 362 431 598 698 946 56 362
10 386 4386 321 15 788 932 639 391 194 DUPS
11 X86 X386 321 15 788 932 639 391 194
12 Z86 Z386 321 15 788 932 639 391 194
13 117 0117 772 826 164 560 317 986 897
14 848 5848 856 911 99 742 75 125 528
15 579 1579 29 995 879 992 716 961 108 DUPS
16 X79 X579 29 995 879 992 716 961 108
17 Z79 Z579 29 995 879 992 716 961 108
WANT
===
Obs SEQ CID NUM1 NUM2 NUM3 NUM4 NUM5 NUM6 NUM7
1 000 0000 578 393 613 704 928 813 125
4 731 5731 514 301 802 34 703 218 338
5 462 1462 839 114 283 549 73 89 748
6 193 7193 457 379 774 663 240 344 785
7 924 2924 343 384 64 491 362 314 361
8 655 8655 362 431 598 698 946 56 362
10 386 4386 321 15 788 932 639 391 194
13 117 0117 772 826 164 560 317 986 897
14 848 5848 856 911 99 742 75 125 528
15 579 1579 29 995 879 992 716 961 108
* create tha data;
data have(where=(num7 ne 0));
retain seq cid;
retain num1-num7 0;
array therest num1-num7;
do dig=1000000 to 2000000 by 5731;
seq=substr(put(dig,7.),5);
cid=substr(put(dig,7.),4);
if uniform(5731)<.33 then do;
do over therest;
therest=int(1000*uniform(5764));
end;
output;
substr(cid,1,1)='X';
substr(seq,1,1)='X';
output;
if uniform(6754)<.66 then do;
substr(cid,1,1)='Z';
substr(seq,1,1)='Z';
output;
end;
end;
else do;
do over therest;
therest=int(1000*uniform(5764));
end;
output;
end;
end;
drop dig;
run;quit;
* SOLUTION;
data want;
set have;
by num1--num7 notsorted; * note in your case record_num--cost;
if first.num1;
run;quit;
Hello, Thanks for trying to help me. Your solution doesn't remove the purchase and return pairs (Which I want) but keeps the purchase and remove the returns.
Thanks
My mistake. I think this may be what you want.
data have(where=(num7 ne 0));
retain seq cid;
retain num1-num7 0;
array therest num1-num7;
do dig=1000000 to 2000000 by 5731;
seq=substr(put(dig,7.),5);
cid=substr(put(dig,7.),4);
if uniform(5731)<.33 then do;
do over therest;
therest=int(1000*uniform(5764));
end;
output;
substr(cid,1,1)='X';
substr(seq,1,1)='X';
output;
if uniform(6754)<.66 then do;
substr(cid,1,1)='Z';
substr(seq,1,1)='Z';
output;
end;
end;
else do;
do over therest;
therest=int(1000*uniform(5764));
end;
output;
end;
end;
drop dig;
run;quit;
* SOLUTION;
data want;
set have;
by num1--num7 notsorted; * note in your case record_num--cost;
if first.num7 and last.num7;
run;quit;
Up to 40 obs WORK.WANT total obs=114
Obs SEQ CID NUM1 NUM2 NUM3 NUM4 NUM5 NUM6 NUM7
1 731 5731 514 301 802 34 703 218 338
2 462 1462 839 114 283 549 73 89 748
3 193 7193 457 379 774 663 240 344 785
4 924 2924 343 384 64 491 362 314 361
5 117 0117 772 826 164 560 317 986 897
6 848 5848 856 911 99 742 75 125 528
7 310 7310 599 37 836 77 101 829 245
8 041 3041 395 33 557 885 650 990 343
9 503 4503 744 490 117 202 182 869 767
10 965 5965 132 17 475 692 988 401 334
Hello Roger, I ran the code, there is no change in the dataset.
You can do this like the following, I have no time to make dataset. data positive negative; set have; if cost <0 then output negative; else output positive; run; data negative; set negative; keep custome_id quality cost; quality=-quality; cost=-cost; run; data want; merge positive(in=ina) negative(in=inb); by customer_id quality cost; if not (ina and inb); run;
Hello Ksharp,
It worked, didn't imagine it can be as simple as this. I can see why you're a grand advisor.
Thanks a ton!
--Sujith
If you have multiple duplicated obs like
custome_id quality cost; 1 1 200
1 1 200
1 -1 -200
My code couldn't work.and you could try:
data want;
ina=0;inb=0; merge positive(in=ina) negative(in=inb); by customer_id quality cost; if not (ina and inb); run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.