BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sujithpeta
Quartz | Level 8

Hello,

 

I've a dataset which has customer purchases and returns information. When a customer returns an item the purchased record is duplicated with negative cost and quantitity value (Please see the image below). Even though it's a duplicate it's given an unique sequence number and record number. I want to remove the records which have the purchase and it's returns by looking at the records which have similar entrier across all variables (except for Sequence and Record No).

 

Dataset (have):

Untitled.png

 

Dataset (want):

Untitled.png

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
You can do this like the following, 
I have no time to make dataset.

data positive negative;
 set have;
 if cost <0 then output negative;
 else output positive;
run;

data negative;
 set negative;
keep custome_id quality cost;
quality=-quality;
cost=-cost;
run;

data want;
 merge positive(in=ina) negative(in=inb);
 by customer_id quality cost;
 if not (ina and inb);
run;


View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

I have thought about this, and I don't think there's a unique solution the way you have described it. Perhaps it doesn't matter to you that the solution is not unique, but I don't know if it matters or not.

 

If there is a second purchase for $200 on 5/4/2012, which of the $200 purchases do you delete? Or doesn't it matter?

 

There also seems to be no logical way to pair sequence #6 with sequence #11, unless additional assumptions are made. Please clarify.

 

 

--
Paige Miller
Sujithpeta
Quartz | Level 8

Hello,

 

It doesn't matter if $200 purchase is of the same service and all other variables (Category1, Category2, Date and Quantity) are same.

Every purchase and it's returns are in pairs, If I drop "Sequence No" and "Record No" then you can see every variable is similar, only cost and quantity are in compliment values (Purchase +ve and Return -ve).

I'm new to the SAS tool, don't know if it needs an unique anwser or not. It would be great help if you come up with one.

 

Thanks

rogerjdeangelis
Barite | Level 11
Removing Duplicate rows with one based on multiple columns

this post
https://goo.gl/0x7a4v
https://communities.sas.com/t5/General-SAS-Programming/Removing-Duplicate-rows-with-one-based-on-multiple-columns/m-p/347150

Keep the first record where num1-num7 are in a dup group, even though seq and cid are unique

HAVE
====

Up to 40 obs WORK.HAVE total obs=271

Obs    SEQ    CID     NUM1    NUM2    NUM3    NUM4    NUM5    NUM6    NUM7

  1    000    0000     578     393     613     704     928     813     125 DUPS
  2    X00    X000     578     393     613     704     928     813     125
  3    Z00    Z000     578     393     613     704     928     813     125

  4    731    5731     514     301     802      34     703     218     338

  5    462    1462     839     114     283     549      73      89     748

  6    193    7193     457     379     774     663     240     344     785

  7    924    2924     343     384      64     491     362     314     361

  8    655    8655     362     431     598     698     946      56     362 DUPS
  9    X55    X655     362     431     598     698     946      56     362

 10    386    4386     321      15     788     932     639     391     194 DUPS
 11    X86    X386     321      15     788     932     639     391     194
 12    Z86    Z386     321      15     788     932     639     391     194

 13    117    0117     772     826     164     560     317     986     897
 14    848    5848     856     911      99     742      75     125     528

 15    579    1579      29     995     879     992     716     961     108 DUPS
 16    X79    X579      29     995     879     992     716     961     108
 17    Z79    Z579      29     995     879     992     716     961     108

WANT
===

Obs    SEQ    CID     NUM1    NUM2    NUM3    NUM4    NUM5    NUM6    NUM7

  1    000    0000     578     393     613     704     928     813     125

  4    731    5731     514     301     802      34     703     218     338

  5    462    1462     839     114     283     549      73      89     748

  6    193    7193     457     379     774     663     240     344     785

  7    924    2924     343     384      64     491     362     314     361

  8    655    8655     362     431     598     698     946      56     362

 10    386    4386     321      15     788     932     639     391     194

 13    117    0117     772     826     164     560     317     986     897

 14    848    5848     856     911      99     742      75     125     528

 15    579    1579      29     995     879     992     716     961     108

* create tha data;
data have(where=(num7 ne 0));
  retain seq cid;
  retain num1-num7 0;
  array therest num1-num7;
  do dig=1000000 to 2000000 by 5731;
    seq=substr(put(dig,7.),5);
    cid=substr(put(dig,7.),4);
    if uniform(5731)<.33 then do;
       do over therest;
         therest=int(1000*uniform(5764));
       end;
       output;
       substr(cid,1,1)='X';
       substr(seq,1,1)='X';
       output;
       if uniform(6754)<.66 then do;
          substr(cid,1,1)='Z';
          substr(seq,1,1)='Z';
          output;
       end;
    end;
    else do;
       do over therest;
         therest=int(1000*uniform(5764));
       end;
       output;
    end;
  end;
  drop dig;
run;quit;

* SOLUTION;

data want;
  set have;
  by num1--num7 notsorted; * note in your case record_num--cost;
  if first.num1;
run;quit;


Sujithpeta
Quartz | Level 8

Hello, Thanks for trying to help me. Your solution doesn't remove the purchase and return pairs (Which I want) but keeps the purchase and remove the returns.

Thanks

 

 

rogerjdeangelis
Barite | Level 11
My mistake. I think this may be what you want.

data have(where=(num7 ne 0));
  retain seq cid;
  retain num1-num7 0;
  array therest num1-num7;
  do dig=1000000 to 2000000 by 5731;
    seq=substr(put(dig,7.),5);
    cid=substr(put(dig,7.),4);
    if uniform(5731)<.33 then do;
       do over therest;
         therest=int(1000*uniform(5764));
       end;
       output;
       substr(cid,1,1)='X';
       substr(seq,1,1)='X';
       output;
       if uniform(6754)<.66 then do;
          substr(cid,1,1)='Z';
          substr(seq,1,1)='Z';
          output;
       end;
    end;
    else do;
       do over therest;
         therest=int(1000*uniform(5764));
       end;
       output;
    end;
  end;
  drop dig;
run;quit;

* SOLUTION;

data want;
  set have;
  by num1--num7 notsorted; * note in your case record_num--cost;
  if first.num7 and last.num7;
run;quit;


Up to 40 obs WORK.WANT total obs=114

Obs    SEQ    CID     NUM1    NUM2    NUM3    NUM4    NUM5    NUM6    NUM7

  1    731    5731     514     301     802      34     703     218     338
  2    462    1462     839     114     283     549      73      89     748
  3    193    7193     457     379     774     663     240     344     785
  4    924    2924     343     384      64     491     362     314     361
  5    117    0117     772     826     164     560     317     986     897
  6    848    5848     856     911      99     742      75     125     528
  7    310    7310     599      37     836      77     101     829     245
  8    041    3041     395      33     557     885     650     990     343
  9    503    4503     744     490     117     202     182     869     767
 10    965    5965     132      17     475     692     988     401     334


Sujithpeta
Quartz | Level 8

Hello Roger, I ran the code, there is no change in the dataset.Untitled.png

 

Ksharp
Super User
You can do this like the following, 
I have no time to make dataset.

data positive negative;
 set have;
 if cost <0 then output negative;
 else output positive;
run;

data negative;
 set negative;
keep custome_id quality cost;
quality=-quality;
cost=-cost;
run;

data want;
 merge positive(in=ina) negative(in=inb);
 by customer_id quality cost;
 if not (ina and inb);
run;


Sujithpeta
Quartz | Level 8

Hello Ksharp,

 

It worked, didn't imagine it can be as simple as this. I can see why you're a grand advisor.

 

Thanks a ton!

--Sujith

Ksharp
Super User

If you have multiple duplicated obs like

 

custome_id quality cost;
1 1 200
1 1 200
1 -1 -200

My code couldn't work.and you could try:
data want;
ina=0;inb=0; merge positive(in=ina) negative(in=inb); by customer_id quality cost; if not (ina and inb); run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1941 views
  • 0 likes
  • 4 in conversation