BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
FriedEgg
SAS Employee

data have;

  input slug :$10. (gar h3432 fing up99 thisaw) (:12.) _dllr :comma9.2 dte :yymmdd10. note &:$15. ;

  _dllr=abs(_dllr);

  cards;

ABCD  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid

ABCD  1234  1  1202  671  020 -1,234.56   2011-01-31  Fees Paid

ABCD  2345  1  1203  672  030  1,234.56   2011-01-31  Fees Paid

ABCD  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid

WXYZ  2323  1  3212  672  020  6,543.89   2011-02-28  Allowance

;

run;

proc sort data=have; by _all_; run;

data _null_;

set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE')) end=done;

if done then call symputx('nvar',name);

run;

data want;

set have;

by _all_;

if first.&nvar and last.&nvar then output;

run;

FriedEgg
SAS Employee

Okay, I think I understand your requirements now.

data have;

input slug :$10. (gar h3432 fing up99 thisaw) (:12.) _dllr :comma9.2 dte :yymmdd10. note &:$15. ;

cards;

ABCD  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid

ABCD  1234  1  1202  671  020 -1,234.56   2011-01-31  Fees Paid

ABCD  2345  1  1203  672  030  1,234.56   2011-01-31  Fees Paid

ABCD  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid

WXYZ  2323  1  3212  672  020  6,543.89   2011-02-28  Allowance

ERDS  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid

ERDS  1234  1  1202  671  020 -1,234.56   2011-01-31  Fees Paid

QRDS  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid

QRDS  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid

;

run;

proc sort data=have; by _all_; run;

data check;

if 0 then __dllr=0;

declare hash ha(dataset:'have(rename=(_dllr=__dllr) where=(__dllr>0))');

  ha.definekey('slug','gar','h3432','fing','up99','thisaw');

  ha.definedata(all:'y');

  ha.definedone();

do until(done);

  set have end=done;

  by slug gar h3432 fing up99 thisaw _dllr dte note;

   if first.thisaw and ha.find()=0 and sum(of _dllr __dllr)>0 then output;

   if last.thisaw and last.note and not(first.note) then output;

end;

stop;

drop __:;

run;

proc print data=check noobs; run;

                                                             INPUT DATA                                                            1

                         slug     gar    h3432    fing    up99    thisaw      _dllr      dte       note

                         ABCD    1234      1      1202     671      20      -1234.56    18658    Fees Paid

                         ABCD    1234      1      1202     671      20       1234.56    18658    Fees Paid

                         ABCD    1234      1      1202     671      20       1234.56    18658    Fees Paid

                         ABCD    2345      1      1203     672      30       1234.56    18658    Fees Paid

                         ERDS    1234      1      1202     671      20      -1234.56    18658    Fees Paid

                         ERDS    1234      1      1202     671      20       1234.56    18658    Fees Paid

                         QRDS    1234      1      1202     671      20       1234.56    18658    Fees Paid

                         QRDS    1234      1      1202     671      20       1234.56    18658    Fees Paid

                         WXYZ    2323      1      3212     672      20       6543.89    18686    Allowance

                                                            OUTPUT DATA                                                            2

                          slug     gar    h3432    fing    up99    thisaw     _dllr      dte       note

                          ABCD    1234      1      1202     671      20      1234.56    18658    Fees Paid

                          ABCD    2345      1      1203     672      30      1234.56    18658    Fees Paid

                          QRDS    1234      1      1202     671      20      1234.56    18658    Fees Paid

                          QRDS    1234      1      1202     671      20      1234.56    18658    Fees Paid

                          WXYZ    2323      1      3212     672      20      6543.89    18686    Allowance

I want to remove:

Obs1 and 2 because they cancel eachother out (buy and return) but not 3 (repurchase)

Obs4 is a standalone item purchase, it stays

Obs5 and 6 are removed (buy and return)

Obs6 and 7 stay (two identical purchases, not purchase and return)

Obs8 stays, standalone purchase.

akberali67
Calcite | Level 5

Ok, that seems a lil complicated.

I edited the program to fit the data but I am getting a lot of errors. I made a new field called JE_ID to fix problems with another fields which is additional here

Program:

PROC SORT DATA = SASDATA.Q1Q2_SALES;

BY _ALL_;

RUN;

DATA SASDATA.Z_Q1Q2_SALES;

IF 0 THEN MONETARY_AMOUNT=0;

DECLARE HASH HA(DATASET:'SASDATA.Q1Q2_SALES WHERE=(MONETARY_AMOUNT>0))');

HA.DEFINEKEY('OPERATOR','JOURNAL','LINE__','UNIT','ACCT','DEPARTMENT','DATE','JOURNAL_LINE_DESCRIPTION','JE_ID');

HA.DEFINEDATA(ALL:'Y');

HA.DEFINEDONE();

DO UNTIL(DONE);

SET SASDATA.Q1Q2_SALES END=DONE;

BY OPERATOR JOURNAL LINE__ UNIT ACCT DEPARTMENT MONETARY_AMOUNT DATE JOURNAL_LINE_DESCRIPTION JE_ID;

IF FIRST.DEPARTMENT AND HA.FIND()=0 AND SUM(OF MONETARY_AMOUNT)>0 THEN OUTPUT;

IF LAST.DEPARTMENT AND LAST.JOURNAL_LINE_DESCRIPTION AND NOT(FIRST.JOURNAL_LINE_DESCRIPTION) THEN OUTPUT;

END;

STOP;

RUN;

Log:

304

305  PROC SORT DATA = SASDATA.Q1Q2_SALES;

306  BY _ALL_;

307  RUN;

NOTE: There were 206 observations read from the data set SASDATA.Q1Q2_SALES.

NOTE: The data set SASDATA.Q1Q2_SALES has 206 observations and 11 variables.

NOTE: PROCEDURE SORT used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

308

309  DATA SASDATA.Z_Q1Q2_SALES;

310  IF 0 THEN MONETARY_AMOUNT=0;

311  DECLARE HASH HA(DATASET:'SASDATA.Q1Q2_SALES WHERE=(MONETARY_AMOUNT>0))');

312  HA.DEFINEKEY('OPERATOR','JOURNAL','LINE__','UNIT','ACCT','DEPARTMENT','DATE','JOURNAL_LINE_DESCRI

312! PTION','JE_ID');

313  HA.DEFINEDATA(ALL:'Y');

314  HA.DEFINEDONE();

315

316

317  DO UNTIL(DONE);

318  SET SASDATA.Q1Q2_SALES END=DONE;

319  BY OPERATOR JOURNAL LINE__ UNIT ACCT DEPARTMENT MONETARY_AMOUNT DATE JOURNAL_LINE_DESCRIPTION

319! JE_ID;

320  IF FIRST.DEPARTMENT AND HA.FIND()=0 AND SUM(OF MONETARY_AMOUNT)>0 THEN OUTPUT;

321  IF LAST.DEPARTMENT AND LAST.JOURNAL_LINE_DESCRIPTION AND NOT(FIRST.JOURNAL_LINE_DESCRIPTION) THEN

321!  OUTPUT;

322  END;

323  STOP;

324  RUN;

ERROR: Invalid data set name at line 313 column 1.

ERROR: DATA STEP Component Object failure.  Aborted during the EXECUTION phase.

NOTE: The SAS System stopped processing this step because of errors.

NOTE: There were 1 observations read from the data set SASDATA.Q1Q2_SALES.

WARNING: The data set SASDATA.Z_Q1Q2_SALES may be incomplete.  When this step was stopped there were

         0 observations and 11 variables.

WARNING: Data set SASDATA.Z_Q1Q2_SALES was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

Cant figure whats going wrong, please help

art297
Opal | Level 21

Akber,  I think that you've confused matters by changing the variable names and now having one more variable than you showed in your example data.

Regardless, I still think that the methodology I proposed the other day will still be the easiest for you to manage (although it might not be the most efficient code).  I may have some of the variables or ordering wrong, but doesn't the following do what you want?:

PROC SORT DATA = Q1Q2_SALES;

  BY OPERATOR JOURNAL LINE__ UNIT ACCT DEPARTMENT

     DATE MONETARY_AMOUNT

     JOURNAL_LINE_DESCRIPTION;

RUN;

data Z_Q1Q2_SALES;

  retain delete_me;

  if _n_ eq 1 then delete_me=0;

  set Q1Q2_SALES;

  by OPERATOR JOURNAL LINE__ UNIT ACCT DEPARTMENT

     DATE JOURNAL_LINE_DESCRIPTION;

  set Q1Q2_SALES ( firstobs = 2 keep = MONETARY_AMOUNT

       rename = (MONETARY_AMOUNT = Next_MONETARY_AMOUNT) )

      Q1Q2_SALES (      obs = 1 drop = _all_ );

  if delete_me then do;

    output;

    delete_me=0;

  end;

  else if not last.journal_line_description then do;

    if sum(monetary_amount,next_monetary_amount) eq 0 then do;

      delete_me=1;

      output;

    end;

    else output;

  end;

  else output;

run;

data Z_Q1Q2_SALES;

  set Z_Q1Q2_SALES;

  if delete_me eq 0;

run;

FriedEgg
SAS Employee

It looks like you left out a number of details when transcribing my code.

data have;

input operator :$10. (journal line__ unit acct department) (:12.) monetary_amount :comma9.2 date :yymmdd10. journal_line_description &:$15. je_id;

cards;

ABCD  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid    1

ABCD  1234  1  1202  671  020 -1,234.56   2011-01-31  Fees Paid    1

ABCD  2345  1  1203  672  030  1,234.56   2011-01-31  Fees Paid    1

ABCD  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid    1

WXYZ  2323  1  3212  672  020  6,543.89   2011-02-28  Allowance    1

ERDS  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid    1

ERDS  1234  1  1202  671  020 -1,234.56   2011-01-31  Fees Paid    1

QRDS  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid    1

QRDS  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid    1

;

run;

proc sort data=have; by operator journal line__ unit acct department monetary_amount date journal_line_description; run;

data check;

if 0 then __dllr=0;

declare hash ha(dataset:'have(rename=(monetary_amount=__dllr) where=(__dllr>0))');

  ha.definekey('journal','line__','unit','acct','department');

  ha.definedata('__dllr');

  ha.definedone();

do until(done);

  set have end=done;

  by operator journal line__ unit acct department monetary_amount date journal_line_description;

   if first.department and ha.find()=0 and sum(of monetary_amount __dllr)>0 then output;

   if last.department and last.journal_line_description and not(first.journal_line_description) then output;

end;

stop;

drop __:;

run;

options nodate;

proc print data=have noobs; title 'INPUT DATA'; run;

proc print data=check noobs; title 'OUTPUT DATA'; run;

                                                                           monetary_             journal_line_

            operator    journal    line__    unit    acct    department      amount      date     description     je_id

              ABCD        1234        1      1202     671        20         -1234.56    18658      Fees Paid        1 

              ABCD        1234        1      1202     671        20          1234.56    18658      Fees Paid        1 

              ABCD        1234        1      1202     671        20          1234.56    18658      Fees Paid        1 

              ABCD        2345        1      1203     672        30          1234.56    18658      Fees Paid        1 

              ERDS        1234        1      1202     671        20         -1234.56    18658      Fees Paid        1 

              ERDS        1234        1      1202     671        20          1234.56    18658      Fees Paid        1 

              QRDS        1234        1      1202     671        20          1234.56    18658      Fees Paid        1 

              QRDS        1234        1      1202     671        20          1234.56    18658      Fees Paid        1 

              WXYZ        2323        1      3212     672        20          6543.89    18686      Allowance        1 

                                                                           monetary_             journal_line_

            operator    journal    line__    unit    acct    department      amount      date     description     je_id

              ABCD        1234        1      1202     671        20         1234.56     18658      Fees Paid        1 

              ABCD        2345        1      1203     672        30         1234.56     18658      Fees Paid        1 

              QRDS        1234        1      1202     671        20         1234.56     18658      Fees Paid        1 

              QRDS        1234        1      1202     671        20         1234.56     18658      Fees Paid        1 

              WXYZ        2323        1      3212     672        20         6543.89     18686      Allowance        1 

Ksharp
Super User

How about:

And Thanks Tom, using something learned from you!

data have;
  input slug :$10. (gar h3432 fing up99 thisaw) (:12.) _dllr :comma9.2 dte :yymmdd10. note &:$15. ;
  format dte yymmdd10.;
  cards;
ABCD  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid
ABCD  1234  1  1202  671  020 -1,234.56   2011-01-31  Fees Paid
ABCD  2345  1  1203  672  030  1,234.56   2011-01-31  Fees Paid
ABCD  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid
WXYZ  2323  1  3212  672  020  6,543.89   2011-02-28  Allowance
;
run;
data positive negative;
 set have;
 if _dllr lt 0 then do;_dllr=abs(_dllr);output negative;end;
  else output positive;
run;
proc sort data=positive;by _dllr;run;
proc sort data=negative;by _dllr;run;
data want;
 ina=0;inb=0;
 merge positive(in=ina) negative(in=inb);
 by _dllr;
 if not ina or not inb then do; 
                              if not ina then _dllr=_dllr*-1;
                              output;
                            end;
run;

Ksharp

MikeZdeb
Rhodochrosite | Level 12

hi ... if the goal is " ... simply remove TWO observations with equal Amount with a different sign, based just on amount and no other field ..." , I think this also works (using Fried Egg's data and idea similar to Ksharp's matching positive and negative values of _dllr) ... ...

data have;

input slug :$10. (gar h3432 fing up99 thisaw) (:12.) _dllr :comma9.2 dte :yymmdd10. note &:$15. ;

cards;

ABCD  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid

ABCD  1234  1  1202  671  020 -1,234.56   2011-01-31  Fees Paid

ABCD  2345  1  1203  672  030  1,234.56   2011-01-31  Fees Paid

ABCD  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid

WXYZ  2323  1  3212  672  020  6,543.89   2011-02-28  Allowance

ERDS  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid

ERDS  1234  1  1202  671  020 -1,234.56   2011-01-31  Fees Paid

QRDS  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid

QRDS  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid

;

run;

proc sort data=have;

by _dllr;

run;

* negative _dllr;

data neg (index=(_dllr));

set have (keep=_dllr where=(_dllr lt 0));

_dllr = abs(_dllr);

run;

* positive _dllr ... check for matches among negative _dllr, if no match or out of matches ... output;

data want;

set have (where=(_dllr gt 0));

set neg key=_dllr;

if _error_;

_error_ = 0;

run;

ps  given all the various responses, I'm not sure if this is still the problem ...  if it involves looking for negative values plus matching on other variables, just change the index

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 21 replies
  • 1837 views
  • 5 likes
  • 6 in conversation