data have;
input slug :$10. (gar h3432 fing up99 thisaw) (:12.) _dllr :comma9.2 dte :yymmdd10. note &:$15. ;
_dllr=abs(_dllr);
cards;
ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid
ABCD 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid
ABCD 2345 1 1203 672 030 1,234.56 2011-01-31 Fees Paid
ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid
WXYZ 2323 1 3212 672 020 6,543.89 2011-02-28 Allowance
;
run;
proc sort data=have; by _all_; run;
data _null_;
set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE')) end=done;
if done then call symputx('nvar',name);
run;
data want;
set have;
by _all_;
if first.&nvar and last.&nvar then output;
run;
Okay, I think I understand your requirements now.
data have;
input slug :$10. (gar h3432 fing up99 thisaw) (:12.) _dllr :comma9.2 dte :yymmdd10. note &:$15. ;
cards;
ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid
ABCD 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid
ABCD 2345 1 1203 672 030 1,234.56 2011-01-31 Fees Paid
ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid
WXYZ 2323 1 3212 672 020 6,543.89 2011-02-28 Allowance
ERDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid
ERDS 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid
QRDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid
QRDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid
;
run;
proc sort data=have; by _all_; run;
data check;
if 0 then __dllr=0;
declare hash ha(dataset:'have(rename=(_dllr=__dllr) where=(__dllr>0))');
ha.definekey('slug','gar','h3432','fing','up99','thisaw');
ha.definedata(all:'y');
ha.definedone();
do until(done);
set have end=done;
by slug gar h3432 fing up99 thisaw _dllr dte note;
if first.thisaw and ha.find()=0 and sum(of _dllr __dllr)>0 then output;
if last.thisaw and last.note and not(first.note) then output;
end;
stop;
drop __:;
run;
proc print data=check noobs; run;
INPUT DATA 1
slug gar h3432 fing up99 thisaw _dllr dte note
ABCD 1234 1 1202 671 20 -1234.56 18658 Fees Paid
ABCD 1234 1 1202 671 20 1234.56 18658 Fees Paid
ABCD 1234 1 1202 671 20 1234.56 18658 Fees Paid
ABCD 2345 1 1203 672 30 1234.56 18658 Fees Paid
ERDS 1234 1 1202 671 20 -1234.56 18658 Fees Paid
ERDS 1234 1 1202 671 20 1234.56 18658 Fees Paid
QRDS 1234 1 1202 671 20 1234.56 18658 Fees Paid
QRDS 1234 1 1202 671 20 1234.56 18658 Fees Paid
WXYZ 2323 1 3212 672 20 6543.89 18686 Allowance
OUTPUT DATA 2
slug gar h3432 fing up99 thisaw _dllr dte note
ABCD 1234 1 1202 671 20 1234.56 18658 Fees Paid
ABCD 2345 1 1203 672 30 1234.56 18658 Fees Paid
QRDS 1234 1 1202 671 20 1234.56 18658 Fees Paid
QRDS 1234 1 1202 671 20 1234.56 18658 Fees Paid
WXYZ 2323 1 3212 672 20 6543.89 18686 Allowance
I want to remove:
Obs1 and 2 because they cancel eachother out (buy and return) but not 3 (repurchase)
Obs4 is a standalone item purchase, it stays
Obs5 and 6 are removed (buy and return)
Obs6 and 7 stay (two identical purchases, not purchase and return)
Obs8 stays, standalone purchase.
Ok, that seems a lil complicated.
I edited the program to fit the data but I am getting a lot of errors. I made a new field called JE_ID to fix problems with another fields which is additional here
Program:
PROC SORT DATA = SASDATA.Q1Q2_SALES;
BY _ALL_;
RUN;
DATA SASDATA.Z_Q1Q2_SALES;
IF 0 THEN MONETARY_AMOUNT=0;
DECLARE HASH HA(DATASET:'SASDATA.Q1Q2_SALES WHERE=(MONETARY_AMOUNT>0))');
HA.DEFINEKEY('OPERATOR','JOURNAL','LINE__','UNIT','ACCT','DEPARTMENT','DATE','JOURNAL_LINE_DESCRIPTION','JE_ID');
HA.DEFINEDATA(ALL:'Y');
HA.DEFINEDONE();
DO UNTIL(DONE);
SET SASDATA.Q1Q2_SALES END=DONE;
BY OPERATOR JOURNAL LINE__ UNIT ACCT DEPARTMENT MONETARY_AMOUNT DATE JOURNAL_LINE_DESCRIPTION JE_ID;
IF FIRST.DEPARTMENT AND HA.FIND()=0 AND SUM(OF MONETARY_AMOUNT)>0 THEN OUTPUT;
IF LAST.DEPARTMENT AND LAST.JOURNAL_LINE_DESCRIPTION AND NOT(FIRST.JOURNAL_LINE_DESCRIPTION) THEN OUTPUT;
END;
STOP;
RUN;
Log:
304
305 PROC SORT DATA = SASDATA.Q1Q2_SALES;
306 BY _ALL_;
307 RUN;
NOTE: There were 206 observations read from the data set SASDATA.Q1Q2_SALES.
NOTE: The data set SASDATA.Q1Q2_SALES has 206 observations and 11 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
308
309 DATA SASDATA.Z_Q1Q2_SALES;
310 IF 0 THEN MONETARY_AMOUNT=0;
311 DECLARE HASH HA(DATASET:'SASDATA.Q1Q2_SALES WHERE=(MONETARY_AMOUNT>0))');
312 HA.DEFINEKEY('OPERATOR','JOURNAL','LINE__','UNIT','ACCT','DEPARTMENT','DATE','JOURNAL_LINE_DESCRI
312! PTION','JE_ID');
313 HA.DEFINEDATA(ALL:'Y');
314 HA.DEFINEDONE();
315
316
317 DO UNTIL(DONE);
318 SET SASDATA.Q1Q2_SALES END=DONE;
319 BY OPERATOR JOURNAL LINE__ UNIT ACCT DEPARTMENT MONETARY_AMOUNT DATE JOURNAL_LINE_DESCRIPTION
319! JE_ID;
320 IF FIRST.DEPARTMENT AND HA.FIND()=0 AND SUM(OF MONETARY_AMOUNT)>0 THEN OUTPUT;
321 IF LAST.DEPARTMENT AND LAST.JOURNAL_LINE_DESCRIPTION AND NOT(FIRST.JOURNAL_LINE_DESCRIPTION) THEN
321! OUTPUT;
322 END;
323 STOP;
324 RUN;
ERROR: Invalid data set name at line 313 column 1.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set SASDATA.Q1Q2_SALES.
WARNING: The data set SASDATA.Z_Q1Q2_SALES may be incomplete. When this step was stopped there were
0 observations and 11 variables.
WARNING: Data set SASDATA.Z_Q1Q2_SALES was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
Cant figure whats going wrong, please help
Akber, I think that you've confused matters by changing the variable names and now having one more variable than you showed in your example data.
Regardless, I still think that the methodology I proposed the other day will still be the easiest for you to manage (although it might not be the most efficient code). I may have some of the variables or ordering wrong, but doesn't the following do what you want?:
PROC SORT DATA = Q1Q2_SALES;
BY OPERATOR JOURNAL LINE__ UNIT ACCT DEPARTMENT
DATE MONETARY_AMOUNT
JOURNAL_LINE_DESCRIPTION;
RUN;
data Z_Q1Q2_SALES;
retain delete_me;
if _n_ eq 1 then delete_me=0;
set Q1Q2_SALES;
by OPERATOR JOURNAL LINE__ UNIT ACCT DEPARTMENT
DATE JOURNAL_LINE_DESCRIPTION;
set Q1Q2_SALES ( firstobs = 2 keep = MONETARY_AMOUNT
rename = (MONETARY_AMOUNT = Next_MONETARY_AMOUNT) )
Q1Q2_SALES ( obs = 1 drop = _all_ );
if delete_me then do;
output;
delete_me=0;
end;
else if not last.journal_line_description then do;
if sum(monetary_amount,next_monetary_amount) eq 0 then do;
delete_me=1;
output;
end;
else output;
end;
else output;
run;
data Z_Q1Q2_SALES;
set Z_Q1Q2_SALES;
if delete_me eq 0;
run;
It looks like you left out a number of details when transcribing my code.
data have;
input operator :$10. (journal line__ unit acct department) (:12.) monetary_amount :comma9.2 date :yymmdd10. journal_line_description &:$15. je_id;
cards;
ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid 1
ABCD 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid 1
ABCD 2345 1 1203 672 030 1,234.56 2011-01-31 Fees Paid 1
ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid 1
WXYZ 2323 1 3212 672 020 6,543.89 2011-02-28 Allowance 1
ERDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid 1
ERDS 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid 1
QRDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid 1
QRDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid 1
;
run;
proc sort data=have; by operator journal line__ unit acct department monetary_amount date journal_line_description; run;
data check;
if 0 then __dllr=0;
declare hash ha(dataset:'have(rename=(monetary_amount=__dllr) where=(__dllr>0))');
ha.definekey('journal','line__','unit','acct','department');
ha.definedata('__dllr');
ha.definedone();
do until(done);
set have end=done;
by operator journal line__ unit acct department monetary_amount date journal_line_description;
if first.department and ha.find()=0 and sum(of monetary_amount __dllr)>0 then output;
if last.department and last.journal_line_description and not(first.journal_line_description) then output;
end;
stop;
drop __:;
run;
options nodate;
proc print data=have noobs; title 'INPUT DATA'; run;
proc print data=check noobs; title 'OUTPUT DATA'; run;
monetary_ journal_line_
operator journal line__ unit acct department amount date description je_id
ABCD 1234 1 1202 671 20 -1234.56 18658 Fees Paid 1
ABCD 1234 1 1202 671 20 1234.56 18658 Fees Paid 1
ABCD 1234 1 1202 671 20 1234.56 18658 Fees Paid 1
ABCD 2345 1 1203 672 30 1234.56 18658 Fees Paid 1
ERDS 1234 1 1202 671 20 -1234.56 18658 Fees Paid 1
ERDS 1234 1 1202 671 20 1234.56 18658 Fees Paid 1
QRDS 1234 1 1202 671 20 1234.56 18658 Fees Paid 1
QRDS 1234 1 1202 671 20 1234.56 18658 Fees Paid 1
WXYZ 2323 1 3212 672 20 6543.89 18686 Allowance 1
monetary_ journal_line_
operator journal line__ unit acct department amount date description je_id
ABCD 1234 1 1202 671 20 1234.56 18658 Fees Paid 1
ABCD 2345 1 1203 672 30 1234.56 18658 Fees Paid 1
QRDS 1234 1 1202 671 20 1234.56 18658 Fees Paid 1
QRDS 1234 1 1202 671 20 1234.56 18658 Fees Paid 1
WXYZ 2323 1 3212 672 20 6543.89 18686 Allowance 1
How about:
And Thanks Tom, using something learned from you!
data have; input slug :$10. (gar h3432 fing up99 thisaw) (:12.) _dllr :comma9.2 dte :yymmdd10. note &:$15. ; format dte yymmdd10.; cards; ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid ABCD 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid ABCD 2345 1 1203 672 030 1,234.56 2011-01-31 Fees Paid ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid WXYZ 2323 1 3212 672 020 6,543.89 2011-02-28 Allowance ; run; data positive negative; set have; if _dllr lt 0 then do;_dllr=abs(_dllr);output negative;end; else output positive; run; proc sort data=positive;by _dllr;run; proc sort data=negative;by _dllr;run; data want; ina=0;inb=0; merge positive(in=ina) negative(in=inb); by _dllr; if not ina or not inb then do; if not ina then _dllr=_dllr*-1; output; end; run;
Ksharp
hi ... if the goal is " ... simply remove TWO observations with equal Amount with a different sign, based just on amount and no other field ..." , I think this also works (using Fried Egg's data and idea similar to Ksharp's matching positive and negative values of _dllr) ... ...
data have;
input slug :$10. (gar h3432 fing up99 thisaw) (:12.) _dllr :comma9.2 dte :yymmdd10. note &:$15. ;
cards;
ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid
ABCD 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid
ABCD 2345 1 1203 672 030 1,234.56 2011-01-31 Fees Paid
ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid
WXYZ 2323 1 3212 672 020 6,543.89 2011-02-28 Allowance
ERDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid
ERDS 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid
QRDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid
QRDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid
;
run;
proc sort data=have;
by _dllr;
run;
* negative _dllr;
data neg (index=(_dllr));
set have (keep=_dllr where=(_dllr lt 0));
_dllr = abs(_dllr);
run;
* positive _dllr ... check for matches among negative _dllr, if no match or out of matches ... output;
data want;
set have (where=(_dllr gt 0));
set neg key=_dllr;
if _error_;
_error_ = 0;
run;
ps given all the various responses, I'm not sure if this is still the problem ... if it involves looking for negative values plus matching on other variables, just change the index
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.