Solved: Removing both observations in a duplicate - Page 2

FriedEgg · Posted 11-08-2011 12:24 PM

data have;

input slug :$10. (gar h3432 fing up99 thisaw) (:12.) _dllr :comma9.2 dte :yymmdd10. note &:$15. ;

_dllr=abs(_dllr);

cards;

ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid

ABCD 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid

ABCD 2345 1 1203 672 030 1,234.56 2011-01-31 Fees Paid

ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid

WXYZ 2323 1 3212 672 020 6,543.89 2011-02-28 Allowance

;

run;

proc sort data=have; by _all_; run;

data _null_;

set sashelp.vcolumn(where=(libname='WORK' and memname='HAVE')) end=done;

if done then call symputx('nvar',name);

run;

data want;

set have;

by _all_;

if first.&nvar and last.&nvar then output;

run;

FriedEgg · Posted 11-09-2011 02:06 PM

Okay, I think I understand your requirements now.

data have;

input slug :$10. (gar h3432 fing up99 thisaw) (:12.) _dllr :comma9.2 dte :yymmdd10. note &:$15. ;

cards;

ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid

ABCD 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid

ABCD 2345 1 1203 672 030 1,234.56 2011-01-31 Fees Paid

ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid

WXYZ 2323 1 3212 672 020 6,543.89 2011-02-28 Allowance

ERDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid

ERDS 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid

QRDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid

;

run;

proc sort data=have; by _all_; run;

data check;

if 0 then __dllr=0;

declare hash ha(dataset:'have(rename=(_dllr=__dllr) where=(__dllr>0))');

ha.definekey('slug','gar','h3432','fing','up99','thisaw');

ha.definedata(all:'y');

ha.definedone();

do until(done);

set have end=done;

by slug gar h3432 fing up99 thisaw _dllr dte note;

if first.thisaw and ha.find()=0 and sum(of _dllr __dllr)>0 then output;

if last.thisaw and last.note and not(first.note) then output;

end;

stop;

drop __:;

run;

proc print data=check noobs; run;

INPUT DATA 1

slug gar h3432 fing up99 thisaw _dllr dte note

ABCD 1234 1 1202 671 20 -1234.56 18658 Fees Paid

ABCD 1234 1 1202 671 20 1234.56 18658 Fees Paid

ABCD 2345 1 1203 672 30 1234.56 18658 Fees Paid

ERDS 1234 1 1202 671 20 -1234.56 18658 Fees Paid

ERDS 1234 1 1202 671 20 1234.56 18658 Fees Paid

QRDS 1234 1 1202 671 20 1234.56 18658 Fees Paid

WXYZ 2323 1 3212 672 20 6543.89 18686 Allowance

OUTPUT DATA 2

slug gar h3432 fing up99 thisaw _dllr dte note

ABCD 1234 1 1202 671 20 1234.56 18658 Fees Paid

ABCD 2345 1 1203 672 30 1234.56 18658 Fees Paid

QRDS 1234 1 1202 671 20 1234.56 18658 Fees Paid

WXYZ 2323 1 3212 672 20 6543.89 18686 Allowance

I want to remove:

Obs1 and 2 because they cancel eachother out (buy and return) but not 3 (repurchase)

Obs4 is a standalone item purchase, it stays

Obs5 and 6 are removed (buy and return)

Obs6 and 7 stay (two identical purchases, not purchase and return)

Obs8 stays, standalone purchase.

akberali67 · Posted 11-09-2011 03:15 PM

Ok, that seems a lil complicated.

I edited the program to fit the data but I am getting a lot of errors. I made a new field called JE_ID to fix problems with another fields which is additional here

Program:

PROC SORT DATA = SASDATA.Q1Q2_SALES;

BY _ALL_;

RUN;

DATA SASDATA.Z_Q1Q2_SALES;

IF 0 THEN MONETARY_AMOUNT=0;

DECLARE HASH HA(DATASET:'SASDATA.Q1Q2_SALES WHERE=(MONETARY_AMOUNT>0))');

HA.DEFINEKEY('OPERATOR','JOURNAL','LINE__','UNIT','ACCT','DEPARTMENT','DATE','JOURNAL_LINE_DESCRIPTION','JE_ID');

HA.DEFINEDATA(ALL:'Y');

HA.DEFINEDONE();

DO UNTIL(DONE);

SET SASDATA.Q1Q2_SALES END=DONE;

BY OPERATOR JOURNAL LINE__ UNIT ACCT DEPARTMENT MONETARY_AMOUNT DATE JOURNAL_LINE_DESCRIPTION JE_ID;

IF FIRST.DEPARTMENT AND HA.FIND()=0 AND SUM(OF MONETARY_AMOUNT)>0 THEN OUTPUT;

IF LAST.DEPARTMENT AND LAST.JOURNAL_LINE_DESCRIPTION AND NOT(FIRST.JOURNAL_LINE_DESCRIPTION) THEN OUTPUT;

END;

STOP;

RUN;

Log:

304

305 PROC SORT DATA = SASDATA.Q1Q2_SALES;

306 BY _ALL_;

307 RUN;

NOTE: There were 206 observations read from the data set SASDATA.Q1Q2_SALES.

NOTE: The data set SASDATA.Q1Q2_SALES has 206 observations and 11 variables.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

308

309 DATA SASDATA.Z_Q1Q2_SALES;

310 IF 0 THEN MONETARY_AMOUNT=0;

311 DECLARE HASH HA(DATASET:'SASDATA.Q1Q2_SALES WHERE=(MONETARY_AMOUNT>0))');

312 HA.DEFINEKEY('OPERATOR','JOURNAL','LINE__','UNIT','ACCT','DEPARTMENT','DATE','JOURNAL_LINE_DESCRI

312! PTION','JE_ID');

313 HA.DEFINEDATA(ALL:'Y');

314 HA.DEFINEDONE();

315

316

317 DO UNTIL(DONE);

318 SET SASDATA.Q1Q2_SALES END=DONE;

319 BY OPERATOR JOURNAL LINE__ UNIT ACCT DEPARTMENT MONETARY_AMOUNT DATE JOURNAL_LINE_DESCRIPTION

319! JE_ID;

320 IF FIRST.DEPARTMENT AND HA.FIND()=0 AND SUM(OF MONETARY_AMOUNT)>0 THEN OUTPUT;

321 IF LAST.DEPARTMENT AND LAST.JOURNAL_LINE_DESCRIPTION AND NOT(FIRST.JOURNAL_LINE_DESCRIPTION) THEN

321! OUTPUT;

322 END;

323 STOP;

324 RUN;

ERROR: Invalid data set name at line 313 column 1.

ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.

NOTE: The SAS System stopped processing this step because of errors.

NOTE: There were 1 observations read from the data set SASDATA.Q1Q2_SALES.

WARNING: The data set SASDATA.Z_Q1Q2_SALES may be incomplete. When this step was stopped there were

0 observations and 11 variables.

WARNING: Data set SASDATA.Z_Q1Q2_SALES was not replaced because this step was stopped.

NOTE: DATA statement used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

Cant figure whats going wrong, please help

art297 · Posted 11-09-2011 05:17 PM

Akber, I think that you've confused matters by changing the variable names and now having one more variable than you showed in your example data.

Regardless, I still think that the methodology I proposed the other day will still be the easiest for you to manage (although it might not be the most efficient code). I may have some of the variables or ordering wrong, but doesn't the following do what you want?:

PROC SORT DATA = Q1Q2_SALES;

BY OPERATOR JOURNAL LINE__ UNIT ACCT DEPARTMENT

DATE MONETARY_AMOUNT

JOURNAL_LINE_DESCRIPTION;

RUN;

data Z_Q1Q2_SALES;

retain delete_me;

if _n_ eq 1 then delete_me=0;

set Q1Q2_SALES;

by OPERATOR JOURNAL LINE__ UNIT ACCT DEPARTMENT

DATE JOURNAL_LINE_DESCRIPTION;

set Q1Q2_SALES ( firstobs = 2 keep = MONETARY_AMOUNT

rename = (MONETARY_AMOUNT = Next_MONETARY_AMOUNT) )

Q1Q2_SALES ( obs = 1 drop = _all_ );

if delete_me then do;

output;

delete_me=0;

end;

else if not last.journal_line_description then do;

if sum(monetary_amount,next_monetary_amount) eq 0 then do;

delete_me=1;

output;

end;

else output;

end;

else output;

run;

data Z_Q1Q2_SALES;

set Z_Q1Q2_SALES;

if delete_me eq 0;

run;

FriedEgg · Posted 11-09-2011 05:59 PM

It looks like you left out a number of details when transcribing my code.

data have;

input operator :$10. (journal line__ unit acct department) (:12.) monetary_amount :comma9.2 date :yymmdd10. journal_line_description &:$15. je_id;

cards;

ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid 1

ABCD 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid 1

ABCD 2345 1 1203 672 030 1,234.56 2011-01-31 Fees Paid 1

ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid 1

WXYZ 2323 1 3212 672 020 6,543.89 2011-02-28 Allowance 1

ERDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid 1

ERDS 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid 1

QRDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid 1

;

run;

proc sort data=have; by operator journal line__ unit acct department monetary_amount date journal_line_description; run;

data check;

if 0 then __dllr=0;

declare hash ha(dataset:'have(rename=(monetary_amount=__dllr) where=(__dllr>0))');

ha.definekey('journal','line__','unit','acct','department');

ha.definedata('__dllr');

ha.definedone();

do until(done);

set have end=done;

by operator journal line__ unit acct department monetary_amount date journal_line_description;

if first.department and ha.find()=0 and sum(of monetary_amount __dllr)>0 then output;

if last.department and last.journal_line_description and not(first.journal_line_description) then output;

end;

stop;

drop __:;

run;

options nodate;

proc print data=have noobs; title 'INPUT DATA'; run;

proc print data=check noobs; title 'OUTPUT DATA'; run;

monetary_ journal_line_

operator journal line__ unit acct department amount date description je_id

ABCD 1234 1 1202 671 20 -1234.56 18658 Fees Paid 1

ABCD 1234 1 1202 671 20 1234.56 18658 Fees Paid 1

ABCD 2345 1 1203 672 30 1234.56 18658 Fees Paid 1

ERDS 1234 1 1202 671 20 -1234.56 18658 Fees Paid 1

ERDS 1234 1 1202 671 20 1234.56 18658 Fees Paid 1

QRDS 1234 1 1202 671 20 1234.56 18658 Fees Paid 1

WXYZ 2323 1 3212 672 20 6543.89 18686 Allowance 1

monetary_ journal_line_

operator journal line__ unit acct department amount date description je_id

ABCD 1234 1 1202 671 20 1234.56 18658 Fees Paid 1

ABCD 2345 1 1203 672 30 1234.56 18658 Fees Paid 1

QRDS 1234 1 1202 671 20 1234.56 18658 Fees Paid 1

WXYZ 2323 1 3212 672 20 6543.89 18686 Allowance 1

Ksharp · Posted 11-10-2011 01:58 AM

How about:

And Thanks Tom, using something learned from you!

data have;
  input slug :$10. (gar h3432 fing up99 thisaw) (:12.) _dllr :comma9.2 dte :yymmdd10. note &:$15. ;
  format dte yymmdd10.;
  cards;
ABCD  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid
ABCD  1234  1  1202  671  020 -1,234.56   2011-01-31  Fees Paid
ABCD  2345  1  1203  672  030  1,234.56   2011-01-31  Fees Paid
ABCD  1234  1  1202  671  020  1,234.56   2011-01-31  Fees Paid
WXYZ  2323  1  3212  672  020  6,543.89   2011-02-28  Allowance
;
run;
data positive negative;
 set have;
 if _dllr lt 0 then do;_dllr=abs(_dllr);output negative;end;
  else output positive;
run;
proc sort data=positive;by _dllr;run;
proc sort data=negative;by _dllr;run;
data want;
 ina=0;inb=0;
 merge positive(in=ina) negative(in=inb);
 by _dllr;
 if not ina or not inb then do; 
                              if not ina then _dllr=_dllr*-1;
                              output;
                            end;
run;

Ksharp

MikeZdeb · Posted 11-10-2011 12:14 PM

hi ... if the goal is " ... simply remove TWO observations with equal Amount with a different sign, based just on amount and no other field ..." , I think this also works (using Fried Egg's data and idea similar to Ksharp's matching positive and negative values of _dllr) ... ...

data have;

input slug :$10. (gar h3432 fing up99 thisaw) (:12.) _dllr :comma9.2 dte :yymmdd10. note &:$15. ;

cards;

ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid

ABCD 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid

ABCD 2345 1 1203 672 030 1,234.56 2011-01-31 Fees Paid

ABCD 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid

WXYZ 2323 1 3212 672 020 6,543.89 2011-02-28 Allowance

ERDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid

ERDS 1234 1 1202 671 020 -1,234.56 2011-01-31 Fees Paid

QRDS 1234 1 1202 671 020 1,234.56 2011-01-31 Fees Paid

;

run;

proc sort data=have;

by _dllr;

run;

* negative _dllr;

data neg (index=(_dllr));

set have (keep=_dllr where=(_dllr lt 0));

_dllr = abs(_dllr);

run;

* positive _dllr ... check for matches among negative _dllr, if no match or out of matches ... output;

data want;

set have (where=(_dllr gt 0));

set neg key=_dllr;

if _error_;

_error_ = 0;

run;

ps given all the various responses, I'm not sure if this is still the problem ... if it involves looking for negative values plus matching on other variables, just change the index

Removing both observations in a duplicate

Re: Removing both observations in a duplicate

Re: Removing both observations in a duplicate

Re: Removing both observations in a duplicate

Removing both observations in a duplicate

Removing both observations in a duplicate

Removing both observations in a duplicate

Removing both observations in a duplicate

Re: Removing both observations in a duplicate

Re: Removing both observations in a duplicate

Re: Removing both observations in a duplicate

Removing both observations in a duplicate

Removing both observations in a duplicate

Removing both observations in a duplicate

Registration is open

SAS Training: Just a Click Away