Solved
Contributor
Posts: 37

# Removing both observations in a duplicate

Hi,

I want to remove two instances of a complete duplicate observation. I mean using nodup I can remove one observation, how can I remove both of them, can I write a condition?

Example of data:

ABCD  1234  1  1202  671  020       1,234.56   2011-01-31  Fees Paid

ABCD  1234  1  1202  671  020       1,234.56   2011-01-31  Fees Paid

WXYZ  2323  1  3212  672  020       6,543.89   2011-02-28  Allowance

I want to remove both of the first two lines which duplicate each other. I am just not able to figure out the logic. I have 9 variables in the data. What I am trying to do is remove equal Amount of debit and a credit of a transaction. So it should not remove all the duplicates but any two perfect duplicates throughout the data.Please help.

--Akber.

Accepted Solutions
Solution
‎11-07-2011 11:27 AM
PROC Star
Posts: 8,165

## Re: Removing both observations in a duplicate

A more efficient way, I think, would be to remove the duplicates in a datastep.  e.g.:

options datestyle=ymd;

data have;

informat var1 \$10.;

informat var2-var6 12.;

informat var7 comma12.;

informat var8 anydtdte10.;

informat var9 \$15.;

input var1-var8 var9 &;

cards;

ABCD  1234  1  1202  671  020       1,234.56   2011-01-31  Fees Paid

ABCD  1234  1  1202  671  020       1,234.56   2011-01-31  Fees Paid

WXYZ  2323  1  3212  672  020       6,543.89   2011-02-28  Allowance

;

proc sort data=have;

by var1-var9;

run;

data want;

set have;

by var1-var9;

if first.var9 and last.var9;

run;

All Replies
Posts: 1,318

## Removing both observations in a duplicate

There is probably a more efficient way to do this but here is one way...  Use proc sort with dupout option and nodupkey, merge the dupout file back to the original data to remove the blanks.

data foo;

*this contains my data;

run;

proc sort data=foo out=bar dupout=drops nodupkey; by id trans_id; run;

data bar;

merge drops(in=b) bar(in=a);

by id trans_id;

if a and not b;

run;

Solution
‎11-07-2011 11:27 AM
PROC Star
Posts: 8,165

## Re: Removing both observations in a duplicate

A more efficient way, I think, would be to remove the duplicates in a datastep.  e.g.:

options datestyle=ymd;

data have;

informat var1 \$10.;

informat var2-var6 12.;

informat var7 comma12.;

informat var8 anydtdte10.;

informat var9 \$15.;

input var1-var8 var9 &;

cards;

ABCD  1234  1  1202  671  020       1,234.56   2011-01-31  Fees Paid

ABCD  1234  1  1202  671  020       1,234.56   2011-01-31  Fees Paid

WXYZ  2323  1  3212  672  020       6,543.89   2011-02-28  Allowance

;

proc sort data=have;

by var1-var9;

run;

data want;

set have;

by var1-var9;

if first.var9 and last.var9;

run;

Valued Guide
Posts: 765

## Re: Removing both observations in a duplicate

Hi ... another idea using Art's data ...

proc sql;

create table want as

select *

from have

group by var1,var2,var3,var4,var5,var6,var7,var8,var9

having count(*) eq 1;

quit;

Super User
Posts: 10,784

## Removing both observations in a duplicate

Another way is SQL. If you like:

```options datestyle ymd;
data have;
informat var1 \$10.;
informat var2-var6 12.;
informat var7 comma12.;
informat var8 anydtdte10.;
informat var9 \$15.;
input var1-var8 var9 &;
cards;
ABCD  1234  1  1202  671  020       1,234.56   2011-01-31  Fees Paid
ABCD  1234  1  1202  671  020       1,234.56   2011-01-31  Fees Paid
WXYZ  2323  1  3212  672  020       6,543.89   2011-02-28  Allowance
;
run;
proc sql;
create table want as
select *
from have
group by var1,var2,var3,var4,var5,var6,var7,var8,var9
having count(*) eq 1;
quit;
```

Ksharp

PROC Star
Posts: 8,165

## Re: Removing both observations in a duplicate

If we're going to go a proc sql route I'd use:

proc sql noprint;

select name into :vars separated by ","

from dictionary.columns

where libname eq "WORK"

and memname eq "HAVE"

;

create table want as

select *

from have

group by &vars.

having count(*) eq 1

;

quit;

Super User
Posts: 10,784

## Re: Removing both observations in a duplicate

En. It is a better way to save typing.

Valued Guide
Posts: 765

## Re: Removing both observations in a duplicate

hi ... depends, in this case ...

94 characters

select name into :vars separated by ","

from dictionary.columns

where libname eq "WORK"

and memname eq "HAVE"

;

(plus &vars = 99)

44 characters

var1,var2,var3,var4,var5,var6,var7,var8,var9

ps  the "when to hold 'em versus when to fold 'em" rule

PROC Star
Posts: 8,165

## Re: Removing both observations in a duplicate

and/or the "when to make use of the SAS Macro/Abbreviations rule that advises to assign a hotkey to quickly insert frequently used code snippets"

Contributor
Posts: 37

## Re: Removing both observations in a duplicate

Hi,

How can I simply remove TWO observations with equal Amount with a different sign, based just on amount and no other field?? Any two observations, no more no less.

ABCD  1234  1  1202  671  020       1,234.56   2011-01-31  Fees Paid

ABCD  1234  1  1202  671  020       -1,234.56   2011-01-31  Fees Paid

ABCD  2345  1  1203  672  030       1,234.56   2011-01-31  Fees Paid

ABCD  1234  1  1202  671  020       1,234.56   2011-01-31  Fees Paid

WXYZ  2323  1  3212  672  020       6,543.89   2011-02-28  Allowance

Like line 1 & 2 or line 2 & 3 or line 2 & 4 but not more than 2. I mean any two random lines which

Thanks,

Akber.

PROC Star
Posts: 8,165

## Re: Removing both observations in a duplicate

I'm not sure exactly what you are trying to do.  You can always use the abs() function to make all values positive in your comparisons.

Contributor
Posts: 37

## Removing both observations in a duplicate

yes but that would then remove any two transactions. I only want a positive and negative with the same value removed

say 1234 and 2345 is the reference# then how do I remove the first and second observation (both 1234), both of which will add to net out to zero, how can I delete them from my dataset

ABCD  1234  1  1202  671  020       1,234.56   2011-01-31  Fees Paid

ABCD  1234  2  1203  671  020       -1,234.56   2011-01-31  Fees Paid

ABCD  2345  1  1203  672  030       1,234.56   2011-01-31  Fees Paid

PROC Star
Posts: 8,165

## Removing both observations in a duplicate

You sort (order) the file, say with proc sql, by reference # and the abs() of the variable you want to use.  Then, you can use the look ahead-look back methodology to see if any adjacent values add to zero (see: http://www.sascommunity.org/wiki/Look-Ahead_and_Look-Back )

Super Contributor
Posts: 358

## Re: Removing both observations in a duplicate

Art:

If you are going with the Proc SQL then you may be limited buy the size of th :into var.

A large dataset could blow the variable size limit whereas the first. / last. on the sort won't.

Just sayin'

PROC Star
Posts: 8,165

## Re: Removing both observations in a duplicate

My original suggestion didn't use either proc sql or macro variables.  However, you are definitely correct, although that limit is now 65,534.  If one has a collection of variable names that exceed that number of characters, I definitely wouldn't suggest a method that requires manually entering them.

🔒 This topic is solved and locked.