Add & Delete the Duplicate records

Solved
Occasional Contributor
Posts: 8

Add & Delete the Duplicate records

Hi Team,

I have a sas dataset with around 30 variables, Out of which mainly i have concentrate on two variables shown below

POL_ID     Amount

-----------------------------------

AZN123     \$25

ABN456     \$44

AZP123     \$50

ABC456     \$20

CRG020     \$60

AZC123     \$15

ABP456     \$10

NYP020     \$50

HRG090     \$10

Conditions:

1) Eliminate Duplicate records

a)Eliminate records with AZN and retain the record with AZC.

b) Eliminate records with AZP and retain the record with AZC.

c) Eliminate records with ABN and retain the record with ABC

d) Eliminate records with ABP and retain the record with ABC

Combine the Amount from the A_N and A_P records with the A_C record.

Note: S/w: SAS9.3, O/S: Linux

Expected Output(order doesn't matter):

POL_ID     Amount

-------------------------------------

AZC123     \$90

ABC456     \$74

NY020       \$50

CRG020     \$60

HRG090     \$10

Accepted Solutions
Solution
‎02-23-2015 05:05 PM
PROC Star
Posts: 8,169

Re: Add & Delete the Duplicate records

Avinash wrote me off line, but I still don't know if I understand what he is trying to accomplish. The following is my best guess for now:

Given the following data:

data have;

informat pol_id \$6.;

informat amount comma8.;

input POL_ID     Amount;

cards;

AZN123     \$25

ABN456     \$44

AZP123     \$50

ABC456     \$20

CRG020     \$60

AZC123     \$15

ABP456     \$10

NYP020     \$50

HRG090     \$10

AZN999     \$25

LLL999     \$30

;

/*collapse by pol_id*/

proc summary data=have nway;

var amount;

class pol_id;

output out=need (drop=_sum=;

run;

/*extract _id and _prefix*/

data need;

set need;

_prefix=substr(pol_id,1,3);

_id=substr(pol_id,4);

run;

/*sort by _id and _prefix*/

proc sort data=need;

by _id _prefix;

run;

/*change certain pol_ids if first record begins with AZC or ABC*/

data need;

set need;

by _id;

retain _changeit;

if first._id then do;

if _prefix in ('AZC','ABC') then _changeit=1;

else _changeit=0;

end;

else if _changeit then do;

pol_id=tranwrd(pol_id,'AZN','AZC');

pol_id=tranwrd(pol_id,'AZP','AZC');

pol_id=tranwrd(pol_id,'ABN','ABC');

pol_id=tranwrd(pol_id,'ABP','ABC');

end;

run;

/*Collapse the modified data*/

proc summary data=need nway;

var amount;

class pol_id;

output out=want (drop=_sum=;

run;

All Replies
Super Contributor
Posts: 286

Re: Add & Delete the Duplicate records

this will get rid of records with duplicate POL_ID:

proc sort data=mydata nodupkey;

by POL_ID;

run;

CAUTION: this will get rid of records that have the same POL_ID even if the amounts are different.  Is this what  you want to happen?

Occasional Contributor
Posts: 8

Re: Add & Delete the Duplicate records

Hi Gregg,

No, Thats not i am expecting,

What i am thinking is,

split the POL_ID in to two parts as prifex=first 3 char of POLID , ID=rest of the char (example prefix=AZN, ID=123)

proc sort data=inputdata;

by ID Prefix;

run;

Here AZC,AZN & AZP has same ID which is 123, Now Combine the Amount from the A_N and A_P records with the A_C record. and remove AZP & AZN records from the dataset, So that out of 3 only one records remains i.e AZC123 with total sum.

We can use the  FIRST.VAR and retain logic, Expecting the better approach from you all

Hope it give a clear picture, Let me know if you need any more information..

Super User
Posts: 23,778

Re: Add & Delete the Duplicate records

First.Var/Last.var and retain logic would be the solution I would recommend.

Because you have to split the ID and perform summation other procs will require pre-processing of the data which means multiple data passes. A data step would be a single pass with a pre-sort.

Occasional Contributor
Posts: 8

Re: Add & Delete the Duplicate records

Hi Team,

How to retain the record for multiple iteration,

For example:

ID     Prefix     Amount     V1      V2     V3     V4 ....... V30

123   ABC      10               q      w       r          t          y

123   ABN       20              a      s       d         f          g

123   ABP       30              z      x       c         v          b

The Required Output:

123    ABC  60     q   w   r    t   y

I need all the variable of first.ID wit Prefix=ABC wit sum of (ABC+ABN+ABP) and delete the  ABN and ABP record

PROC Star
Posts: 8,169

Re: Add & Delete the Duplicate records

Your directions aren't very clear to me. Looks like you simply want to recode some values and then summarize the data to the pol_id level.

If that is what you are trying to do, then here is one way:

data need;

set have;

pol_id=tranwrd(pol_id,'AZN','AZC');

pol_id=tranwrd(pol_id,'AZP','AZC');

pol_id=tranwrd(pol_id,'ABN','ABC');

pol_id=tranwrd(pol_id,'ABP','ABC');

run;

proc summary data=need nway;

var amount;

class pol_id;

output out=want (drop=_ sum=;

run;

Solution
‎02-23-2015 05:05 PM
PROC Star
Posts: 8,169

Re: Add & Delete the Duplicate records

Avinash wrote me off line, but I still don't know if I understand what he is trying to accomplish. The following is my best guess for now:

Given the following data:

data have;

informat pol_id \$6.;

informat amount comma8.;

input POL_ID     Amount;

cards;

AZN123     \$25

ABN456     \$44

AZP123     \$50

ABC456     \$20

CRG020     \$60

AZC123     \$15

ABP456     \$10

NYP020     \$50

HRG090     \$10

AZN999     \$25

LLL999     \$30

;

/*collapse by pol_id*/

proc summary data=have nway;

var amount;

class pol_id;

output out=need (drop=_sum=;

run;

/*extract _id and _prefix*/

data need;

set need;

_prefix=substr(pol_id,1,3);

_id=substr(pol_id,4);

run;

/*sort by _id and _prefix*/

proc sort data=need;

by _id _prefix;

run;

/*change certain pol_ids if first record begins with AZC or ABC*/

data need;

set need;

by _id;

retain _changeit;

if first._id then do;

if _prefix in ('AZC','ABC') then _changeit=1;

else _changeit=0;

end;

else if _changeit then do;

pol_id=tranwrd(pol_id,'AZN','AZC');

pol_id=tranwrd(pol_id,'AZP','AZC');

pol_id=tranwrd(pol_id,'ABN','ABC');

pol_id=tranwrd(pol_id,'ABP','ABC');

end;

run;

/*Collapse the modified data*/

proc summary data=need nway;

var amount;

class pol_id;

output out=want (drop=_sum=;

run;

Occasional Contributor
Posts: 8

Re: Add & Delete the Duplicate records

BangON,

The second part with _id and _prefix is the exact solution i was looking for.

Thank you once again.

Occasional Contributor
Posts: 8

Re: Add & Delete the Duplicate records

HI Team,

As per Arthur's solutions the below code will fetch what i want

/*Collapse the modified data*/

proc summary data=need nway;

var amount;

class pol_id;

output out=want (drop=_sum=;

run;

But the output contains only pol_id and amount variable,

I would like to have all the variable

🔒 This topic is solved and locked.