DATA Step, Macro, Functions and more

Delete duplicate rows

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 92
Accepted Solution

Delete duplicate rows

I have a data like this:

IDApplication_NoVersionActionDate
1ABCDEFX21Submit For Approval01 July 2014
2ABCDEFX21Return02 July 2014
3ABCDEFX22Submit For Approval01 July 2014
4ABCDEFX22Return02 July 2014
5ABCDEFX22Submit For Approval05 July 2014
6ABCDEFX22Approve06 July 2014

This table is a transaction trail. The system, on new version, will copy over the previous trail.

For example, on row no. 3 a new version of the same application is created. And Row1 & 2 is copied as Row3 & 4 and trail started over.

How do I delete row no 3 and 4?

Row 3 and 4 is copy of Row 1 and 2 respectively.

A friend suggested using SELECT DISTINCT but have to remove column VERSION.


Accepted Solutions
Solution
‎09-03-2014 04:19 AM
Super User
Super User
Posts: 7,997

Re: Delete duplicate rows

Yes, it may then be a bit more tricky than a sort/nodupkey as you need to check previous versions.  The following should do that:

data have;
  length application_no action date $50;
  infile datalines dlm=",";
  input ID Application_No $ Version Action $ Date $;
datalines;
1,ABCDEFX2,1,Submit For Approval,01July2014
2,ABCDEFX2,1,Return,02July2014
3,ABCDEFX2,2,Submit For Approval,01July2014
4,ABCDEFX2,2,Return,02July2014
5,ABCDEFX2,2,Submit For Approval,05July2014
6,ABCDEFX2,2,Approve,06July2014
run;

proc sql;
  create table WANT as
  select  BASE.* 
  from    WORK.HAVE BASE
  where   not exists(select distinct THIS.APPLICATION_NO
                     from   WORK.HAVE THIS
                     where  THIS.APPLICATION_NO=BASE.APPLICATION_NO
                      and   THIS.ACTION=BASE.ACTION
                      and   THIS.DATE=BASE.DATE
                      and   THIS.VERSION < BASE.VERSION);
quit;

View solution in original post


All Replies
Super User
Posts: 5,441

Re: Delete duplicate rows

The key I see is application_no, Action and Date.

So PROC SORT NODUPKEY would probable do the work.

Data never sleeps
Frequent Contributor
Posts: 92

Re: Delete duplicate rows

The key is Application_No and Version.

Therefore I need to retain row 1 and 2 for version 1, and row 5 and 6 for version 2.

Solution
‎09-03-2014 04:19 AM
Super User
Super User
Posts: 7,997

Re: Delete duplicate rows

Yes, it may then be a bit more tricky than a sort/nodupkey as you need to check previous versions.  The following should do that:

data have;
  length application_no action date $50;
  infile datalines dlm=",";
  input ID Application_No $ Version Action $ Date $;
datalines;
1,ABCDEFX2,1,Submit For Approval,01July2014
2,ABCDEFX2,1,Return,02July2014
3,ABCDEFX2,2,Submit For Approval,01July2014
4,ABCDEFX2,2,Return,02July2014
5,ABCDEFX2,2,Submit For Approval,05July2014
6,ABCDEFX2,2,Approve,06July2014
run;

proc sql;
  create table WANT as
  select  BASE.* 
  from    WORK.HAVE BASE
  where   not exists(select distinct THIS.APPLICATION_NO
                     from   WORK.HAVE THIS
                     where  THIS.APPLICATION_NO=BASE.APPLICATION_NO
                      and   THIS.ACTION=BASE.ACTION
                      and   THIS.DATE=BASE.DATE
                      and   THIS.VERSION < BASE.VERSION);
quit;

Frequent Contributor
Posts: 92

Re: Delete duplicate rows

Thanks, this solution worked for me.

Super Contributor
Posts: 308

Re: Delete duplicate rows

Hello,

Maybe this:

data have;
infile datalines truncover;
input ID Application_No $ Version Action & :$22. Date $12.;
datalines;
1 ABCDEFX2 1 Submit For Approval  01 July 2014
2 ABCDEFX2 1 Return  02 July 2014
3 ABCDEFX2 2 Submit For Approval  01 July 2014
4 ABCDEFX2 2 Return  02 July 2014
5 ABCDEFX2 2 Submit For Approval  05 July 2014
6 ABCDEFX2 2 Approve  06 July 2014
7 ABCDEFX2 3 Submit For Approval  05 July 2014
8 ABCDEFX2 3 Approve  06 July 2014
9 ABCDEFX2 3 Approve  05 July 2014
10 ABCDEFX2 3 Financed  06 July 2014
;

proc sort data=have;
by Application_No date Action  version;
run;

data want;
set have;
by Application_No date Action ;
if first.action;
run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 330 views
  • 0 likes
  • 4 in conversation