I have a data like this:
ID | Application_No | Version | Action | Date |
---|---|---|---|---|
1 | ABCDEFX2 | 1 | Submit For Approval | 01 July 2014 |
2 | ABCDEFX2 | 1 | Return | 02 July 2014 |
3 | ABCDEFX2 | 2 | Submit For Approval | 01 July 2014 |
4 | ABCDEFX2 | 2 | Return | 02 July 2014 |
5 | ABCDEFX2 | 2 | Submit For Approval | 05 July 2014 |
6 | ABCDEFX2 | 2 | Approve | 06 July 2014 |
This table is a transaction trail. The system, on new version, will copy over the previous trail.
For example, on row no. 3 a new version of the same application is created. And Row1 & 2 is copied as Row3 & 4 and trail started over.
How do I delete row no 3 and 4?
Row 3 and 4 is copy of Row 1 and 2 respectively.
A friend suggested using SELECT DISTINCT but have to remove column VERSION.
Yes, it may then be a bit more tricky than a sort/nodupkey as you need to check previous versions. The following should do that:
data have;
length application_no action date $50;
infile datalines dlm=",";
input ID Application_No $ Version Action $ Date $;
datalines;
1,ABCDEFX2,1,Submit For Approval,01July2014
2,ABCDEFX2,1,Return,02July2014
3,ABCDEFX2,2,Submit For Approval,01July2014
4,ABCDEFX2,2,Return,02July2014
5,ABCDEFX2,2,Submit For Approval,05July2014
6,ABCDEFX2,2,Approve,06July2014
run;
proc sql;
create table WANT as
select BASE.*
from WORK.HAVE BASE
where not exists(select distinct THIS.APPLICATION_NO
from WORK.HAVE THIS
where THIS.APPLICATION_NO=BASE.APPLICATION_NO
and THIS.ACTION=BASE.ACTION
and THIS.DATE=BASE.DATE
and THIS.VERSION < BASE.VERSION);
quit;
The key I see is application_no, Action and Date.
So PROC SORT NODUPKEY would probable do the work.
The key is Application_No and Version.
Therefore I need to retain row 1 and 2 for version 1, and row 5 and 6 for version 2.
Yes, it may then be a bit more tricky than a sort/nodupkey as you need to check previous versions. The following should do that:
data have;
length application_no action date $50;
infile datalines dlm=",";
input ID Application_No $ Version Action $ Date $;
datalines;
1,ABCDEFX2,1,Submit For Approval,01July2014
2,ABCDEFX2,1,Return,02July2014
3,ABCDEFX2,2,Submit For Approval,01July2014
4,ABCDEFX2,2,Return,02July2014
5,ABCDEFX2,2,Submit For Approval,05July2014
6,ABCDEFX2,2,Approve,06July2014
run;
proc sql;
create table WANT as
select BASE.*
from WORK.HAVE BASE
where not exists(select distinct THIS.APPLICATION_NO
from WORK.HAVE THIS
where THIS.APPLICATION_NO=BASE.APPLICATION_NO
and THIS.ACTION=BASE.ACTION
and THIS.DATE=BASE.DATE
and THIS.VERSION < BASE.VERSION);
quit;
Thanks, this solution worked for me.
Hello,
Maybe this:
data have;
infile datalines truncover;
input ID Application_No $ Version Action & :$22. Date $12.;
datalines;
1 ABCDEFX2 1 Submit For Approval 01 July 2014
2 ABCDEFX2 1 Return 02 July 2014
3 ABCDEFX2 2 Submit For Approval 01 July 2014
4 ABCDEFX2 2 Return 02 July 2014
5 ABCDEFX2 2 Submit For Approval 05 July 2014
6 ABCDEFX2 2 Approve 06 July 2014
7 ABCDEFX2 3 Submit For Approval 05 July 2014
8 ABCDEFX2 3 Approve 06 July 2014
9 ABCDEFX2 3 Approve 05 July 2014
10 ABCDEFX2 3 Financed 06 July 2014
;
proc sort data=have;
by Application_No date Action version;
run;
data want;
set have;
by Application_No date Action ;
if first.action;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.