Desktop productivity for business analysts and programmers

EG 4.1 find a previous record

Reply
N/A
Posts: 0

EG 4.1 find a previous record

I am looking to find instances within a data set where a given contract's status has changed to activ more than once.
I have looked at the LAG function and I do not think it will help with the multiple records for each contract. Any thoughts?

contract last_updt status proc_dt Seq_num amt
abc123 12-Nov-05 PEND 15-Nov-05 1 5478
abc123 12-Nov-05 PEND 16-Nov-05 2 5478
abc123 18-Nov-05 PEND 22-Nov-05 3 5478
abc123 18-Nov-05 PEND 23-Nov-05 4 5478
abc123 24-Nov-05 PEND 29-Nov-05 5 5478
abc123 24-Nov-05 PEND 30-Nov-05 6 5478
abc123 2-Dec-05 TERM 6-Dec-05 7 5478
abc123 2-Dec-05 TERM 7-Dec-05 8 5478
abc123 29-Dec-05 TERM 4-Jan-06 9 5478
abc123 29-Dec-05 TERM 4-Jan-06 10 5478
abc123 3-Feb-06 PEND 7-Feb-06 11 5478
abc123 3-Feb-06 PEND 8-Feb-06 12 5478
abc123 7-Feb-06 ACTIV 14-Feb-06 13 5478
abc123 7-Feb-06 ACTIV 15-Feb-06 14 5478
abc123 24-Feb-06 ACTIV 28-Feb-06 15 5478
abc123 24-Feb-06 ACTIV 1-Mar-06 16 5478
abc123 24-Mar-06 ACTIV 28-Mar-06 17 5478
abc123 24-Mar-06 ACTIV 29-Mar-06 18 5478
abc123 31-Mar-06 ACTIV 4-Apr-06 19 5478
abc123 31-Mar-06 ACTIV 5-Apr-06 20 5478
abc123 7-Apr-06 ACTIV 11-Apr-06 21 5478
abc123 7-Apr-06 ACTIV 12-Apr-06 22 5478
abc123 12-Apr-06 ACTIV 18-Apr-06 23 5478
abc123 12-Apr-06 ACTIV 19-Apr-06 24 5478
abc123 21-Apr-06 ACTIV 25-Apr-06 25 5478
abc123 21-Apr-06 ACTIV 26-Apr-06 26 5478
abc123 30-Apr-06 TERM 2-May-06 27 5478
abc123 30-Apr-06 TERM 3-May-06 28 5478
abc123 7-May-06 TERM 9-May-06 29 5478
abc123 7-May-06 TERM 10-May-06 30 5478
abc123 12-May-06 ACTIV 16-May-06 31 5478
abc123 12-May-06 ACTIV 17-May-06 32 5478
abc123 23-Jan-07 ACTIV 29-Jan-07 33 5478
abc123 23-Jan-07 ACTIV 30-Jan-07 34 5478
SAS Super FREQ
Posts: 8,721

Re: EG 4.1 find a previous record

Hi:
I think that the LAG function won't help you much. One approach (assuming you want all the rows for the folks who have an ACTIV status more than once) is to make 2 passes through the file.
[pre]
** make some data;
** only 10 rows for abc123 contract;
** only 3 rows for xxx333 contract;
** only 4 rows for yyy444 contract;
data contract;
infile datalines;
input contract $ last_updt : anydtdte9. status $ proc_dt : anydtdte9. Seq_num amt;
format last_updt proc_dt mmddyy10.;
return;
datalines;
abc123 12-Nov-05 PEND 15-Nov-05 1 5478
abc123 12-Nov-05 PEND 16-Nov-05 2 5478
abc123 2-Dec-05 TERM 6-Dec-05 3 5478
abc123 2-Dec-05 TERM 7-Dec-05 4 5478
abc123 3-Feb-06 PEND 7-Feb-06 5 5478
abc123 3-Feb-06 PEND 8-Feb-06 6 5478
abc123 7-Feb-06 ACTIV 14-Feb-06 7 5478
abc123 7-Feb-06 ACTIV 15-Feb-06 8 5478
abc123 24-Feb-06 ACTIV 28-Feb-06 9 5478
abc123 7-Apr-06 ACTIV 11-Apr-06 10 5478
abc123 7-Apr-06 ACTIV 12-Apr-06 11 5478
abc123 12-Apr-06 ACTIV 18-Apr-06 12 5478
abc123 12-Apr-06 ACTIV 19-Apr-06 13 5478
abc123 21-Apr-06 ACTIV 25-Apr-06 14 5478
abc123 21-Apr-06 ACTIV 26-Apr-06 15 5478
abc123 30-Apr-06 TERM 2-May-06 16 5478
abc123 30-Apr-06 TERM 3-May-06 17 5478
abc123 7-May-06 TERM 9-May-06 18 5478
abc123 7-May-06 TERM 10-May-06 19 5478
abc123 23-Jan-07 ACTIV 29-Jan-07 20 5478
xxx333 23-Jan-04 PEND 24-Jan-04 1 3333
xxx333 31-Jan-04 ACTIV 01-Feb-04 2 3333
xxx333 31-Dec-06 TERM 01-Jan-07 3 3333
yyy444 23-Jan-04 PEND 24-Jan-04 1 4444
yyy444 31-Jan-04 ACTIV 01-Feb-04 2 4444
yyy444 31-Dec-06 TERM 01-Jan-07 3 4444
yyy444 02-Feb-07 ACTIV 03-Feb-07 4 4444
;
run;

proc freq data=contract;
title 'Number of ACTIV rows by contract';
where status='ACTIV';
tables contract;
run;

proc tabulate data=contract f=comma6.;
title 'Number of all status values by contract';
class contract status;
table contract, status*n='';
run;

** now do an SQL query to get the counts of ACTIV into a file;
proc sql;
create table work.cntactv as
select contract, count(*) as cnter
from work.contract
where status='ACTIV'
group by contract
having calculated cnter gt 1;
quit;

proc print data=work.cntactv;
title 'Showing only contracts with more than one status of ACTIV';
title2 'note that the xxx333 record does not appear because it only had 1 ACTIV';
run;

proc sql;
create table work.activonly as
select main.contract, last_updt, status, proc_dt, Seq_num, amt, count.cnter
from work.contract as main,
work.cntactv as count
where main.contract=count.contract and
status = 'ACTIV';
quit;

proc print data=work.activonly;
title 'Join of "counter" table with "main" original contract table';
run;
title;
[/pre]
I made some test data (using fewer rows than you originally posted) and then I added the xxx333 contract where ACTIV only appeared 1 time and then also added the yyy444 contract. If all you want is the status values for all the contract numbers in question, then either the PROC FREQ or the PROC TABULATE will give you that info.

Otherwise, the first PROC SQL creates a work table that shows ONLY the contracts where the count for ACTIV was greater than 1. And if all you want is that list, the PROC PRINT shows you that list.

If you want to get ALL the rows for the contracts where the COUNT for ACTIV status was greater than 1, the second PROC SQL joins the original/"main" table with the "counter" table. You could probably do this SQL join in one step but I broke it into 2 steps to show how the first pass tells you which contracts meet your condition (which may be all you need), while the second pass actually gets you all the rows that met the condition.

There are probably other approaches, but I don't think LAG will work for you because in a DATA step, when you're doing a sequential read, the previous records will be gone from the program data vector by the time you determine that the contract meets your criteria.

cynthia
Ask a Question
Discussion stats
  • 1 reply
  • 106 views
  • 0 likes
  • 2 in conversation