Hi:
As the above example shows, you will have to essentiallly read the data twice to accomplish this task. I generally do not use/show SET inside a loop, because it is hard to explain to beginners -- however it can be a very efficient way to read and process large datasets. Alternately, if you used the LAG function, you could accomplish the same task in more steps, but possibly would end up with a program that was easier to understand.
If you use the LAG function to "look behind", you would get data as shown below (note that I created the LAG_STAT and the ORIG_ORD variables in addition to reading the original variables that you showed):
[pre]
Show how lag works -- but do not really need this dataset
Obs ID date status lag_stat orig_ord
1 1 01/01/10 1 . 1
2 1 01/05/10 5 1 2
3 2 01/01/10 2 . 3
4 2 01/03/10 1 2 4
5 2 01/04/10 5 1 5
6 2 01/05/10 9 5 6
7 3 01/03/10 1 . 7
8 3 01/04/10 3 1 8
9 3 01/05/10 5 3 9
10 4 01/01/10 3 . 10
11 4 01/02/10 1 3 11
12 4 01/08/10 7 1 12
13 4 01/10/10 5 7 13
14 5 01/03/10 1 . 14
15 5 01/06/10 7 1 15
16 5 01/08/10 4 7 16
[/pre]
If you look at the rows where LAG_STAT = 1, that means that the value for the PREVIOUS observation was 1 so if the current STATUS variable is equal to 5, you now know that you have an ID that you want. In the report above, you see this pattern in obs #2 for ID=1 and in obs#5 for ID=2. So, if you output ONLY the ID for the obs that meet this pattern, then you will have a dataset of the IDs you want.
In my program below, where the LAG function is used, I have 1 DATA step program that creates 2 datasets -- WANTID and LAGSTATUS. The dataset I really need to use is the WANTID dataset -- the LAGSTATUS dataset is just to get the above shot of how the logic works with LAG.
Next, the WANTID dataset is merged with the original dataset (I call the original dataset WORK.CK_PATTERN) -- you probably only want 1 dataset from the merge, but my program creates 3 -- 1 dataset to hold all the obs that DO meet the pattern, 1 dataset to hold all the obs that do NOT meet the pattern, and 1 dataset to hold the obs that are other than 1 or 5 for the IDs that meet the pattern (such as the obs with status=2 and status=9 for ID=2)
You can experiment with data_null_'s program and try to get it working with 2 set statements and the loop and flags or you can use a technique similar to the one I show in the program below.
cynthia
[pre]
** 1) make data;
data ck_pattern;
infile datalines dlm=' ';
input ID date :anydtdte8. status;
orig_ord=_n_;
format date mmddyy8.;
return;
datalines;
1 01-01-10 1
1 01-05-10 5
2 01-01-10 2
2 01-03-10 1
2 01-04-10 5
2 01-05-10 9
3 01-03-10 1
3 01-04-10 3
3 01-05-10 5
4 01-01-10 3
4 01-02-10 1
4 01-08-10 7
4 01-10-10 5
5 01-03-10 1
5 01-06-10 7
5 01-08-10 4
;
run;
proc sort data=ck_pattern;
by id orig_ord;
run;
ods listing;
proc print data=ck_pattern;
title 'Original Data';
run;
** 2) Use the LAG function to determine which IDs are wanted;
** only output ID when current row is 5 and prev row had status of 1;
** lag_stat is status value from prev row and status is value from current row;
** on first.id need to set lag_stat to missing -- because first.id has no previous row;
data wantid(keep=id)
lagstatus (keep=id date status lag_stat orig_ord);
set ck_pattern;
by id;
lag_stat = lag(status);
if first.id then lag_stat=.;
output lagstatus;
if lag_stat = 1 and status = 5 then output wantid;
run;
proc print data=lagstatus;
title 'Show how lag works -- but do not really need this dataset';
var id date status lag_stat orig_ord;
run;
proc print data=wantid;
title 'Just want these ids';
run;
** 3) do merge to get only IDs that meet pattern where orig obs has status of 1 or 5 and;
** put these obs into the dataset, HAVEPATTERN ;
** other datasets created are DONOTMEET -- for IDs that do not meet pattern at all;
** and EXTRAOBS for the other statuses from the IDs that do meet the pattern;
** you may want to just put EXTRAOBS rows into DONOTMEET -- not sure about this;
data havepattern donotmeet extraobs;
merge wantid(in=want)
ck_pattern(in=inorig);
by id;
if want=1 and inorig=1 then do;
if status in (1,5) then output havepattern;
else if status not in (1,5) then output extraobs;
end;
else if inorig=1 and want=0 then output donotmeet;
run;
proc print data=havepattern;
title 'IDs that Meet Pattern';
run;
proc print data=extraobs;
title 'IDs that meet Pattern but these status obs not needed';
run;
proc print data=donotmeet;
title 'Obs that Do Not Meet Pattern';
run;
[/pre]