I have about 200,000 records of cases and I want to identify the true consecutive Negative result (can't have Positive result after it).
Here's some examples:
A: Keep only the result after first two positive (Obs 3-5)
B: Obs 6-7 need to be drop because these is a Positive after these Negative.
C: Need to be drop
😧 Only 15-17 can be keep since 13 have Positive after them as well.
The basic concept is to keep the Negative results after the last positive for each case (group?) and give them a seq# if possible.
Thank you so much for your help. I tried to add seq# but many times it starting with 2 or even 4 when I use
if first.Case then Seq+1;
Obs | Case | Test Date | Test Result | Record Wanted | Seq# |
1 | A | 01/01/2020 | Missing | ||
2 | A | 01/02/2021 | Positive | ||
3 | A | 01/03/2021 | Negative | V | 1 |
4 | A | 01/04/2021 | Negative | V | 2 |
5 | A | 01/05/2021 | Negative | V | 3 |
6 | B | 01/01/2020 | Negative | ||
7 | B | 01/02/2021 | Negative | ||
8 | B | 01/03/2021 | Positive | ||
9 | C | 01/02/2021 | Missing | ||
10 | D | 01/01/2020 | Positive | ||
11 | D | 01/02/2021 | Positive | ||
12 | D | 01/03/2021 | Missing | ||
13 | D | 01/04/2021 | Negative | ||
14 | D | 01/01/2021 | Positive | ||
15 | D | 01/02/2022 | Negative | V | 1 |
16 | D | 01/03/2022 | Negative | V | 2 |
17 | D | 01/04/2022 | Negative | V | 3 |
Welcome to the SAS Communities.
You need to read the source table twice because you can only figure out which rows to select once you've processed the last row within a group.
Below code should do what you're after.
data want;
set have;
by notsorted case;
retain start_row n_row;
if first.case then call missing(start_row, n_row);
if Test_Result ne 'Negative' then call missing(start_row, n_row);
else if missing(start_row) then start_row=_n_;
if not missing(start_row) then n_row+1;
if last.case and not missing(start_row) then
do;
stop_row=start_row+n_row-1;
do i=start_row to stop_row;
set have point=i;
output;
end;
end;
/* drop start_row stop_row n_row i;*/
run;
proc print data=want;
run;
I got something like below but it's not working expected.
data Test_Negative;
set Tests;
by Case;
if Test_Result^="Negative" then T_Seq=0;
else if Test_Result="Negative" then T_Seq+1;
if first.Case & Test_Result^="Negative" then T_Seq=0;
else if first.Case & Test_Result="Negative" then T_Seq+1;
run;
Welcome to the SAS Communities.
You need to read the source table twice because you can only figure out which rows to select once you've processed the last row within a group.
Below code should do what you're after.
data want;
set have;
by notsorted case;
retain start_row n_row;
if first.case then call missing(start_row, n_row);
if Test_Result ne 'Negative' then call missing(start_row, n_row);
else if missing(start_row) then start_row=_n_;
if not missing(start_row) then n_row+1;
if last.case and not missing(start_row) then
do;
stop_row=start_row+n_row-1;
do i=start_row to stop_row;
set have point=i;
output;
end;
end;
/* drop start_row stop_row n_row i;*/
run;
proc print data=want;
run;
A double do until() comes in handy for this task:
data have;
input Obs Case :$1. Date :mmddyy10. Result :$12.;
format date yymmdd10.;
datalines;
1 A 01/01/2020 Missing
2 A 01/02/2021 Positive
3 A 01/03/2021 Negative V 1
4 A 01/04/2021 Negative V 2
5 A 01/05/2021 Negative V 3
6 B 01/01/2020 Negative
7 B 01/02/2021 Negative
8 B 01/03/2021 Positive
9 C 01/02/2021 Missing
10 D 01/01/2020 Positive
11 D 01/02/2021 Positive
12 D 01/03/2021 Missing
13 D 01/04/2021 Negative
14 D 01/01/2021 Positive
15 D 01/02/2022 Negative V 1
16 D 01/03/2022 Negative V 2
17 D 01/04/2022 Negative V 3
;
data want;
do n = 1 by 1 until (last.case);
set have; by case;
if result = "Negative" then do;
if missing(firstPos) then firstPos = n;
lastPos = n;
end;
else call missing(firstPos);
end;
seq = 0;
do n = 1 by 1 until (last.case);
set have; by case;
if firstPos > 0 then
if lastPos > firstPos + 1 and n >= firstPos then do;
seq + 1;
output;
end;
end;
drop n firstPos lastPos;
run;
data have;
input Obs Case :$1. Date :mmddyy10. Result :$12.;
format date yymmdd10.;
datalines;
1 A 01/01/2020 Missing
2 A 01/02/2021 Positive
3 A 01/03/2021 Negative V 1
4 A 01/04/2021 Negative V 2
5 A 01/05/2021 Negative V 3
6 B 01/01/2020 Negative
7 B 01/02/2021 Negative
8 B 01/03/2021 Positive
9 C 01/02/2021 Missing
10 D 01/01/2020 Positive
11 D 01/02/2021 Positive
12 D 01/03/2021 Missing
13 D 01/04/2021 Negative
14 D 01/01/2021 Positive
15 D 01/02/2022 Negative V 1
16 D 01/03/2022 Negative V 2
17 D 01/04/2022 Negative V 3
;
data want;
do i=1 by 1 until(last.case);
set have;
by case ;
if result='Positive' then _i=i;
end;
do i=1 by 1 until(last.case);
set have;
by case result notsorted;
if first.result then count=.;
count+1;
if i le _i or result ne 'Negative' then call missing(count);
output;
end;
drop i _i;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.