I have test result data for patients. Once a patient tests positive (result=1) any future tests for that patient should be deleted from the file.
So I know I need to identify the record with result=1 and grab that date, but I'm at a loss for how to apply that date within patient ID to delete records with a higher date value.
Thanks in advance.
PATIENT_ID | _SPECIMEN_COLL_DT | RESULT | |
1007215 | 23-Sep-20 | 2 | |
1007215 | 25-Sep-20 | 2 | |
1007215 | 27-Sep-20 | 1 | |
1007215 | 1-Nov-20 | 2 | delete |
1007215 | 10-Nov-20 | 2 | delete |
1007215 | 14-Nov-20 | 2 | delete |
1007215 | 30-Nov-20 | 2 | delete |
1007215 | 6-Dec-20 | 9 | delete |
1342677 | 19-Oct-20 | 2 | |
1342677 | 7-Nov-20 | 2 | |
1345898 | 25-Aug-20 | 9 | |
1345898 | 27-Sep-20 | 2 | |
1345898 | 11-Oct-20 | 2 | |
1345898 | 23-Oct-20 | 1 | |
1345898 | 30-Oct-20 | 2 | delete |
1345898 | 6-Nov-20 | 2 | delete |
1345898 | 22-Nov-20 | 2 | delete |
1345898 | 29-Nov-20 | 2 | delete |
1345898 | 6-Dec-20 | 2 | delete |
1408252 | 21-Sep-20 | 2 | |
1439221 | 25-Oct-20 | 2 | |
1439245 | 8-Oct-20 | 2 | |
1442703 | 21-Oct-20 | 2 | |
1451575 | 24-Aug-20 | 2 | |
1451575 | 24-Nov-20 | 2 | |
1462769 | 9-Sep-20 | 2 | |
1469679 | 15-Sep-20 | 2 | |
1469679 | 3-Oct-20 | 2 | |
1469679 | 7-Oct-20 | 1 | |
1469679 | 3-Nov-20 | 2 | delete |
1502202 | 17-Aug-20 | 2 | |
1502202 | 29-Aug-20 | 2 | |
1502202 | 19-Sep-20 | 2 | |
1502202 | 28-Sep-20 | 2 | |
1502202 | 13-Nov-20 | 2 | |
1502202 | 21-Nov-20 | 2 | |
1502202 | 6-Dec-20 | 2 | |
1557673 | 20-Aug-20 | 2 | |
1560593 | 30-Aug-20 | 2 | |
1569074 | 11-Sep-20 | 2 | |
1571807 | 15-Sep-20 | 2 | |
1602114 | 13-Sep-20 | 2 | |
1743463 | 7-Jul-20 | 2 | |
1751405 | 7-Jun-20 | 2 | |
1751737 | 14-Jul-20 | 2 | |
1760386 | 27-Jul-20 | 2 | |
1760410 | 28-Jul-20 | 2 | |
2022188 | 8-Nov-20 | 2 | |
2076478 | 11-Nov-20 | 2 | |
2077550 | 6-Oct-20 | 3 | |
2078033 | 30-Nov-20 | 2 | |
2079325 | 5-Dec-20 | 9 | |
2081383 | 26-Nov-20 | 2 | |
2085676 | 24-Oct-20 | 2 | |
2086149 | 6-Dec-20 | 2 | |
2086958 | 4-Dec-20 | 2 | |
2088387 | 18-Sep-20 | 9 | |
2088387 | 17-Nov-20 | 1 | |
2088387 | 8-Dec-20 | 2 | delete |
2093450 | 8-Dec-20 | 2 | |
2094033 | 28-Nov-20 | 2 | |
2094452 | 21-Oct-20 | 2 | |
2097616 | 29-Oct-20 | 2 | |
2111066 | 20-Nov-20 | 2 | |
2112099 | 18-Sep-20 | 2 | |
2114632 | 6-Oct-20 | 2 | |
2118460 | 25-Sep-20 | 2 | |
2118460 | 29-Sep-20 | 1 | |
2118460 | 10-Nov-20 | 2 | delete |
2118460 | 20-Nov-20 | 2 | delete |
2118460 | 26-Nov-20 | 2 | delete |
2118460 | 4-Dec-20 | 2 | delete |
2118460 | 6-Dec-20 | 2 | delete |
2133328 | 30-Nov-20 | 2 | |
2133571 | 13-Oct-20 | 2 | |
2134005 | 8-Dec-20 | 2 | |
2134473 | 29-Nov-20 | 2 | |
2153388 | 11-Nov-20 | 2 | |
2166498 | 20-Nov-20 | 2 | |
2167076 | 5-Dec-20 | 2 | |
2168477 | 8-Dec-20 | 2 | |
2168481 | 8-Nov-20 | 2 | |
2168517 | 8-Nov-20 | 2 | |
2168548 | 20-Nov-20 | 2 | |
2168564 | 2-Nov-20 | 2 | |
2168574 | 15-Oct-20 | 2 | |
2168601 | 5-Dec-20 | 2 | |
2168638 | 4-Dec-20 | 2 | |
2168642 | 6-Oct-20 | 2 | |
2169406 | 8-Dec-20 | 2 | |
2172274 | 21-Oct-20 | 2 | |
2172274 | 20-Nov-20 | 1 | |
2172274 | 4-Dec-20 | 9 | delete |
2174225 | 6-Dec-20 | 2 | |
2176363 | 14-Nov-20 | 2 | |
2177143 | 5-Oct-20 | 2 | |
2177326 | 8-Dec-20 | 2 | |
2178406 | 8-Dec-20 | 2 | |
2178412 | 5-Oct-20 | 2 | |
2178416 | 13-Nov-20 | 2 |
There was a similar case.
https://communities.sas.com/t5/SAS-Programming/How-to-delete-observations-based-on-dates-in-a-by-gro...
data have;
infile datalines dlm="09"x;
input PATIENT_ID _SPECIMEN_COLL_DT:anydtdte. RESULT;
format _SPECIMEN_COLL_DT yymmdds10.;
datalines;
1007215 23-Sep-20 2
1007215 25-Sep-20 2
1007215 27-Sep-20 1
1007215 1-Nov-20 2
1007215 10-Nov-20 2
1007215 14-Nov-20 2
1007215 30-Nov-20 2
1007215 6-Dec-20 9
1342677 19-Oct-20 2
1342677 7-Nov-20 2
1345898 25-Aug-20 9
1345898 27-Sep-20 2
1345898 11-Oct-20 2
1345898 23-Oct-20 1
1345898 30-Oct-20 2
1345898 6-Nov-20 2
1345898 22-Nov-20 2
1345898 29-Nov-20 2
1345898 6-Dec-20 2
1408252 21-Sep-20 2
1439221 25-Oct-20 2
1439245 8-Oct-20 2
1442703 21-Oct-20 2
1451575 24-Aug-20 2
1451575 24-Nov-20 2
1462769 9-Sep-20 2
1469679 15-Sep-20 2
1469679 3-Oct-20 2
1469679 7-Oct-20 1
1469679 3-Nov-20 2
1502202 17-Aug-20 2
1502202 29-Aug-20 2
1502202 19-Sep-20 2
1502202 28-Sep-20 2
1502202 13-Nov-20 2
1502202 21-Nov-20 2
1502202 6-Dec-20 2
1557673 20-Aug-20 2
1560593 30-Aug-20 2
1569074 11-Sep-20 2
1571807 15-Sep-20 2
1602114 13-Sep-20 2
1743463 7-Jul-20 2
1751405 7-Jun-20 2
1751737 14-Jul-20 2
1760386 27-Jul-20 2
1760410 28-Jul-20 2
2022188 8-Nov-20 2
2076478 11-Nov-20 2
2077550 6-Oct-20 3
2078033 30-Nov-20 2
2079325 5-Dec-20 9
2081383 26-Nov-20 2
2085676 24-Oct-20 2
2086149 6-Dec-20 2
2086958 4-Dec-20 2
2088387 18-Sep-20 9
2088387 17-Nov-20 1
2088387 8-Dec-20 2
2093450 8-Dec-20 2
2094033 28-Nov-20 2
2094452 21-Oct-20 2
2097616 29-Oct-20 2
2111066 20-Nov-20 2
2112099 18-Sep-20 2
2114632 6-Oct-20 2
2118460 25-Sep-20 2
2118460 29-Sep-20 1
2118460 10-Nov-20 2
2118460 20-Nov-20 2
2118460 26-Nov-20 2
2118460 4-Dec-20 2
2118460 6-Dec-20 2
2133328 30-Nov-20 2
2133571 13-Oct-20 2
2134005 8-Dec-20 2
2134473 29-Nov-20 2
2153388 11-Nov-20 2
2166498 20-Nov-20 2
2167076 5-Dec-20 2
2168477 8-Dec-20 2
2168481 8-Nov-20 2
2168517 8-Nov-20 2
2168548 20-Nov-20 2
2168564 2-Nov-20 2
2168574 15-Oct-20 2
2168601 5-Dec-20 2
2168638 4-Dec-20 2
2168642 6-Oct-20 2
2169406 8-Dec-20 2
2172274 21-Oct-20 2
2172274 20-Nov-20 1
2172274 4-Dec-20 9
2174225 6-Dec-20 2
2176363 14-Nov-20 2
2177143 5-Oct-20 2
2177326 8-Dec-20 2
2178406 8-Dec-20 2
2178412 5-Oct-20 2
2178416 13-Nov-20 2
;
run;
data want;
*condition is set when result hits 1 or last patient;
do until(last.patient_id | result=1);
set have;
by patient_id;
*set _N_ as flag;
if first.patient_id then _N_=0;
*output observation until flag hits the condition;
if ^_N_ then output;
end;
run;
See if this gets you started;
data want; set have; by patient_id; retain deleteflag; if first.patient_id then call missing(deleteflag); if deleteflag=1 then delete; if result=1 then deleteflag=1;
drop deleteflag; run;
Assumes your data is sorted by Patient_id and the dates.
Retain will keep a value across iterations of a data step.
When you use a By statement then SAS creates automatic variables First. and Last. for each variable on the by statement that indicate whether the current record is the first or last value for that variable within a by group. So you reset values for the first record of the Patient_id with "If first.patient_id" (do pay attention to the presence of the dot between the key word and the variable name).
So we set a flag the first time a value of 1 is encountered. The next record has the flag set so the Delete removes the record.
Assuming that data are sorted by _SPECIMENT_COOL_DT for each PATIENT_ID, and that no PATIENT_ID has more than one result=1 record, this self-match-merge with a rename will do what you want:
data want (drop=_cutoff_date);
merge have (where=(result=1) rename=(_SPECIMEN_COLL_DT=_cutoff_date))
have;
by patient_id;
if _SPECIMEN_COLL_DT<=_cutoff_date;
run;
And if a patient_id has no result=1 records, then the entire patient's history will be kept.
There was a similar case.
https://communities.sas.com/t5/SAS-Programming/How-to-delete-observations-based-on-dates-in-a-by-gro...
data have;
infile datalines dlm="09"x;
input PATIENT_ID _SPECIMEN_COLL_DT:anydtdte. RESULT;
format _SPECIMEN_COLL_DT yymmdds10.;
datalines;
1007215 23-Sep-20 2
1007215 25-Sep-20 2
1007215 27-Sep-20 1
1007215 1-Nov-20 2
1007215 10-Nov-20 2
1007215 14-Nov-20 2
1007215 30-Nov-20 2
1007215 6-Dec-20 9
1342677 19-Oct-20 2
1342677 7-Nov-20 2
1345898 25-Aug-20 9
1345898 27-Sep-20 2
1345898 11-Oct-20 2
1345898 23-Oct-20 1
1345898 30-Oct-20 2
1345898 6-Nov-20 2
1345898 22-Nov-20 2
1345898 29-Nov-20 2
1345898 6-Dec-20 2
1408252 21-Sep-20 2
1439221 25-Oct-20 2
1439245 8-Oct-20 2
1442703 21-Oct-20 2
1451575 24-Aug-20 2
1451575 24-Nov-20 2
1462769 9-Sep-20 2
1469679 15-Sep-20 2
1469679 3-Oct-20 2
1469679 7-Oct-20 1
1469679 3-Nov-20 2
1502202 17-Aug-20 2
1502202 29-Aug-20 2
1502202 19-Sep-20 2
1502202 28-Sep-20 2
1502202 13-Nov-20 2
1502202 21-Nov-20 2
1502202 6-Dec-20 2
1557673 20-Aug-20 2
1560593 30-Aug-20 2
1569074 11-Sep-20 2
1571807 15-Sep-20 2
1602114 13-Sep-20 2
1743463 7-Jul-20 2
1751405 7-Jun-20 2
1751737 14-Jul-20 2
1760386 27-Jul-20 2
1760410 28-Jul-20 2
2022188 8-Nov-20 2
2076478 11-Nov-20 2
2077550 6-Oct-20 3
2078033 30-Nov-20 2
2079325 5-Dec-20 9
2081383 26-Nov-20 2
2085676 24-Oct-20 2
2086149 6-Dec-20 2
2086958 4-Dec-20 2
2088387 18-Sep-20 9
2088387 17-Nov-20 1
2088387 8-Dec-20 2
2093450 8-Dec-20 2
2094033 28-Nov-20 2
2094452 21-Oct-20 2
2097616 29-Oct-20 2
2111066 20-Nov-20 2
2112099 18-Sep-20 2
2114632 6-Oct-20 2
2118460 25-Sep-20 2
2118460 29-Sep-20 1
2118460 10-Nov-20 2
2118460 20-Nov-20 2
2118460 26-Nov-20 2
2118460 4-Dec-20 2
2118460 6-Dec-20 2
2133328 30-Nov-20 2
2133571 13-Oct-20 2
2134005 8-Dec-20 2
2134473 29-Nov-20 2
2153388 11-Nov-20 2
2166498 20-Nov-20 2
2167076 5-Dec-20 2
2168477 8-Dec-20 2
2168481 8-Nov-20 2
2168517 8-Nov-20 2
2168548 20-Nov-20 2
2168564 2-Nov-20 2
2168574 15-Oct-20 2
2168601 5-Dec-20 2
2168638 4-Dec-20 2
2168642 6-Oct-20 2
2169406 8-Dec-20 2
2172274 21-Oct-20 2
2172274 20-Nov-20 1
2172274 4-Dec-20 9
2174225 6-Dec-20 2
2176363 14-Nov-20 2
2177143 5-Oct-20 2
2177326 8-Dec-20 2
2178406 8-Dec-20 2
2178412 5-Oct-20 2
2178416 13-Nov-20 2
;
run;
data want;
*condition is set when result hits 1 or last patient;
do until(last.patient_id | result=1);
set have;
by patient_id;
*set _N_ as flag;
if first.patient_id then _N_=0;
*output observation until flag hits the condition;
if ^_N_ then output;
end;
run;
data have;
input PATIENT_ID _SPECIMEN_COLL_DT :date9. RESULT;
format _SPECIMEN_COLL_DT date9.;
datalines;
1007215 23-Sep-20 2
1007215 25-Sep-20 2
1007215 27-Sep-20 1
1007215 1-Nov-20 2
1007215 10-Nov-20 2
1007215 14-Nov-20 2
1007215 30-Nov-20 2
1007215 6-Dec-20 9
1342677 19-Oct-20 2
1342677 7-Nov-20 2
1345898 25-Aug-20 9
1345898 27-Sep-20 2
1345898 11-Oct-20 2
1345898 23-Oct-20 1
1345898 30-Oct-20 2
1345898 6-Nov-20 2
1345898 22-Nov-20 2
1345898 29-Nov-20 2
1345898 6-Dec-20 2
1408252 21-Sep-20 2
1439221 25-Oct-20 2
1439245 8-Oct-20 2
1442703 21-Oct-20 2
1451575 24-Aug-20 2
1451575 24-Nov-20 2
1462769 9-Sep-20 2
1469679 15-Sep-20 2
1469679 3-Oct-20 2
1469679 7-Oct-20 1
1469679 3-Nov-20 2
1502202 17-Aug-20 2
1502202 29-Aug-20 2
1502202 19-Sep-20 2
1502202 28-Sep-20 2
1502202 13-Nov-20 2
1502202 21-Nov-20 2
1502202 6-Dec-20 2
1557673 20-Aug-20 2
1560593 30-Aug-20 2
1569074 11-Sep-20 2
1571807 15-Sep-20 2
1602114 13-Sep-20 2
1743463 7-Jul-20 2
1751405 7-Jun-20 2
1751737 14-Jul-20 2
1760386 27-Jul-20 2
1760410 28-Jul-20 2
2022188 8-Nov-20 2
2076478 11-Nov-20 2
2077550 6-Oct-20 3
2078033 30-Nov-20 2
2079325 5-Dec-20 9
2081383 26-Nov-20 2
2085676 24-Oct-20 2
2086149 6-Dec-20 2
2086958 4-Dec-20 2
2088387 18-Sep-20 9
2088387 17-Nov-20 1
2088387 8-Dec-20 2
2093450 8-Dec-20 2
2094033 28-Nov-20 2
2094452 21-Oct-20 2
2097616 29-Oct-20 2
2111066 20-Nov-20 2
2112099 18-Sep-20 2
2114632 6-Oct-20 2
2118460 25-Sep-20 2
2118460 29-Sep-20 1
2118460 10-Nov-20 2
2118460 20-Nov-20 2
2118460 26-Nov-20 2
2118460 4-Dec-20 2
2118460 6-Dec-20 2
2133328 30-Nov-20 2
2133571 13-Oct-20 2
2134005 8-Dec-20 2
2134473 29-Nov-20 2
2153388 11-Nov-20 2
2166498 20-Nov-20 2
2167076 5-Dec-20 2
2168477 8-Dec-20 2
2168481 8-Nov-20 2
2168517 8-Nov-20 2
2168548 20-Nov-20 2
2168564 2-Nov-20 2
2168574 15-Oct-20 2
2168601 5-Dec-20 2
2168638 4-Dec-20 2
2168642 6-Oct-20 2
2169406 8-Dec-20 2
2172274 21-Oct-20 2
2172274 20-Nov-20 1
2172274 4-Dec-20 9
2174225 6-Dec-20 2
2176363 14-Nov-20 2
2177143 5-Oct-20 2
2177326 8-Dec-20 2
2178406 8-Dec-20 2
2178412 5-Oct-20 2
2178416 13-Nov-20 2
;
run;
proc sql;
create table want as
select *
from have
group by PATIENT_ID
having _SPECIMEN_COLL_DT<=max((result=1)*_SPECIMEN_COLL_DT)
order by PATIENT_ID, _SPECIMEN_COLL_DT;
quit;
Thanks so much, everyone, But so far, no go.
The mkeintz novinosrin solutions deleted appropriate records for any patient who had had a positive test, and also all records for patients who had not had any positive test. Those latter records need to be maintained in the dataset.
The hhinohar solution yielded a dataset with 33 records with missing values for every variable on every record. There was a problem reading in PATIENT_ID. I'll look more into it.
The solution, using the data provided, should delete the 18 records marked with "delete," yielding a dataset with 82 records. It should maintain all the records for patient ID that never had a positive result, as well as the records with the positive results, and the records for patients with a positive result for which the specimen was collected before the date of the positive result.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.