BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
loishaggard
Fluorite | Level 6

I have test result data for patients. Once a patient tests positive (result=1) any future tests for that patient should be deleted from the file.

 

So I know I need to identify the record with result=1 and grab that date, but I'm at a loss for how to apply that date within patient ID to delete records with a higher date value.

 

Thanks in advance.

 

PATIENT_ID_SPECIMEN_COLL_DTRESULT 
100721523-Sep-202 
100721525-Sep-202 
100721527-Sep-201 
10072151-Nov-202delete
100721510-Nov-202delete
100721514-Nov-202delete
100721530-Nov-202delete
10072156-Dec-209delete
134267719-Oct-202 
13426777-Nov-202 
134589825-Aug-209 
134589827-Sep-202 
134589811-Oct-202 
134589823-Oct-201 
134589830-Oct-202delete
13458986-Nov-202delete
134589822-Nov-202delete
134589829-Nov-202delete
13458986-Dec-202delete
140825221-Sep-202 
143922125-Oct-202 
14392458-Oct-202 
144270321-Oct-202 
145157524-Aug-202 
145157524-Nov-202 
14627699-Sep-202 
146967915-Sep-202 
14696793-Oct-202 
14696797-Oct-201 
14696793-Nov-202delete
150220217-Aug-202 
150220229-Aug-202 
150220219-Sep-202 
150220228-Sep-202 
150220213-Nov-202 
150220221-Nov-202 
15022026-Dec-202 
155767320-Aug-202 
156059330-Aug-202 
156907411-Sep-202 
157180715-Sep-202 
160211413-Sep-202 
17434637-Jul-202 
17514057-Jun-202 
175173714-Jul-202 
176038627-Jul-202 
176041028-Jul-202 
20221888-Nov-202 
207647811-Nov-202 
20775506-Oct-203 
207803330-Nov-202 
20793255-Dec-209 
208138326-Nov-202 
208567624-Oct-202 
20861496-Dec-202 
20869584-Dec-202 
208838718-Sep-209 
208838717-Nov-201 
20883878-Dec-202delete
20934508-Dec-202 
209403328-Nov-202 
209445221-Oct-202 
209761629-Oct-202 
211106620-Nov-202 
211209918-Sep-202 
21146326-Oct-202 
211846025-Sep-202 
211846029-Sep-201 
211846010-Nov-202delete
211846020-Nov-202delete
211846026-Nov-202delete
21184604-Dec-202delete
21184606-Dec-202delete
213332830-Nov-202 
213357113-Oct-202 
21340058-Dec-202 
213447329-Nov-202 
215338811-Nov-202 
216649820-Nov-202 
21670765-Dec-202 
21684778-Dec-202 
21684818-Nov-202 
21685178-Nov-202 
216854820-Nov-202 
21685642-Nov-202 
216857415-Oct-202 
21686015-Dec-202 
21686384-Dec-202 
21686426-Oct-202 
21694068-Dec-202 
217227421-Oct-202 
217227420-Nov-201 
21722744-Dec-209delete
21742256-Dec-202 
217636314-Nov-202 
21771435-Oct-202 
21773268-Dec-202 
21784068-Dec-202 
21784125-Oct-202 
217841613-Nov-202 
1 ACCEPTED SOLUTION

Accepted Solutions
hhinohar
Quartz | Level 8

There was a similar case.
https://communities.sas.com/t5/SAS-Programming/How-to-delete-observations-based-on-dates-in-a-by-gro...

 

data have;
	infile datalines dlm="09"x;
	input PATIENT_ID _SPECIMEN_COLL_DT:anydtdte. RESULT;
	format _SPECIMEN_COLL_DT yymmdds10.;
	datalines;
1007215	23-Sep-20	2
1007215	25-Sep-20	2
1007215	27-Sep-20	1
1007215	1-Nov-20	2
1007215	10-Nov-20	2
1007215	14-Nov-20	2
1007215	30-Nov-20	2
1007215	6-Dec-20	9
1342677	19-Oct-20	2
1342677	7-Nov-20	2
1345898	25-Aug-20	9
1345898	27-Sep-20	2
1345898	11-Oct-20	2
1345898	23-Oct-20	1
1345898	30-Oct-20	2
1345898	6-Nov-20	2
1345898	22-Nov-20	2
1345898	29-Nov-20	2
1345898	6-Dec-20	2
1408252	21-Sep-20	2
1439221	25-Oct-20	2
1439245	8-Oct-20	2
1442703	21-Oct-20	2
1451575	24-Aug-20	2
1451575	24-Nov-20	2
1462769	9-Sep-20	2
1469679	15-Sep-20	2
1469679	3-Oct-20	2
1469679	7-Oct-20	1
1469679	3-Nov-20	2
1502202	17-Aug-20	2
1502202	29-Aug-20	2
1502202	19-Sep-20	2
1502202	28-Sep-20	2
1502202	13-Nov-20	2
1502202	21-Nov-20	2
1502202	6-Dec-20	2
1557673	20-Aug-20	2
1560593	30-Aug-20	2
1569074	11-Sep-20	2
1571807	15-Sep-20	2
1602114	13-Sep-20	2
1743463	7-Jul-20	2
1751405	7-Jun-20	2
1751737	14-Jul-20	2
1760386	27-Jul-20	2
1760410	28-Jul-20	2
2022188	8-Nov-20	2
2076478	11-Nov-20	2
2077550	6-Oct-20	3
2078033	30-Nov-20	2
2079325	5-Dec-20	9
2081383	26-Nov-20	2
2085676	24-Oct-20	2
2086149	6-Dec-20	2
2086958	4-Dec-20	2
2088387	18-Sep-20	9
2088387	17-Nov-20	1
2088387	8-Dec-20	2
2093450	8-Dec-20	2
2094033	28-Nov-20	2
2094452	21-Oct-20	2
2097616	29-Oct-20	2
2111066	20-Nov-20	2
2112099	18-Sep-20	2
2114632	6-Oct-20	2
2118460	25-Sep-20	2
2118460	29-Sep-20	1
2118460	10-Nov-20	2
2118460	20-Nov-20	2
2118460	26-Nov-20	2
2118460	4-Dec-20	2
2118460	6-Dec-20	2
2133328	30-Nov-20	2
2133571	13-Oct-20	2
2134005	8-Dec-20	2
2134473	29-Nov-20	2
2153388	11-Nov-20	2
2166498	20-Nov-20	2
2167076	5-Dec-20	2
2168477	8-Dec-20	2
2168481	8-Nov-20	2
2168517	8-Nov-20	2
2168548	20-Nov-20	2
2168564	2-Nov-20	2
2168574	15-Oct-20	2
2168601	5-Dec-20	2
2168638	4-Dec-20	2
2168642	6-Oct-20	2
2169406	8-Dec-20	2
2172274	21-Oct-20	2
2172274	20-Nov-20	1
2172274	4-Dec-20	9
2174225	6-Dec-20	2
2176363	14-Nov-20	2
2177143	5-Oct-20	2
2177326	8-Dec-20	2
2178406	8-Dec-20	2
2178412	5-Oct-20	2
2178416	13-Nov-20	2
;
run;

data want;
	*condition is set when result hits 1 or last patient;
	do until(last.patient_id | result=1);
		set have;
		by patient_id;
		*set _N_ as flag;
		if first.patient_id then _N_=0;
		*output observation until flag hits the condition;
		if ^_N_ then output;
	end;
run;

View solution in original post

6 REPLIES 6
ballardw
Super User

See if this gets you started;

 

data want;
    set have;
    by patient_id;
   retain deleteflag;
   if first.patient_id then call missing(deleteflag);
   if deleteflag=1 then delete;
   if result=1 then deleteflag=1;
drop deleteflag; run;

Assumes your data is sorted by Patient_id and the dates.

Retain will keep a value across iterations of a data step.

When you use a By statement then SAS creates automatic variables First. and Last. for each variable on the by statement that indicate whether the current record is the first or last value for that variable within a by group. So you reset values for the first record of the Patient_id with "If first.patient_id" (do pay attention to the presence of the dot between the key word and the variable name).

So we set a flag the first time a value of 1 is encountered. The next record has the flag set so the Delete removes the record.

loishaggard
Fluorite | Level 6
Thanks. I'll give it a try!
mkeintz
PROC Star

Assuming that data are sorted by _SPECIMENT_COOL_DT for each PATIENT_ID, and that no PATIENT_ID has more than one result=1 record, this self-match-merge with a rename will do what you want: 

 

data want (drop=_cutoff_date);
  merge have (where=(result=1) rename=(_SPECIMEN_COLL_DT=_cutoff_date))
        have;
  by patient_id;
  if _SPECIMEN_COLL_DT<=_cutoff_date;
run;

And if a patient_id has no result=1 records, then the entire patient's history will be kept.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
hhinohar
Quartz | Level 8

There was a similar case.
https://communities.sas.com/t5/SAS-Programming/How-to-delete-observations-based-on-dates-in-a-by-gro...

 

data have;
	infile datalines dlm="09"x;
	input PATIENT_ID _SPECIMEN_COLL_DT:anydtdte. RESULT;
	format _SPECIMEN_COLL_DT yymmdds10.;
	datalines;
1007215	23-Sep-20	2
1007215	25-Sep-20	2
1007215	27-Sep-20	1
1007215	1-Nov-20	2
1007215	10-Nov-20	2
1007215	14-Nov-20	2
1007215	30-Nov-20	2
1007215	6-Dec-20	9
1342677	19-Oct-20	2
1342677	7-Nov-20	2
1345898	25-Aug-20	9
1345898	27-Sep-20	2
1345898	11-Oct-20	2
1345898	23-Oct-20	1
1345898	30-Oct-20	2
1345898	6-Nov-20	2
1345898	22-Nov-20	2
1345898	29-Nov-20	2
1345898	6-Dec-20	2
1408252	21-Sep-20	2
1439221	25-Oct-20	2
1439245	8-Oct-20	2
1442703	21-Oct-20	2
1451575	24-Aug-20	2
1451575	24-Nov-20	2
1462769	9-Sep-20	2
1469679	15-Sep-20	2
1469679	3-Oct-20	2
1469679	7-Oct-20	1
1469679	3-Nov-20	2
1502202	17-Aug-20	2
1502202	29-Aug-20	2
1502202	19-Sep-20	2
1502202	28-Sep-20	2
1502202	13-Nov-20	2
1502202	21-Nov-20	2
1502202	6-Dec-20	2
1557673	20-Aug-20	2
1560593	30-Aug-20	2
1569074	11-Sep-20	2
1571807	15-Sep-20	2
1602114	13-Sep-20	2
1743463	7-Jul-20	2
1751405	7-Jun-20	2
1751737	14-Jul-20	2
1760386	27-Jul-20	2
1760410	28-Jul-20	2
2022188	8-Nov-20	2
2076478	11-Nov-20	2
2077550	6-Oct-20	3
2078033	30-Nov-20	2
2079325	5-Dec-20	9
2081383	26-Nov-20	2
2085676	24-Oct-20	2
2086149	6-Dec-20	2
2086958	4-Dec-20	2
2088387	18-Sep-20	9
2088387	17-Nov-20	1
2088387	8-Dec-20	2
2093450	8-Dec-20	2
2094033	28-Nov-20	2
2094452	21-Oct-20	2
2097616	29-Oct-20	2
2111066	20-Nov-20	2
2112099	18-Sep-20	2
2114632	6-Oct-20	2
2118460	25-Sep-20	2
2118460	29-Sep-20	1
2118460	10-Nov-20	2
2118460	20-Nov-20	2
2118460	26-Nov-20	2
2118460	4-Dec-20	2
2118460	6-Dec-20	2
2133328	30-Nov-20	2
2133571	13-Oct-20	2
2134005	8-Dec-20	2
2134473	29-Nov-20	2
2153388	11-Nov-20	2
2166498	20-Nov-20	2
2167076	5-Dec-20	2
2168477	8-Dec-20	2
2168481	8-Nov-20	2
2168517	8-Nov-20	2
2168548	20-Nov-20	2
2168564	2-Nov-20	2
2168574	15-Oct-20	2
2168601	5-Dec-20	2
2168638	4-Dec-20	2
2168642	6-Oct-20	2
2169406	8-Dec-20	2
2172274	21-Oct-20	2
2172274	20-Nov-20	1
2172274	4-Dec-20	9
2174225	6-Dec-20	2
2176363	14-Nov-20	2
2177143	5-Oct-20	2
2177326	8-Dec-20	2
2178406	8-Dec-20	2
2178412	5-Oct-20	2
2178416	13-Nov-20	2
;
run;

data want;
	*condition is set when result hits 1 or last patient;
	do until(last.patient_id | result=1);
		set have;
		by patient_id;
		*set _N_ as flag;
		if first.patient_id then _N_=0;
		*output observation until flag hits the condition;
		if ^_N_ then output;
	end;
run;
novinosrin
Tourmaline | Level 20

data have;
	input PATIENT_ID _SPECIMEN_COLL_DT :date9. RESULT;
	format _SPECIMEN_COLL_DT date9.;
	datalines;
1007215	23-Sep-20	2
1007215	25-Sep-20	2
1007215	27-Sep-20	1
1007215	1-Nov-20	2
1007215	10-Nov-20	2
1007215	14-Nov-20	2
1007215	30-Nov-20	2
1007215	6-Dec-20	9
1342677	19-Oct-20	2
1342677	7-Nov-20	2
1345898	25-Aug-20	9
1345898	27-Sep-20	2
1345898	11-Oct-20	2
1345898	23-Oct-20	1
1345898	30-Oct-20	2
1345898	6-Nov-20	2
1345898	22-Nov-20	2
1345898	29-Nov-20	2
1345898	6-Dec-20	2
1408252	21-Sep-20	2
1439221	25-Oct-20	2
1439245	8-Oct-20	2
1442703	21-Oct-20	2
1451575	24-Aug-20	2
1451575	24-Nov-20	2
1462769	9-Sep-20	2
1469679	15-Sep-20	2
1469679	3-Oct-20	2
1469679	7-Oct-20	1
1469679	3-Nov-20	2
1502202	17-Aug-20	2
1502202	29-Aug-20	2
1502202	19-Sep-20	2
1502202	28-Sep-20	2
1502202	13-Nov-20	2
1502202	21-Nov-20	2
1502202	6-Dec-20	2
1557673	20-Aug-20	2
1560593	30-Aug-20	2
1569074	11-Sep-20	2
1571807	15-Sep-20	2
1602114	13-Sep-20	2
1743463	7-Jul-20	2
1751405	7-Jun-20	2
1751737	14-Jul-20	2
1760386	27-Jul-20	2
1760410	28-Jul-20	2
2022188	8-Nov-20	2
2076478	11-Nov-20	2
2077550	6-Oct-20	3
2078033	30-Nov-20	2
2079325	5-Dec-20	9
2081383	26-Nov-20	2
2085676	24-Oct-20	2
2086149	6-Dec-20	2
2086958	4-Dec-20	2
2088387	18-Sep-20	9
2088387	17-Nov-20	1
2088387	8-Dec-20	2
2093450	8-Dec-20	2
2094033	28-Nov-20	2
2094452	21-Oct-20	2
2097616	29-Oct-20	2
2111066	20-Nov-20	2
2112099	18-Sep-20	2
2114632	6-Oct-20	2
2118460	25-Sep-20	2
2118460	29-Sep-20	1
2118460	10-Nov-20	2
2118460	20-Nov-20	2
2118460	26-Nov-20	2
2118460	4-Dec-20	2
2118460	6-Dec-20	2
2133328	30-Nov-20	2
2133571	13-Oct-20	2
2134005	8-Dec-20	2
2134473	29-Nov-20	2
2153388	11-Nov-20	2
2166498	20-Nov-20	2
2167076	5-Dec-20	2
2168477	8-Dec-20	2
2168481	8-Nov-20	2
2168517	8-Nov-20	2
2168548	20-Nov-20	2
2168564	2-Nov-20	2
2168574	15-Oct-20	2
2168601	5-Dec-20	2
2168638	4-Dec-20	2
2168642	6-Oct-20	2
2169406	8-Dec-20	2
2172274	21-Oct-20	2
2172274	20-Nov-20	1
2172274	4-Dec-20	9
2174225	6-Dec-20	2
2176363	14-Nov-20	2
2177143	5-Oct-20	2
2177326	8-Dec-20	2
2178406	8-Dec-20	2
2178412	5-Oct-20	2
2178416	13-Nov-20	2
;
run;

proc sql;
 create table want as
 select * 
 from have
 group by PATIENT_ID
 having _SPECIMEN_COLL_DT<=max((result=1)*_SPECIMEN_COLL_DT)
 order by PATIENT_ID, _SPECIMEN_COLL_DT;
quit;
loishaggard
Fluorite | Level 6

Thanks so much, everyone, But so far, no go.

 

The mkeintz novinosrin solutions deleted appropriate records for any patient who had had a positive test, and also all records for patients who had not had any positive test. Those latter records need to be maintained in the dataset.

 

The hhinohar solution yielded a dataset with 33 records with missing values for every variable on every record. There was a problem reading in PATIENT_ID. I'll look more into it.

 

The solution, using the data provided, should delete the 18 records marked with "delete," yielding a dataset with 82 records. It should maintain all the records for patient ID that never had a positive result, as well as the records with the positive results, and the records for patients with a positive result for which the specimen was collected before the date of the positive result.

 
Thanks again for you efforts!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1299 views
  • 1 like
  • 5 in conversation