Solved: Re: Calculate time difference with duplicates and exclusions

MHines · Posted 05-17-2021 01:21 PM

Hello,

I am working with a dataset that tracks entry and exit dates per individual, with each individual having a unique identifying number. In most instances, the individual only has a single record, however several individuals have multiple entry and exit dates. I am trying to calculate the number of days each individual is in the program by adding the difference between each release and entry date. The data looks like this:

casenum	entry_date	release_date	counter
11111	1/2/2019	1/3/2019	1
11111	1/17/2019	1/21/2019	2
11112	1/1/2019	1/6/2019	1
11113	1/2/2019	1/5/2019	1
11114	1/1/2019	1/3/2019	1
11114	1/9/2019	1/15/2019	2
11114	1/17/2019	1/18/2019	3
11115	1/3/2019	1/5/2019	1
11116	1/2/2019	1/17/2019	1
11117	1/2/2019	1/6/2019	1
11117	1/18/2019	1/11/2019	2
11117	1/15/2019	1/27/2019	3
11117	2/5/2019	2/14/2019	4
11118	1/2/2019	2/1/2019	1
11119	1/1/2019	1/29/2019	1
11120	1/1/2019	1/5/2019	1
11120	1/1/2019	1/17/2019	2
11120	1/1/2019	1/22/2019	3

I was given the variables casenum, entry_date, and release_date. The counter variable was something I created. As I mentioned earlier, I need to calculate the number of days between each entry and release date per person. However, the snag I have run into is that I need to exclude cases like 11120, where a single entry date has multiple release dates; these need to be left as missing when I calculate the time duration. I would prefer to write the program such that the program drops ALL observations of the case number 11120 based on the existence of duplicate entry dates, rather than by using the precise case number (I am trying to limit my hard coding to avoid having to update it every time we receive new data). Is there an efficient way to go about 1) dropping the observations that need to be dropped and 2) adding the total time for the rest? Any advice is appreciated!

AMSAS · Posted 05-17-2021 01:39 PM

As long as the key to removing the error records is for a single case number you can not have repeats of the entry_date then the following should work. This is using FIRST. and LAST. DATA Step Variables

data got ;
	infile cards ;
	input casenum $ entry_date mmddyy8. release_date mmddyy8. ;
cards ;
11118 01/02/19 02/01/19
11119 01/01/19 01/29/19
11120 01/01/19 01/05/19
11120 01/01/19 01/17/19
11120 01/01/19 01/22/19
;

proc sort data=got out=srtdGot ;
	by casenum entry_date ;
run ;

data want rejected ;
	set srtdGot ;
	by casenum entry_date ;
	if first.entry_date and last.entry_date then
		output want ;
	else
		output rejected ;
run ;

View solution in original post

AMSAS · Posted 05-17-2021 01:39 PM

As long as the key to removing the error records is for a single case number you can not have repeats of the entry_date then the following should work. This is using FIRST. and LAST. DATA Step Variables

data got ;
	infile cards ;
	input casenum $ entry_date mmddyy8. release_date mmddyy8. ;
cards ;
11118 01/02/19 02/01/19
11119 01/01/19 01/29/19
11120 01/01/19 01/05/19
11120 01/01/19 01/17/19
11120 01/01/19 01/22/19
;

proc sort data=got out=srtdGot ;
	by casenum entry_date ;
run ;

data want rejected ;
	set srtdGot ;
	by casenum entry_date ;
	if first.entry_date and last.entry_date then
		output want ;
	else
		output rejected ;
run ;

MHines · Posted 05-17-2021 01:58 PM

That worked, thank you!

Calculate time difference with duplicates and exclusions

Re: Calculate time difference with duplicates and exclusions

Re: Calculate time difference with duplicates and exclusions

Re: Calculate time difference with duplicates and exclusions

Catch up on SAS Innovate 2026