BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MHines
Obsidian | Level 7

Hello,

I am working with a dataset that tracks entry and exit dates per individual, with each individual having a unique identifying number. In most instances, the individual only has a single record, however several individuals have multiple entry and exit dates. I am trying to calculate the number of days each individual is in the program by adding the difference between each release and entry date. The data looks like this:

casenumentry_daterelease_datecounter
111111/2/20191/3/20191
111111/17/20191/21/20192
111121/1/20191/6/20191
111131/2/20191/5/20191
111141/1/20191/3/20191
111141/9/20191/15/20192
111141/17/20191/18/20193
111151/3/20191/5/20191
111161/2/20191/17/20191
111171/2/20191/6/20191
111171/18/20191/11/20192
111171/15/20191/27/20193
111172/5/20192/14/20194
111181/2/20192/1/20191
111191/1/20191/29/20191
111201/1/20191/5/20191
111201/1/20191/17/20192
111201/1/20191/22/20193

I was given the variables casenum, entry_date, and release_date. The counter variable was something I created. As I mentioned earlier, I need to calculate the number of days between each entry and release date per person. However, the snag I have run into is that I need to exclude cases like 11120, where a single entry date has multiple release dates; these need to be left as missing when I calculate the time duration. I would prefer to write the program such that the program drops ALL observations of the case number 11120 based on the existence of duplicate entry dates, rather than by using the precise case number (I am trying to limit my hard coding to avoid having to update it every time we receive new data). Is there an efficient way to go about 1) dropping the observations that need to be dropped and 2) adding the total time for the rest? Any advice is appreciated!

1 ACCEPTED SOLUTION

Accepted Solutions
AMSAS
SAS Super FREQ

As long as the key to removing the error records is for a single case number you can not have repeats of the entry_date then the following should work. This is using FIRST. and LAST. DATA Step Variables

 

data got ;
	infile cards ;
	input casenum $ entry_date mmddyy8. release_date mmddyy8. ;
cards ;
11118 01/02/19 02/01/19
11119 01/01/19 01/29/19
11120 01/01/19 01/05/19
11120 01/01/19 01/17/19
11120 01/01/19 01/22/19
;

proc sort data=got out=srtdGot ;
	by casenum entry_date ;
run ;

data want rejected ;
	set srtdGot ;
	by casenum entry_date ;
	if first.entry_date and last.entry_date then
		output want ;
	else
		output rejected ;
run ;

View solution in original post

2 REPLIES 2
AMSAS
SAS Super FREQ

As long as the key to removing the error records is for a single case number you can not have repeats of the entry_date then the following should work. This is using FIRST. and LAST. DATA Step Variables

 

data got ;
	infile cards ;
	input casenum $ entry_date mmddyy8. release_date mmddyy8. ;
cards ;
11118 01/02/19 02/01/19
11119 01/01/19 01/29/19
11120 01/01/19 01/05/19
11120 01/01/19 01/17/19
11120 01/01/19 01/22/19
;

proc sort data=got out=srtdGot ;
	by casenum entry_date ;
run ;

data want rejected ;
	set srtdGot ;
	by casenum entry_date ;
	if first.entry_date and last.entry_date then
		output want ;
	else
		output rejected ;
run ;
MHines
Obsidian | Level 7
That worked, thank you!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 504 views
  • 1 like
  • 2 in conversation