Please see below is an example of the messy data I have. A member has records for mutiple hospitalizations; for seemingly one hospitalization, there were multiple rows of records with slightly different admission dates and discharge dates.
I hope to clean the data and get one record for one hospitalization. When there are mutiple rows for seemingly one hospitalization because of the overlaping time, create one record with earliest admission date and latest discharge date.
Though not scientific, consider using a DATA step and a DO / END loop to generate one observation for each date within the range, then sort the observations in sequence and use the LAG function to compare adjacent observations, and if there is a gap of more than some determined length of days (using the INTCK function), then increment a counter. Lastly, use PROC SUMMARY to generate a MIN/MAX variable for each hospitalization.