Remove Repeated Hospitalization Records

New Contributor
Posts: 4

Remove Repeated Hospitalization Records

I have a dataset with 65,000+ separate hospitalization records. I need to delete any entries that indicate a repeated hospital visit by the same person, so that the resulting file has only one entry per person (it will be used to match to another data file later in the project).

Each event in the dataset has a unique hospital ID, and I do not have access to SSN. Therefore, I will depend on a combination of first/last name and DOB to identify repeated admissions. Is there a straightforward way to do this? Thanks!

Super User
Posts: 23,776

Re: Remove Repeated Hospitalization Records

PROC SORT with the NODUPKEY option, using name and DOB as the BY variables.
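A minimal sketch of the PROC SORT approach, assuming the dataset is called table1 and the variables are named first_name, last_name and birth_date (names borrowed from the PROC SQL reply later in this thread, so adjust them to your data). The output name one_per_person is made up for illustration.

```sas
/* Keep one observation per person, where a "person" is defined   */
/* by the BY variables. Dataset and variable names are assumed.   */
proc sort data=table1 out=one_per_person nodupkey;
    by last_name first_name birth_date;
run;
```

NODUPKEY keeps the first observation it encounters in each BY group and drops the rest. Writing to OUT= leaves the original discharge data untouched, and adding DUPOUT= with a dataset name would save the dropped records for review.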

However, I would be very cautious with this. Removing repeats is an unusual request for health care data analysis. Usually the repeated records are summarized in some manner, i.e. by counting the number of admissions, the number of 30-day readmissions, and other metrics; a straight delete seems dangerous. This comes from almost a decade of working with health data.
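As a rough illustration of summarizing instead of deleting, something like the following would count admissions per person. The dataset name table1 and the variable names are assumptions taken from the PROC SQL reply later in this thread; admissions_per_person and n_admissions are made-up names.

```sas
/* One row per person with the number of hospitalization records. */
/* Dataset and variable names are assumed, not from the real data. */
proc sql;
    create table admissions_per_person as
    select first_name, last_name, birth_date,
           count(*) as n_admissions
    from table1
    group by first_name, last_name, birth_date;
quit;
```

This still yields one row per person for linkage, but it keeps a record of how many admissions each person had.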

New Contributor
Posts: 4

Re: Remove Repeated Hospitalization Records

For this task I just need a list of anyone who has had at least one hospitalization in my original discharge dataset, for linkage purposes. The full set of hospitalizations will be used for any analysis.

Super User
Posts: 23,776

Re: Remove Repeated Hospitalization Records

If you only need the IDs, then use something like:

/* One row per distinct combination of name, birth date and sex */
proc sql;
    create table id_list as
    select distinct first_name, last_name, birth_date, sex
    from table1;
quit;
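One thing to watch for: SELECT DISTINCT compares character values exactly, so 'Smith' and 'SMITH' would count as two different people. A possible variation, assuming the name variables are character and that case and stray blanks are the main inconsistencies, is to standardize the names first. The output name id_list_clean is hypothetical.

```sas
/* Standardize case and remove leading/trailing blanks before     */
/* de-duplicating. Assumes first_name and last_name are character. */
proc sql;
    create table id_list_clean as
    select distinct
           upcase(strip(first_name)) as first_name,
           upcase(strip(last_name))  as last_name,
           birth_date,
           sex
    from table1;
quit;
```

This will not catch misspellings or name changes, so some people may still appear more than once.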