I have a dataset with 65,000+ separate hospitalization records. I need to delete any entries that indicate a repeated hospital visit by the same person, so that the resulting file has only one entry per person (it will be matched to another data file later in the project).
Each event in the dataset has a unique hospital ID, and I do not have access to SSN. Therefore, I will depend on a combination of first/last name and DOB to identify repeated admissions. Is there a straightforward way to do this? Thanks!
PROC SORT with the NODUPKEY option, using the name and DOB variables in the BY statement.
However, I would be very cautious with this: removing repeats is a strange request for health care data analysis. Usually those records are summarized in some manner, e.g. by counting the number of admissions, the number of 30-day readmissions, and other metrics; a straight delete seems dangerous. This comes from almost a decade of working with health data.
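A minimal sketch of the PROC SORT approach, assuming the input dataset is called table1 and the identifying variables are named first_name, last_name and birth_date (substitute your own names). The output names one_per_person and repeat_visits are made up for illustration. Writing to OUT= leaves the full set of hospitalizations untouched, and DUPOUT= saves the dropped rows so they can be reviewed instead of silently lost.

proc sort data=table1 out=one_per_person dupout=repeat_visits nodupkey;
/* keep the first record for each name + DOB combination */
by last_name first_name birth_date;
run;

Which record is kept within each group depends on the order of the input data, so sort by admission date first if a particular visit should be retained.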
For this task I just need a list of anyone who has had at least one hospitalization in my original discharge dataset, for linkage purposes. The full set of hospitalizations will be used for any analysis.
If you only need the IDs, then use something like:
proc sql;
create table id_list as
select distinct first_name, last_name, birth_date, sex
from table1;
quit;