newgrad
Calcite | Level 5

I have a dataset with 65,000+ separate hospitalization records. I need to delete any entries that indicate a repeat hospital visit by the same person, so that the resulting file has only one entry per person (it will be used to match against another data file later in the project).

 

Each event in the dataset has a unique hospital ID, and I do not have access to SSN. Therefore, I will depend on a combination of first/last name and DOB to identify repeated admissions. Is there a straightforward way to do this? Thanks!

Reeza
Super User

PROC SORT with the NODUPKEY option, listing the name and DOB variables in the BY statement.
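
One way to do this with PROC SORT is the NODUPKEY option, which keeps only the first observation for each combination of BY values. A minimal sketch, assuming the input dataset is called table1 and the identifying variables are last_name, first_name, and birth_date (these names, and the output dataset names one_per_person and repeat_visits, are placeholders, not taken from the poster's data):

```sas
/* Keep one record per person, identified by name and date of birth. */
/* Dataset and variable names are assumed; adjust them to your data. */
proc sort data=table1
          out=one_per_person    /* write to a new dataset so table1 stays intact */
          dupout=repeat_visits  /* records dropped as repeats are saved here     */
          nodupkey;
    by last_name first_name birth_date;
run;
```

Writing to OUT= rather than sorting in place keeps the full set of hospitalizations available, and DUPOUT= lets you inspect which records were treated as repeats. Note that matching on name and DOB is exact, so differences in spelling or letter case will be treated as different people.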

 

However, I would be very cautious with this. Removing repeats is an unusual request in health care data analysis. Usually the records are summarized in some manner, i.e., by counting the number of admissions, the number of 30-day readmissions, and other metrics; a straight delete seems dangerous. This comes from almost a decade of working with health data.
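
As an illustration of summarizing rather than deleting, the number of admissions per person could be counted along these lines. This is only a sketch: the dataset name table1, the variable names, and the output names admissions_per_person and n_admissions are assumptions, not taken from the poster's data.

```sas
/* Count admissions per person instead of discarding repeat visits. */
/* Dataset and variable names are assumed; adjust them to your data. */
proc sql;
    create table admissions_per_person as
    select first_name, last_name, birth_date,
           count(*) as n_admissions   /* one row per person with a visit count */
    from table1
    group by first_name, last_name, birth_date;
quit;
```

Metrics such as 30-day readmissions would additionally need the admission and discharge date variables, which are not described in this thread.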

 

 

newgrad
Calcite | Level 5

For this task I just need a list of everyone who has had at least one hospitalization in my original discharge dataset, for linkage purposes. The full set of hospitalizations will be used for any analysis.

Reeza
Super User

If you only need the IDs, then use something like:

 

proc sql;
    /* One row per distinct combination of name, date of birth, and sex */
    create table id_list as
    select distinct first_name, last_name, birth_date, sex
    from table1;
quit;
