BookmarkSubscribeRSS Feed
Celina1
Calcite | Level 5

I am wondering what the best approach is for creating a data set that has only one row for each patient, but maintains the row for that patient with the most data. For example I started with this:

 

proc sql;
create table table_two
as select distinct  patientid condition diagnosis DateofBirth sex address city zip county from master_file
quit;
run;

 

I want one row for every patientid, but if one row is complete (or more complete) for all the variables and another is missing all the demographics for example, I would want to keep the row with the most information.

 

Any guidance on the best approach is greatly appreciated.

2 REPLIES 2
Reeza
Super User

Create a completeness measure for each row and then take the one with the highest level of completeness. 

This of course assumes you cannot 'fix' the data and fill in data that's missing from other rows to form a complete/more complete row instead.

 


@Celina1 wrote:

I am wondering what the best approach is for creating a data set that has only one row for each patient, but maintains the row for that patient with the most data. For example I started with this:

 

proc sql;
create table table_two
as select distinct  patientid condition diagnosis DateofBirth sex address city zip county from master_file
quit;
run;

 

I want one row for every patientid, but if one row is complete (or more complete) for all the variables and another is missing all the demographics for example, I would want to keep the row with the most information.

 

Any guidance on the best approach is greatly appreciated.


 

ballardw
Super User

Example data and a clear definition of "most data" or "more complete" would be extremely useful for a complete answer.

 

I can easily envision a data set with 100 variable and records with 95 or more of the variables with values but not complete because one or two specific variables are missing.

Or that have many variables with the code for "not actually recorded for some reason".

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1270 views
  • 0 likes
  • 3 in conversation