I am wondering what the best approach is for creating a data set that has only one row for each patient, but maintains the row for that patient with the most data. For example I started with this:
proc sql;
create table table_two
as select distinct patientid condition diagnosis DateofBirth sex address city zip county from master_file
quit;
run;
I want one row for every patientid, but if one row is complete (or more complete) for all the variables and another is missing all the demographics for example, I would want to keep the row with the most information.
Any guidance on the best approach is greatly appreciated.
Create a completeness measure for each row and then take the one with the highest level of completeness.
This of course assumes you cannot 'fix' the data and fill in data that's missing from other rows to form a complete/more complete row instead.
@Celina1 wrote:
I am wondering what the best approach is for creating a data set that has only one row for each patient, but maintains the row for that patient with the most data. For example I started with this:
proc sql;
create table table_two
as select distinct patientid condition diagnosis DateofBirth sex address city zip county from master_file
quit;
run;
I want one row for every patientid, but if one row is complete (or more complete) for all the variables and another is missing all the demographics for example, I would want to keep the row with the most information.
Any guidance on the best approach is greatly appreciated.
Example data and a clear definition of "most data" or "more complete" would be extremely useful for a complete answer.
I can easily envision a data set with 100 variable and records with 95 or more of the variables with values but not complete because one or two specific variables are missing.
Or that have many variables with the code for "not actually recorded for some reason".
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.