Hello!
I am working on a study that we collected 6 different admission vitals for: systolic blood pressure, diastolic blood pressure, heart rate, respiratory rate, temp, and oximetry. The data came with up to three entries per patient and I would like to keep only one row that is the most complete with all the data from that timepoint. If there are multiple rows with the same amount of values collected the first recorded is the row we would like to keep. Any ideas on how to complete only the row with that is the most complete?
Example of data
Patient ID SBP DBP HR RR TEMP OXIMETRY
1 120 19 36.6 94
1 139 103 75 20
1 122 101 98 18 36.6 94 <------ This row is the desired row to keep
2 110 19 36.6 94 <------ This row is the desired row to keep
2 109 76 20
3 102 81 18 <------ This row is the desired row to keep
3 90 82 17
Thanks for any help you can provide!
Stephanie
data have;
infile cards truncover;
input PatientID SBP DBP HR RR TEMP OXIMETRY;
cards;
1 120 . . 19 36.6 94
1 139 103 75 20 . .
1 122 101 98 18 36.6 94
2 110 . . 19 36.6 94
2 109 . 76 20 . .
3 102 81 . 18 . .
3 . 90 82 17 . .
;
data want;
do until(last.patientid);
set have;
by patientid;
k=n(of sbp--oximetry);
if first.patientid then m=k;
else m=max(m,k);
end;
do until(last.patientid);
set have;
by patientid;
if n(of sbp--oximetry)=m and not f then
do;
output;
f=1;
end;
end;
drop m k f;
run;
data have;
infile cards truncover;
input PatientID SBP DBP HR RR TEMP OXIMETRY;
cards;
1 120 . . 19 36.6 94
1 139 103 75 20 . .
1 122 101 98 18 36.6 94
2 110 . . 19 36.6 94
2 109 . 76 20 . .
3 102 81 . 18 . .
3 . 90 82 17 . .
;
data want;
do until(last.patientid);
set have;
by patientid;
k=n(of sbp--oximetry);
if first.patientid then m=k;
else m=max(m,k);
end;
do until(last.patientid);
set have;
by patientid;
if n(of sbp--oximetry)=m and not f then
do;
output;
f=1;
end;
end;
drop m k f;
run;
data have;
input Patient_ID $ SBP DBP HR RR TEMP OXIMETRY;
datalines;
1 120 . . 19 36.2 94
1 139 103 75 20 . .
1 122 101 98 18 36.6 94
2 110 . . 19 36.6 94
2 110 . . 76 20 . .
3 102 81 . 18 . .
3 . 90 82 17 . .
;
run;
data tmp;
set have;
cnt=nmiss(SBP,DBP,HR,RR,TEMP,OXIMETRY);
run;
proc sort data= tmp out=tmp2 ;
by Patient_ID cnt;
run;
data want;
set tmp2;
if first.Patient_ID;
by Patient_ID;
drop cnt;
run;
This worked with retaining the row with the most responses, but did not always keep the first entry when multiple rows contained the same amount of missing variables.
Thank you for your reply and help!
I suspect you want to use
by patient_id DESCENDING cnt;
This is a case where you want to use the N function (counts the number of non-missing numeric values among the arguments of N). Luckily your variables of interest are all numeric. Then just reread an output the observation with the first instance of maximum N function value for a given ID, which in your case benefits from the use of the SET statement with the POINT= option:
data have;
input ID SBP DBP HR RR TEMP OXIMETRY;
datalines;
1 120 . . 19 36.6 94
1 139 103 75 20 . .
1 122 101 98 18 36.6 94
2 110 . . 19 36.6 94
2 109 . 76 20 . .
3 102 81 . 18 . .
3 . 90 82 17 . .
run;
data want (drop=max_:);
set have;
by id;
retain max_n max_ptr;
if first.id then call missing(max_n,max_ptr);
if n(sbp,dbp,hr,rr,temp,oximetry)>max_n then do;
max_n=n(sbp,dbp,hr,rr,temp,oximetry);
max_ptr=_n_;
end;
if last.id;
set have point=max_ptr;
run;
This worked as well. I can't make all the replies a solution.
Thanks for all the help, very much appreciated!!
Now to open up another can of worms ...
Suppose one visit was the only one that contained a measurement for SBP. But that wasn't the visit with the most data. Would you want to keep SBP from that visit?
There are easy ways to do this ... the program could actually be shorter than the solutions you have so far.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.