BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sjarvis847
Fluorite | Level 6

Hello! 

 

I am working on a study that we collected 6 different admission vitals for: systolic blood pressure, diastolic blood pressure, heart rate, respiratory rate, temp, and oximetry. The data came with up to three entries per patient and I would like to keep only one row that is the most complete with all the data from that timepoint. If there are multiple rows with the same amount of values collected the first recorded is the row we would like to keep. Any ideas on how to complete only the row with that is the most complete?

 

Example of data 

 

Patient ID      SBP      DBP     HR     RR    TEMP    OXIMETRY

1                    120                              19        36.6      94

1                    139        103      75       20      

1                    122        101      98       18        36.6      94               <------ This row is the desired row to keep

2                    110                               19        36.6      94               <------ This row is the desired row to keep

2                    109                    76       20                     

3                    102        81                 18                                        <------ This row is the desired row to keep

3                                  90      82       17      

 

Thanks for any help you can provide! 

 

Stephanie

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

data have;
infile cards truncover;
input PatientID      SBP      DBP     HR     RR    TEMP    OXIMETRY;
cards;
1                    120           . .                  19        36.6      94
1                    139        103      75       20     . . 
1                    122        101      98       18        36.6      94              
2                    110        . .                      19        36.6      94           
2                    109         .           76       20    . .                 
3                    102        81         .        18   . .                                    
3                      .            90      82       17   . .
;
data want;
do until(last.patientid);
set have;
by patientid;
k=n(of sbp--oximetry);
if first.patientid then m=k;
else m=max(m,k);
end;
do until(last.patientid);
set have;
by patientid;
if n(of sbp--oximetry)=m and not f then 
do;
output;
f=1;
end;
end;
drop m k f;
run;

 

View solution in original post

7 REPLIES 7
novinosrin
Tourmaline | Level 20

data have;
infile cards truncover;
input PatientID      SBP      DBP     HR     RR    TEMP    OXIMETRY;
cards;
1                    120           . .                  19        36.6      94
1                    139        103      75       20     . . 
1                    122        101      98       18        36.6      94              
2                    110        . .                      19        36.6      94           
2                    109         .           76       20    . .                 
3                    102        81         .        18   . .                                    
3                      .            90      82       17   . .
;
data want;
do until(last.patientid);
set have;
by patientid;
k=n(of sbp--oximetry);
if first.patientid then m=k;
else m=max(m,k);
end;
do until(last.patientid);
set have;
by patientid;
if n(of sbp--oximetry)=m and not f then 
do;
output;
f=1;
end;
end;
drop m k f;
run;

 

r_behata
Barite | Level 11
data have;
input Patient_ID $ SBP DBP HR RR TEMP  OXIMETRY;
datalines;
1 120 . . 19 36.2 94
1 139 103 75 20 . .
1 122 101 98 18 36.6 94
2 110 . . 19 36.6 94
2 110 . . 76 20 . .
3 102 81 . 18 . .
3 . 90 82 17 . .
;
run;

data tmp;
	set have;
	cnt=nmiss(SBP,DBP,HR,RR,TEMP,OXIMETRY);
run;

proc sort data= tmp out=tmp2 ;
	by Patient_ID cnt;
run;

data want;
	set tmp2;
	if first.Patient_ID;

	by Patient_ID;

	drop cnt;
run;
sjarvis847
Fluorite | Level 6

This worked with retaining the row with the most responses, but did not always keep the first entry when multiple rows contained the same amount of missing variables.

 

Thank you for your reply and help! 

mkeintz
PROC Star

@r_behata

 

I suspect you want to use

  by patient_id DESCENDING cnt;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

This is a case where you want to use the N function (counts the number of non-missing numeric values among the arguments of N).  Luckily your variables of interest are all numeric.  Then just reread an output the observation with the first instance of maximum N function value for a given ID, which in your case benefits from the use of the SET statement with the POINT= option:

 

data have;
  input ID      SBP      DBP     HR     RR    TEMP    OXIMETRY;
datalines;
1 120        .       .    19    36.6      94
1 139      103      75    20      .        .
1 122      101      98    18    36.6      94
2 110        .       .    19    36.6      94
2 109        .      76    20      .        .
3 102       81       .    18      .        .
3   .       90      82    17     .         .
run;
data want (drop=max_:);
  set have;
  by id;
  retain max_n max_ptr;
  if first.id then call missing(max_n,max_ptr);

  if n(sbp,dbp,hr,rr,temp,oximetry)>max_n then do;
    max_n=n(sbp,dbp,hr,rr,temp,oximetry);
    max_ptr=_n_;
  end;
  if last.id;
  set have point=max_ptr;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
sjarvis847
Fluorite | Level 6

This worked as well. I can't make all the replies a solution. 

 

Thanks for all the help, very much appreciated!! 

 

Astounding
PROC Star

Now to open up another can of worms ...

 

Suppose one visit was the only one that contained a measurement for SBP.  But that wasn't the visit with the most data.  Would you want to keep SBP from that visit?

 

There are easy ways to do this ... the program could actually be shorter than the solutions you have so far.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1438 views
  • 4 likes
  • 5 in conversation