BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
abc44
Obsidian | Level 7

Hello! I need to calculate the median VF value (my outcome outcome) for each visit (date) for each Patient_ID. 

 

Here is an example of the data: 

 

Patient_ID        Date               VF

1                       08AUG18       6.0

1                       08AUG18       6.6

1                       08AUG18       4.2

1                       08AUG18       6.8

1                       22SEP18       6.9

1                       22SEP18       7.7

1                       22SEP18       6.3 

1                       22SEP18       6.8 

2                       08AUG18      7.9 

2                       08AUG18      7.3

2                       08AUG18      7.4

2                       29OCT18      7.2

2                       29OCT18      7.2

3                       ....etc

 

I only want to calculate the median when there are at least observations (ex. I do not want to calculate the median for ID #2 who only had 2 measures on 29OCT18. Therefore, I am trying to select the top 3 observations for each date and ID. Note: different ID's may have VF measures done on the same date so I cannot just select the top 3 observations for each date.

 

I know how to select the top 3 observations for a given ID, but do not how to select the top 3 observations for each date for each person. 

 

Can anyone help with this?

 

1 ACCEPTED SOLUTION

Accepted Solutions
bobpep212
Quartz | Level 8

Here's what I came up with. It sorts by VF descending in the proc sort, then a data step with a by statement. It basically establishes the "median" when the cnt = 2 and retains that until the last iteration for that ID & date and outputs it. 

 

proc sort data=have2;
by patient_id date2 descending VF;
run;
data want (drop=cnt vf);
set have2 (rename=(date2=date));
retain cnt median;
by patient_id date;
if first.date then do; 
	cnt=1;
	median=.;
				end;
else cnt=cnt+1;
if cnt=2 then median=VF;
if last.date and cnt ge 3 then output;
run;

View solution in original post

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20

So what does your desired result look like?

 

Do you still want the 29OCT18 for ID #2 to appear in your final data set even though you dont want to calculate the median based on them?

abc44
Obsidian | Level 7
I don't still need the 29OCT18 to appear. This is what I would like to get:
ID Date Median VF
1 08AUG18 6.0
1 22SEP18 6.3
2 08AUG18 7.4
3 ... etc
cau83
Pyrite | Level 9

6.0 is not the median of the top 3 observations for ID 1 on 8/8/18. the top 3 are 6, 6.6, and 6.8. Are you referring to the original sort order when you say "top 3" rather than the nominal value of VF?

 

If that's the case, then use data step + By statement:

data dat;
   set yourdataset;
   by patientid;
   if first.patientid then id_count=1;
   else id_count + 1;
 run;

You can use that to then determine if something has at least 3 observations (do a max of id_count, or a count of distinct combination of patientid and date and join back to the DAT data set), and then keep id_count=2 for each of the patient id's that appears at least 3 times.

cau83
Pyrite | Level 9

 

I would count the instances for a given patient_id and date, join to this data, and delete anything less than your threshold.

eg,

proc sql;
    create table counts as select patient_id,date,n(VF) as count
         from yourdataset
             group by patient_id,date
    ;
quit;

Join back on patient_id and date, and then eliminate the combinations for which you do not want the median.

 

Alternately, in a single step, you could use proc means to calculate both the median and count statistics and then in a subsequent step get rid of the medians you don't want.

Reeza
Super User

How are you defining top 3?

 

It seems weird to take the highest measurements when I suspect time is more important with this type of data.

 


@abc44 wrote:

Hello! I need to calculate the median VF value (my outcome outcome) for each visit (date) for each Patient_ID. 

 

Here is an example of the data: 

 

Patient_ID        Date               VF

1                       08AUG18       6.0

1                       08AUG18       6.6

1                       08AUG18       4.2

1                       08AUG18       6.8

1                       22SEP18       6.9

1                       22SEP18       7.7

1                       22SEP18       6.3 

1                       22SEP18       6.8 

2                       08AUG18      7.9 

2                       08AUG18      7.3

2                       08AUG18      7.4

2                       29OCT18      7.2

2                       29OCT18      7.2

3                       ....etc

 

I only want to calculate the median when there are at least observations (ex. I do not want to calculate the median for ID #2 who only had 2 measures on 29OCT18. Therefore, I am trying to select the top 3 observations for each date and ID. Note: different ID's may have VF measures done on the same date so I cannot just select the top 3 observations for each date.

 

I know how to select the top 3 observations for a given ID, but do not how to select the top 3 observations for each date for each person. 

 

Can anyone help with this?

 


 

abc44
Obsidian | Level 7

Sorry that I was not clear about that. I want to calculate the median of the first 3 measures on each date. 

Reeza
Super User

Add an enumeration variable then and keep only ones that are less than 3. 

 

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

 


@abc44 wrote:

Sorry that I was not clear about that. I want to calculate the median of the first 3 measures on each date. 


 

cau83
Pyrite | Level 9

You want the value of the 2nd highest then? You could use PROC RANK, or a sort + data step with BY statement.

bobpep212
Quartz | Level 8

Here's what I came up with. It sorts by VF descending in the proc sort, then a data step with a by statement. It basically establishes the "median" when the cnt = 2 and retains that until the last iteration for that ID & date and outputs it. 

 

proc sort data=have2;
by patient_id date2 descending VF;
run;
data want (drop=cnt vf);
set have2 (rename=(date2=date));
retain cnt median;
by patient_id date;
if first.date then do; 
	cnt=1;
	median=.;
				end;
else cnt=cnt+1;
if cnt=2 then median=VF;
if last.date and cnt ge 3 then output;
run;
abc44
Obsidian | Level 7
This worked perfectly! Thank you so much!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 845 views
  • 3 likes
  • 5 in conversation