Obsidian | Level 7

## how to select top 3 observations based on 2 conditions

Hello! I need to calculate the median VF value (my outcome outcome) for each visit (date) for each Patient_ID.

Here is an example of the data:

Patient_ID        Date               VF

1                       08AUG18       6.0

1                       08AUG18       6.6

1                       08AUG18       4.2

1                       08AUG18       6.8

1                       22SEP18       6.9

1                       22SEP18       7.7

1                       22SEP18       6.3

1                       22SEP18       6.8

2                       08AUG18      7.9

2                       08AUG18      7.3

2                       08AUG18      7.4

2                       29OCT18      7.2

2                       29OCT18      7.2

3                       ....etc

I only want to calculate the median when there are at least observations (ex. I do not want to calculate the median for ID #2 who only had 2 measures on 29OCT18. Therefore, I am trying to select the top 3 observations for each date and ID. Note: different ID's may have VF measures done on the same date so I cannot just select the top 3 observations for each date.

I know how to select the top 3 observations for a given ID, but do not how to select the top 3 observations for each date for each person.

Can anyone help with this?

1 ACCEPTED SOLUTION

Accepted Solutions
Quartz | Level 8

## Re: how to select top 3 observations based on 2 conditions

Here's what I came up with. It sorts by VF descending in the proc sort, then a data step with a by statement. It basically establishes the "median" when the cnt = 2 and retains that until the last iteration for that ID & date and outputs it.

proc sort data=have2;
by patient_id date2 descending VF;
run;
data want (drop=cnt vf);
set have2 (rename=(date2=date));
retain cnt median;
by patient_id date;
if first.date then do;
cnt=1;
median=.;
end;
else cnt=cnt+1;
if cnt=2 then median=VF;
if last.date and cnt ge 3 then output;
run;
10 REPLIES 10
Tourmaline | Level 20

## Re: how to select top 3 observations based on 2 conditions

So what does your desired result look like?

Do you still want the 29OCT18 for ID #2 to appear in your final data set even though you dont want to calculate the median based on them?

Obsidian | Level 7

## Re: how to select top 3 observations based on 2 conditions

I don't still need the 29OCT18 to appear. This is what I would like to get:
ID Date Median VF
1 08AUG18 6.0
1 22SEP18 6.3
2 08AUG18 7.4
3 ... etc
Pyrite | Level 9

## Re: how to select top 3 observations based on 2 conditions

6.0 is not the median of the top 3 observations for ID 1 on 8/8/18. the top 3 are 6, 6.6, and 6.8. Are you referring to the original sort order when you say "top 3" rather than the nominal value of VF?

If that's the case, then use data step + By statement:

data dat;
set yourdataset;
by patientid;
if first.patientid then id_count=1;
else id_count + 1;
run;

You can use that to then determine if something has at least 3 observations (do a max of id_count, or a count of distinct combination of patientid and date and join back to the DAT data set), and then keep id_count=2 for each of the patient id's that appears at least 3 times.

Pyrite | Level 9

## Re: how to select top 3 observations based on 2 conditions

I would count the instances for a given patient_id and date, join to this data, and delete anything less than your threshold.

eg,

proc sql;
create table counts as select patient_id,date,n(VF) as count
from yourdataset
group by patient_id,date
;
quit;

Join back on patient_id and date, and then eliminate the combinations for which you do not want the median.

Alternately, in a single step, you could use proc means to calculate both the median and count statistics and then in a subsequent step get rid of the medians you don't want.

Super User

## Re: how to select top 3 observations based on 2 conditions

How are you defining top 3?

It seems weird to take the highest measurements when I suspect time is more important with this type of data.

@abc44 wrote:

Hello! I need to calculate the median VF value (my outcome outcome) for each visit (date) for each Patient_ID.

Here is an example of the data:

Patient_ID        Date               VF

1                       08AUG18       6.0

1                       08AUG18       6.6

1                       08AUG18       4.2

1                       08AUG18       6.8

1                       22SEP18       6.9

1                       22SEP18       7.7

1                       22SEP18       6.3

1                       22SEP18       6.8

2                       08AUG18      7.9

2                       08AUG18      7.3

2                       08AUG18      7.4

2                       29OCT18      7.2

2                       29OCT18      7.2

3                       ....etc

I only want to calculate the median when there are at least observations (ex. I do not want to calculate the median for ID #2 who only had 2 measures on 29OCT18. Therefore, I am trying to select the top 3 observations for each date and ID. Note: different ID's may have VF measures done on the same date so I cannot just select the top 3 observations for each date.

I know how to select the top 3 observations for a given ID, but do not how to select the top 3 observations for each date for each person.

Can anyone help with this?

Obsidian | Level 7

## Re: how to select top 3 observations based on 2 conditions

Sorry that I was not clear about that. I want to calculate the median of the first 3 measures on each date.

Super User

## Re: how to select top 3 observations based on 2 conditions

Add an enumeration variable then and keep only ones that are less than 3.

https://stats.idre.ucla.edu/sas/faq/how-can-i-create-an-enumeration-variable-by-groups/

@abc44 wrote:

Sorry that I was not clear about that. I want to calculate the median of the first 3 measures on each date.

Pyrite | Level 9

## Re: how to select top 3 observations based on 2 conditions

You want the value of the 2nd highest then? You could use PROC RANK, or a sort + data step with BY statement.

Quartz | Level 8

## Re: how to select top 3 observations based on 2 conditions

Here's what I came up with. It sorts by VF descending in the proc sort, then a data step with a by statement. It basically establishes the "median" when the cnt = 2 and retains that until the last iteration for that ID & date and outputs it.

proc sort data=have2;
by patient_id date2 descending VF;
run;
data want (drop=cnt vf);
set have2 (rename=(date2=date));
retain cnt median;
by patient_id date;
if first.date then do;
cnt=1;
median=.;
end;
else cnt=cnt+1;
if cnt=2 then median=VF;
if last.date and cnt ge 3 then output;
run;
Obsidian | Level 7

## Re: how to select top 3 observations based on 2 conditions

This worked perfectly! Thank you so much!
Discussion stats
• 10 replies
• 915 views
• 3 likes
• 5 in conversation