## Modifying observation based on ID and test date

Solved
Occasional Contributor
Posts: 15

# Modifying observation based on ID and test date

Hello! I am working on a dataset that contains patient ID numbers, biological sex (M, F, or U for unidentified), and a test date. I am looking to change all gender entries to the one on the most recent test date that corresponds with the patient ID. Some patients have been previously entered as U but have more recent entries that are M or F. However, in some cases there are both M and F entries associated with the ID but I want to assume the most recent test date is the correct gender. How can I code this?

I would like to have the most possible Sex entries as M and F, with the most recent test date indicating the assumed truth. Some will have the most recent test date as U, therefore I would like to use the second most recent in those cases.

An example:

Input ID TESTDATE SEX;
datalines;
1 2010/01/01 M

1 2012/02/07 F

2 2009/05/07 F

2 2012/06/09 U

2 2007/12/12 F

3 2008/01/11 U

3 2010/08/09 M

4 2009/09/09 M

4 2012/09/03 M

;

Thank you!

Accepted Solutions
Solution
‎05-23-2017 05:42 PM
Super User
Posts: 6,634

## Re: Modifying observation based on ID and test date

This approach assumes that your data set is already in sorted order by ID and TESTDATE:

data want;

do until (last.id);

set have;

by id;

if sex in ('M', 'F') then most_recent = sex;

end;

do until (last.id);

set have;

by id;

if most_recent in ('M', 'F') then sex = most_recent;

output;

end;

drop most_recent;

run;

It might be overkill, but the program is careful to leave SEX unchanged when an ID has no values that are either 'M' or 'F'.

All Replies
Solution
‎05-23-2017 05:42 PM
Super User
Posts: 6,634

## Re: Modifying observation based on ID and test date

This approach assumes that your data set is already in sorted order by ID and TESTDATE:

data want;

do until (last.id);

set have;

by id;

if sex in ('M', 'F') then most_recent = sex;

end;

do until (last.id);

set have;

by id;

if most_recent in ('M', 'F') then sex = most_recent;

output;

end;

drop most_recent;

run;

It might be overkill, but the program is careful to leave SEX unchanged when an ID has no values that are either 'M' or 'F'.

Occasional Contributor
Posts: 15