Choose last non-missing observation for multiple variables

Accepted Solution Solved
Reply
Regular Contributor
Posts: 150
Accepted Solution

Choose last non-missing observation for multiple variables

I have a dataset where participants have up to 4 visits. I need to select their most recent visit for multiple variables that does not include missing values. I know I can use last.studyid to select their most recent observation but is there a way for multiple variables to select their most recent observation for each variable that does not include missing values? A short example of what the data might look like it below. Bascially for var1 I would want the third visit value for var1 and the fourth visit value for var2 for study ID 1, for study id 2 I would want the third visit value for both, and for study ID 3 I would want the fourth visit value for var1 and the third visit value for var2. Any help would be great appreciated.

ID var1 var2  visit

1   1     0     1

1   1     0     2

1   1     0     3

1   .      0    4

2   .      0    1

2  1      0    2

2  1      1    3

3  0      1    1

3  0      1    2

3  0      1    3

3  1      .    4


Accepted Solutions
Solution
‎06-18-2015 11:57 AM
Valued Guide
Posts: 858

Re: Choose last non-missing observation for multiple variables

This will get the result you want but you only see the last visit, not where the info comes from.  Let me know if this helps:

data have;

input ID$ var1 var2  visit;

cards;

1  1     0    1

1  1     0    2

1  1     0    3

1  .     0    4

2  .     0    1

2  1     0    2

2  1     1    3

3  0     1    1

3  0     1    2

3  0     1    3

3  1     .    4

;run;

data want;

set have;

by id visit;

retain _newvar1 _newvar2;

if first.id then do;

_newvar1 = var1;

_newvar2 = var2;

end;

if missing(var1) then var1 = _newvar1;

if missing(var2) then var2 = _newvar2;

drop _:;

if last.id then output;

run;

View solution in original post


All Replies
Solution
‎06-18-2015 11:57 AM
Valued Guide
Posts: 858

Re: Choose last non-missing observation for multiple variables

This will get the result you want but you only see the last visit, not where the info comes from.  Let me know if this helps:

data have;

input ID$ var1 var2  visit;

cards;

1  1     0    1

1  1     0    2

1  1     0    3

1  .     0    4

2  .     0    1

2  1     0    2

2  1     1    3

3  0     1    1

3  0     1    2

3  0     1    3

3  1     .    4

;run;

data want;

set have;

by id visit;

retain _newvar1 _newvar2;

if first.id then do;

_newvar1 = var1;

_newvar2 = var2;

end;

if missing(var1) then var1 = _newvar1;

if missing(var2) then var2 = _newvar2;

drop _:;

if last.id then output;

run;

Super User
Posts: 5,372

Re: Choose last non-missing observation for multiple variables

A simple DATA step tool can make this easy:

data want;

   update have (obs=0) have;

   by id;

   drop visit;

run;

You don't need to know the variable names other than ID and VISIT.

Regular Contributor
Posts: 150

Re: Choose last non-missing observation for multiple variables

Thank you both for your replies. Astounding-Can you give me a little more information as to what the obs=0 does in this case? From reading the SAS documentation it tells SAS when to stop processing. So in this case, if a missing value is set to obs=0 then will it choose the previous value? Also, why would I drop visit?

Mark-I am going to try your suggestion and see if that works as well. In your case it appears as though I will fill the missing with the first observations value, what if that is missing as well?  Thanks!

Super User
Posts: 5,372

Re: Choose last non-missing observation for multiple variables

VISIT does not have to be dropped ... but there will only be one observation per ID in the output.  So VISIT will represent the last VISIT value for that ID (whether or not any other variables were actually non-missing on that visit).  If that would be helpful, keep it.

UPDATE requires two data sets ... a master (with no more than one observation per ID) and a set of transactions (with potentially many observations per ID).  So the first mention of HAVE with OBS=0 is just to satisfy the requirements of the UPDATE statement.  It doesn't bring in any variable values, but it does let UPDATE process the second mention of HAVE.  UPDATE applies all of the transactions, ignores missing values, and automatically outputs a single observation per ID.  In this case, that's just what the doctor ordered.

Valued Guide
Posts: 858

Re: Choose last non-missing observation for multiple variables

Comment out the drop _:; line and you'll see what you are getting.   You aren't getting the first value but the last value that is not missing.

Regular Contributor
Posts: 150

Re: Choose last non-missing observation for multiple variables

Thank you both for your help. They both seem to have worked and give me the same results.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 997 views
  • 3 likes
  • 3 in conversation