I have a dataset where participants have up to 4 visits. I need to select their most recent visit for multiple variables that does not include missing values. I know I can use last.studyid to select their most recent observation but is there a way for multiple variables to select their most recent observation for each variable that does not include missing values? A short example of what the data might look like it below. Bascially for var1 I would want the third visit value for var1 and the fourth visit value for var2 for study ID 1, for study id 2 I would want the third visit value for both, and for study ID 3 I would want the fourth visit value for var1 and the third visit value for var2. Any help would be great appreciated.
ID var1 var2 visit
1 1 0 1
1 1 0 2
1 1 0 3
1 . 0 4
2 . 0 1
2 1 0 2
2 1 1 3
3 0 1 1
3 0 1 2
3 0 1 3
3 1 . 4
This will get the result you want but you only see the last visit, not where the info comes from. Let me know if this helps:
data have;
input ID$ var1 var2 visit;
cards;
1 1 0 1
1 1 0 2
1 1 0 3
1 . 0 4
2 . 0 1
2 1 0 2
2 1 1 3
3 0 1 1
3 0 1 2
3 0 1 3
3 1 . 4
;run;
data want;
set have;
by id visit;
retain _newvar1 _newvar2;
if first.id then do;
_newvar1 = var1;
_newvar2 = var2;
end;
if missing(var1) then var1 = _newvar1;
if missing(var2) then var2 = _newvar2;
drop _:;
if last.id then output;
run;
This will get the result you want but you only see the last visit, not where the info comes from. Let me know if this helps:
data have;
input ID$ var1 var2 visit;
cards;
1 1 0 1
1 1 0 2
1 1 0 3
1 . 0 4
2 . 0 1
2 1 0 2
2 1 1 3
3 0 1 1
3 0 1 2
3 0 1 3
3 1 . 4
;run;
data want;
set have;
by id visit;
retain _newvar1 _newvar2;
if first.id then do;
_newvar1 = var1;
_newvar2 = var2;
end;
if missing(var1) then var1 = _newvar1;
if missing(var2) then var2 = _newvar2;
drop _:;
if last.id then output;
run;
A simple DATA step tool can make this easy:
data want;
update have (obs=0) have;
by id;
drop visit;
run;
You don't need to know the variable names other than ID and VISIT.
Thank you both for your replies. Astounding-Can you give me a little more information as to what the obs=0 does in this case? From reading the SAS documentation it tells SAS when to stop processing. So in this case, if a missing value is set to obs=0 then will it choose the previous value? Also, why would I drop visit?
Mark-I am going to try your suggestion and see if that works as well. In your case it appears as though I will fill the missing with the first observations value, what if that is missing as well? Thanks!
VISIT does not have to be dropped ... but there will only be one observation per ID in the output. So VISIT will represent the last VISIT value for that ID (whether or not any other variables were actually non-missing on that visit). If that would be helpful, keep it.
UPDATE requires two data sets ... a master (with no more than one observation per ID) and a set of transactions (with potentially many observations per ID). So the first mention of HAVE with OBS=0 is just to satisfy the requirements of the UPDATE statement. It doesn't bring in any variable values, but it does let UPDATE process the second mention of HAVE. UPDATE applies all of the transactions, ignores missing values, and automatically outputs a single observation per ID. In this case, that's just what the doctor ordered.
Comment out the drop _:; line and you'll see what you are getting. You aren't getting the first value but the last value that is not missing.
Thank you both for your help. They both seem to have worked and give me the same results.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.