- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So I am writing a code and I seem to be stuck. I have been searching through google and haven't been able to find an answer. So I am hoping someone here can advise me.
I am trying to pick the last non-missing value.
This is my data
data has;
input ID RX Time score;
datalines;
1 1 6 10
1 1 12 50
1 1 15 60
2 1 6 10
2 1 12 50
2 1 15 .
;
I want to subset the data to pick only last time point that does not have a missing score. So for ID 1, I want to pick the record at time 15 since it is the last non-missing score. and for ID 2 i want to pick the record with time 12 since the score is missing at time 15.
The code I have come up with so far can only pick the last ID if it is not missing:
PROC SORT DATA=has;
BY ID Time ;
RUN;
DATA want;
SET has;
BY ID Time ;
if last.id and score NE . then select=1;
RUN;
What else do I have to do to get my program to work?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
try sorting the data by descending time so the non missing one is at the top:
proc sort data=has out=want;
by id descending time;
where score ne .;
run;
data want;
set want;
by id;
if first.id then output;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
try sorting the data by descending time so the non missing one is at the top:
proc sort data=has out=want;
by id descending time;
where score ne .;
run;
data want;
set want;
by id;
if first.id then output;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You might try this:
proc sort data=has;
by id descending time;
run;
data want;
set has;
by id;
retain select;
if first.id then select = 1;
if select = 1 and score ne .
then do;
select = 0;
output;
end;
run;
RW9 beat me with a more elegant solution!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If your data are already sorted than you could just use:
data want;
set has (where=(not missing(score)));
by id;
if last.id;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Using WHERE instead of IF will make the LAST.ID variable work properly.
data want; set have; by id; where score is not null ; if last.id; run;
What do you want to do if all SCORE values for a particular ID are missing? If you want them eliminated then that will do it.
If you want to keep them then you might need to work a little harder.
data want ;
do _n_=1 by 1 until(last.id); set have; by id; if not missing(score) then keep=_n_ ; end;
keep = min(keep,_n_);
do _n_=1 by 1 until(last.id); set have; by id; if _n_=keep then output; end;
run;