Select most recent obs without missing data

Frequent Contributor
Posts: 118

This seems like an easy one, but I am stumped. I have long data (multiple observations for each of multiple individuals). I also have a count variable that orders each person's observations by when they were collected. Lastly, I have three variables (X1, X2, X3) with a small amount of missingness.

I want to keep a single observation for each individual: the one with the largest count value that has no missing values in any of the three variables.


ID count X1 X2 X3
1  1     98 89 00
1  2     9  9  8
1  3     0  .  .
2  1     .  9  8
2  2     1  4  5
3  1     4  5  44
3  2     4  5  2
3  3     22 3  5
3  4     .  6  2 34
4  1     .  6  4

 

Desired output

ID count X1 X2 X3
1  2     9  9  8
2  2     1  4  5
3  3     22 3  5

Let me know if you need any more information. I tried PROC SORT with NODUPKEY without success, and I am not versed enough in PROC SQL to tackle it there.

-Thanks!


Accepted Solutions
Solution
2 weeks ago
Super User
Posts: 6,018

Re: Select most recent obs without missing data

Assuming your data set is in order by ID COUNT:

 

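data want;
   set have;
   /* keep only rows where all three variables are non-missing */
   where n(x1, x2, x3) = 3;
   by id;
   /* of the remaining rows, keep the last one (largest count) per ID */
   if last.id;
run;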
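
The choice of WHERE rather than a subsetting IF is what makes the step above work: WHERE filters rows as they are read, before the BY-group flags are set, so LAST.ID marks the last complete row for each ID. The sketch below contrasts the two; it assumes the data set `have` from this thread, and the output names `want_where` and `want_if` are made up for illustration.

```sas
/* Sort first so the BY statement is valid (the answer assumes this order). */
proc sort data=have;
   by id count;
run;

/* WHERE is applied as rows are read, so LAST.ID refers to the last
   complete row within each ID. */
data want_where;
   set have;
   where n(x1, x2, x3) = 3;
   by id;
   if last.id;
run;

/* A subsetting IF runs after LAST.ID has been set on the unfiltered rows,
   so any ID whose final row has a missing value is dropped entirely. */
data want_if;
   set have;
   by id;
   if n(x1, x2, x3) = 3 and last.id;
run;
```

With the sample data in this thread, `want_where` should hold the three desired rows, while `want_if` should keep only ID 2, because IDs 1 and 3 both end on a row with a missing value.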



All Replies
Super User
Posts: 21,596

Re: Select most recent obs without missing data

I can't follow the data at all.

Are those desired results supposed to align with the first set of data you've shown?

If your variables are all numeric, you can use NMISS to count the missing values and decide whether you want to keep the row.

 

if nmiss(of x1-x3) > 0 then delete;
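
For reference, three related functions come up in this thread: NMISS counts missing numeric arguments, N counts non-missing numeric arguments, and CMISS counts missing arguments of either numeric or character type. A minimal sketch that computes all three side by side, assuming the data set `have` from the question; the output data set `check` and the new variable names are hypothetical.

```sas
data check;
   set have;
   n_missing = nmiss(of x1-x3);   /* missing values among numeric X1-X3 */
   n_present = n(of x1-x3);       /* non-missing values among numeric X1-X3 */
   c_missing = cmiss(of x1-x3);   /* missing values, numeric or character */
   complete  = (n_missing = 0);   /* 1 when the row has no missing values */
run;
```

For three numeric variables, `n_missing = 0`, `n_present = 3`, and `c_missing = 0` all describe the same complete row, so any of them can serve as the filter.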



Frequent Contributor
Posts: 118

Re: Select most recent obs without missing data

I accidentally inserted my final text right into the middle of the post, messing up the toy example. Please refer back to the corrected version.

Sorry, I was fixing it when you replied.

 

Super User
Posts: 12,148

Re: Select most recent obs without missing data

If I understand correctly, here is one way:


data have;
input ID count X1 X2 X3;
datalines;
1 1 98 89 00
1 2 9 9 8
1 3 0 . .
2 1 . 9 8
2 2 1 4 5
3 1 4 5 44
3 2 4 5 2
3 3 22 3 5
3 4 . 6 2 34
4 1 . 6 4
;
run;

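/* put the most recent observation first within each ID */
proc sort data=have;
   by id descending count;
run;

data want;
   set have;
   retain flag;
   by id;
   /* reset the flag at the start of each ID */
   if first.id then flag=0;
   /* output only the first complete row found for this ID */
   if cmiss(x1,x2,x3) = 0 and flag=0 then do;
      flag=1;
      output;
   end;
   drop flag;
run;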

Please note the use of a data step to supply the example data. If you provide data this way, it is much easier to test code, and it matters even more when the types of your variables are in question.

Also, post code in a code box, opened with the forum's {I} menu icon, to preserve the text layout; the message window often reformats pasted text.

PROC Star
Posts: 835

Re: Select most recent obs without missing data

data have;
input ID count X1 X2 X3;
datalines;
1 1 98 89 00
1 2 9 9 8
1 3 0 . .
2 1 . 9 8
2 2 1 4 5
3 1 4 5 44
3 2 4 5 2
3 3 22 3 5
3 4 . 6 2 34
4 1 . 6 4
;
run;

proc sql;
create table want as
select *
from have
where n(x1, x2, x3) = 3
group by id
having count=max(count);
quit;