Selecting first non-missing value for each participant ID

Reply
Occasional Contributor
Posts: 5

Selecting first non-missing value for each participant ID

I have a data set containing many participants. Each participant has multiple events. For each participant and each event, I want a list of the variable value. This might be best described by demonstration:

For example,

PID     EVENT     VAR

1392     1               .

1392     1               13  

1392     2               12    

1392     2               12  

2456     1               3   

2456     1               . 

2456     2              .

2456     2               .

I want to end up with a list like this:

PID     EVENT     VAR

1392     1               13  

1392     2               12    

2456     1               3   

2456     2              .

Is there a particular method for doing this? Participants have more than one EVENT. And within each EVENT, VAR is the same (but some observations are .) In some cases, VAR is . for the entire event. In other cases, VAR is . for only a few cases within an EVENT.

Super User
Posts: 17,743

Re: Selecting first non-missing value for each participant ID

Assuming you can't have records with all missing, then remove missing at the start and then keep only the first occurrence.

proc sort data=have;

by pid event;

run;

data want;

set have;

where var ne .;

by pid event;

if first.event;

run;

Trusted Advisor
Posts: 1,204

Re: Selecting first non-missing value for each participant ID

data want;
set have;
if event=1 and var=. then delete;
run;

proc sql;
select distinct pid,event,var
from want;
quit;

New Contributor
Posts: 3

Re: Selecting first non-missing value for each participant ID

Let's call your first data example ds1 to get ds2 (second data example) try the following:

data ds2 (drop = issetflg);

     retain issetflg;

     set ds1;

     by pid event;

     if first.event then issetflg = 'N';

     if var ne . then do;

          issetflg = 'Y';

          output;

     end;

     if last.event then

          if issetflg = 'N' then

               output;

run;

Frequent Contributor
Posts: 101

Re: Selecting first non-missing value for each participant ID

Assuming the data is already sorted by pid and event, try this:

data want;

set have;

by pid event;

retain found;

if first.event then found = 0;

if found = 0 and ( not missing( var ) or last.event ) then do;

     output;

     found = 1;

end;

run;

Super Contributor
Posts: 275

Re: Selecting first non-missing value for each participant ID


proc sql;
select distinct pid,event,var from have group by pid,event having var=max(var);
quit;

Frequent Contributor
Posts: 101

Re: Selecting first non-missing value for each participant ID

Here's another approach:

data want;

set have (rename=(var=varx));

by pid event;

retain var;

if first.event then var = .;

var = coalesce( var, varx );

if last.event then output;

drop varx;

run;

Ask a Question
Discussion stats
  • 6 replies
  • 3823 views
  • 0 likes
  • 6 in conversation