BookmarkSubscribeRSS Feed
zaldarsa
Obsidian | Level 7

Hello,

I am working with a long form dataset, meaning that there are multiple observations per participant (because of multiple visits). I was trying to think of code that will allow me to include the first non missing value for each variable. For example, I want to get the first non missing value for for marital satisfaction and number of children seen per month. If the data was missing for the first visit, I would want the data from the second visit, if not then third visit and so on. Any suggestions for the most efficient way to go about doing this?  Thank you in advance for all your help!

5 REPLIES 5
Reeza
Super User
Can you provide some example data, fake data is fine, then we can assist you further.
Retain is one method, update may be another, but I suspect there are some variables that may need other logic, such as age which may need a first or last?
zaldarsa
Obsidian | Level 7

Sure. For example, say this was my dataset:

 

Study ID        Visit #            Marriage Satisfaction         Age                     Have Close Friends?

104                  1                        Missing                        8                                   Missing

104                  2                          Yes                             9                                    Yes

104                  3                           No                             Missing                          No

105                  1                           Missing                     Missing                          Missing

105                  2                           Missing                      Missing                          Missing

105                  3                          Yes                              10                                    Yes

106                  1                          Missing                        Missing                        Missing

106                  2                           Yes                             9                                      No

106                  3                           No                              10                                  Yes

 

 

So I would want to create a code the would give me the first non-missing value of these variables for each ID. For example, for ID 104 that would be visit 2 for marriage, visit 1 for age, visit 2 for having close friends.

 

s_lassen
Meteorite | Level 14

You will get better and quicker answers if you show your sample data as a data step, e.g. like this:

options missing=' ';                              
data have;                                        
  infile cards dsd delimiter=',' missover;        
  length                                          
    Study_ID 8                                    
    Visit_no 8                                    
    Marriage_satisfaction $3                      
    Age 8                                         
    Close_friends $3                              
    ;                                             
  input Study_ID--Close_friends;                  
cards;                                            
104,1,,8,,                                        
104,2,Yes,9,Yes,                                  
104,3,No,,No,                                     
105,1,,,,                                         
105,2,,,,                                         
105,3,Yes,10,Yes,                                 
106,1,,,,                                         
106,2,Yes,9,No,                                   
106,3,No,10,Yes,                                  
;run;

The problem you have can be solved with an update in place solution. But first you may want to copy your input dataset to the output dataset:

data want;   
  set have;  
run;

Then you can update the WANT dataset like this:

data want;                                            
  length                                              
    _Marriage_satisfaction $3                         
    _Age 8                                            
    _Close_friends $3                                 
    ;                                                 
  do _N_=nobs to 1 by -1;                             
    modify want point=_N_ nobs=nobs;                  
    first_ID=Study_ID ne lag(Study_ID);               
    if not first_ID then do;                          
      if missing(Marriage_satisfaction) then          
        Marriage_satisfaction=_Marriage_satisfaction; 
      if missing(age) then                            
        age=_age;                                     
      if missing(Close_friends) then                  
        Close_friends=_Close_friends;                 
      replace;                                        
      end;                                            
    _Marriage_satisfaction=Marriage_satisfaction;     
    _age=age;                                         
    _Close_friends=Close_friends;                     
    end;                                              
  stop;                                               
run;

So the basic idea is to run through the dataset backwards, and store the values to be updated in temporary variables (after getting the previous value if missing).

andreas_lds
Jade | Level 19

Thanks at @s_lassen for providing data in usable form 😉

 

retain + coalesce(c) could be used:

data want;
   set have;
   by Study_ID;

   length 
      _Marriage_satisfaction $ 3
      _Age 8
      _Close_friends $ 3
   ;

   retain _:;

   if first.Study_ID then do;
      call missing(of _:);
   end;

   _Marriage_satisfaction = coalescec(_Marriage_satisfaction, Marriage_satisfaction);
   _Age = coalesce(_Age, Age);
   _Close_friends = coalescec(_Close_friends, Close_friends);

   if last.Study_ID then do;
      Marriage_satisfaction = _Marriage_satisfaction;
      Age = _Age;
      Close_friends = _Close_friends;

      output;
   end;

   drop _:;
run;
Ksharp
Super User
data have;
input StudyID     Visit   Marriage $ Age  ;
cards;
104                  1                        .                        8                                   .
104                  2                          Yes                             9                                    Yes
104                  3                           No                             .                          No
105                  1                           .                     .                          .
105                  2                           .                      .                          .
105                  3                          Yes                              10                                    Yes
106                  1                          .                        .                        .
106                  2                           Yes                             9                                      No
106                  3                           No                              10  
;

data first_Marriage;
 set have(where=(Marriage is not missing));
 by StudyID;
if first.StudyID;
keep StudyID Visit Marriage;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 593 views
  • 3 likes
  • 5 in conversation