BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LoPez_Diaz
Obsidian | Level 7

Hello , 

I have a huge dataset , with over 20k rows , and 12 columns 

I have some missing values for a column T_X

My dataset has the following format : 

 

DATE_S   | DATE_E  |  DATE_R   | MA  |       T_X         

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |          .  

01/02/19  | 01/08/19  | 01/02/34     |  2    |          .

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |          .

01/02/19  | 01/08/19  | 01/02/34     |  2    |          .

01/02/19  | 01/06/19  | 01/07/22     |  7    |     M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |     M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |          .

01/02/19  | 01/06/19  | 01/07/22     |  7    |     M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |          .

 

So what I want to do , is to replace the missing values in column T_X with the value from the same column having the same row values as other rows that have a value in the column T_X;

 

The data I want to get should look like this : 

 

DATE_S   | DATE_E  |  DATE_R   | MA  |       T_X         

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

 

Thank you in advance for your responses.

 

1 ACCEPTED SOLUTION

Accepted Solutions
hashman
Ammonite | Level 13

@LoPez_Diaz:

Assuming that T_X is always not missing in the first record of each DATE_: group, it doesn't get simpler than this: 

data have ;                                              
  infile cards dlm = "|" ;                               
  informat date_s date_e date_r mmddyy08. ma 8. t_x $8. ;
  format   date_s date_e date_r yymmdd10. ;              
  input (_all_) (:) ;                                    
  cards ;                                                
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
;                                                        
run ;                                                    
                                                         
data want (drop = _:) ;                                  
  set have ;                                             
  retain _t ;                                            
  if cmiss (t_x) then t_x = _t ;                         
  else _t = t_x ;                                        
run ;                                                    

However, if you may have a missing T_X anywhere in the DATE_: group and want to fill the whole group with whatever non-missing value is present in the group (if it's present):

data want ;                                                                   
  merge have (drop=t_x in=h1) have (keep=date_: t_x where=(t_x is not null)) ;
  by date_s descending date_e descending date_r ;                             
  if h1 ;                                                                     
run ;                                                                         

OTOH, if your input data set is not sorted or grouped by DATE_: and/or you don't want to rely on any kind of DATE_: order, the hash object comes in handy:

data want ;                                               
  if _n_ = 1 then do ;                                    
    dcl hash h (dataset:"have(where=(t_x is not null))") ;
    h.definekey ("date_s", "date_e", "date_r") ;          
    h.definedata ("t_x") ;                                
    h.definedone () ;                                     
  end ;                                                   
  set have ;                                              
  h.find() ;                                              
run ;                                                     

Kind regards

Paul D.

View solution in original post

3 REPLIES 3
Jagadishkatam
Amethyst | Level 16

Please try the below untested code

 

 

proc sort data=have;
by date_s date_e date_r ma descending t_x;
run;

data want;
set have(rename=(t_x=_t_x_));
by date_s date_e date_r ma descending t_x;
retain t_x;
if first.ma then t_x='';
if _t_x_ ne '' then t_x=_t_x_;
run;

Thanks,
Jag
hashman
Ammonite | Level 13

@LoPez_Diaz:

Assuming that T_X is always not missing in the first record of each DATE_: group, it doesn't get simpler than this: 

data have ;                                              
  infile cards dlm = "|" ;                               
  informat date_s date_e date_r mmddyy08. ma 8. t_x $8. ;
  format   date_s date_e date_r yymmdd10. ;              
  input (_all_) (:) ;                                    
  cards ;                                                
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
;                                                        
run ;                                                    
                                                         
data want (drop = _:) ;                                  
  set have ;                                             
  retain _t ;                                            
  if cmiss (t_x) then t_x = _t ;                         
  else _t = t_x ;                                        
run ;                                                    

However, if you may have a missing T_X anywhere in the DATE_: group and want to fill the whole group with whatever non-missing value is present in the group (if it's present):

data want ;                                                                   
  merge have (drop=t_x in=h1) have (keep=date_: t_x where=(t_x is not null)) ;
  by date_s descending date_e descending date_r ;                             
  if h1 ;                                                                     
run ;                                                                         

OTOH, if your input data set is not sorted or grouped by DATE_: and/or you don't want to rely on any kind of DATE_: order, the hash object comes in handy:

data want ;                                               
  if _n_ = 1 then do ;                                    
    dcl hash h (dataset:"have(where=(t_x is not null))") ;
    h.definekey ("date_s", "date_e", "date_r") ;          
    h.definedata ("t_x") ;                                
    h.definedone () ;                                     
  end ;                                                   
  set have ;                                              
  h.find() ;                                              
run ;                                                     

Kind regards

Paul D.

LoPez_Diaz
Obsidian | Level 7

Hello Sir, 

Thank you very much for your detailled response , I really appreciate your help .
Thank you !

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 2448 views
  • 2 likes
  • 3 in conversation