BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LoPez_Diaz
Obsidian | Level 7

Hello , 

I have a huge dataset , with over 20k rows , and 12 columns 

I have some missing values for a column T_X

My dataset has the following format : 

 

DATE_S   | DATE_E  |  DATE_R   | MA  |       T_X         

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |          .  

01/02/19  | 01/08/19  | 01/02/34     |  2    |          .

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |          .

01/02/19  | 01/08/19  | 01/02/34     |  2    |          .

01/02/19  | 01/06/19  | 01/07/22     |  7    |     M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |     M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |          .

01/02/19  | 01/06/19  | 01/07/22     |  7    |     M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |          .

 

So what I want to do , is to replace the missing values in column T_X with the value from the same column having the same row values as other rows that have a value in the column T_X;

 

The data I want to get should look like this : 

 

DATE_S   | DATE_E  |  DATE_R   | MA  |       T_X         

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

 

Thank you in advance for your responses.

 

1 ACCEPTED SOLUTION

Accepted Solutions
hashman
Ammonite | Level 13

@LoPez_Diaz:

Assuming that T_X is always not missing in the first record of each DATE_: group, it doesn't get simpler than this: 

data have ;                                              
  infile cards dlm = "|" ;                               
  informat date_s date_e date_r mmddyy08. ma 8. t_x $8. ;
  format   date_s date_e date_r yymmdd10. ;              
  input (_all_) (:) ;                                    
  cards ;                                                
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
;                                                        
run ;                                                    
                                                         
data want (drop = _:) ;                                  
  set have ;                                             
  retain _t ;                                            
  if cmiss (t_x) then t_x = _t ;                         
  else _t = t_x ;                                        
run ;                                                    

However, if you may have a missing T_X anywhere in the DATE_: group and want to fill the whole group with whatever non-missing value is present in the group (if it's present):

data want ;                                                                   
  merge have (drop=t_x in=h1) have (keep=date_: t_x where=(t_x is not null)) ;
  by date_s descending date_e descending date_r ;                             
  if h1 ;                                                                     
run ;                                                                         

OTOH, if your input data set is not sorted or grouped by DATE_: and/or you don't want to rely on any kind of DATE_: order, the hash object comes in handy:

data want ;                                               
  if _n_ = 1 then do ;                                    
    dcl hash h (dataset:"have(where=(t_x is not null))") ;
    h.definekey ("date_s", "date_e", "date_r") ;          
    h.definedata ("t_x") ;                                
    h.definedone () ;                                     
  end ;                                                   
  set have ;                                              
  h.find() ;                                              
run ;                                                     

Kind regards

Paul D.

View solution in original post

3 REPLIES 3
Jagadishkatam
Amethyst | Level 16

Please try the below untested code

 

 

proc sort data=have;
by date_s date_e date_r ma descending t_x;
run;

data want;
set have(rename=(t_x=_t_x_));
by date_s date_e date_r ma descending t_x;
retain t_x;
if first.ma then t_x='';
if _t_x_ ne '' then t_x=_t_x_;
run;

Thanks,
Jag
hashman
Ammonite | Level 13

@LoPez_Diaz:

Assuming that T_X is always not missing in the first record of each DATE_: group, it doesn't get simpler than this: 

data have ;                                              
  infile cards dlm = "|" ;                               
  informat date_s date_e date_r mmddyy08. ma 8. t_x $8. ;
  format   date_s date_e date_r yymmdd10. ;              
  input (_all_) (:) ;                                    
  cards ;                                                
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
;                                                        
run ;                                                    
                                                         
data want (drop = _:) ;                                  
  set have ;                                             
  retain _t ;                                            
  if cmiss (t_x) then t_x = _t ;                         
  else _t = t_x ;                                        
run ;                                                    

However, if you may have a missing T_X anywhere in the DATE_: group and want to fill the whole group with whatever non-missing value is present in the group (if it's present):

data want ;                                                                   
  merge have (drop=t_x in=h1) have (keep=date_: t_x where=(t_x is not null)) ;
  by date_s descending date_e descending date_r ;                             
  if h1 ;                                                                     
run ;                                                                         

OTOH, if your input data set is not sorted or grouped by DATE_: and/or you don't want to rely on any kind of DATE_: order, the hash object comes in handy:

data want ;                                               
  if _n_ = 1 then do ;                                    
    dcl hash h (dataset:"have(where=(t_x is not null))") ;
    h.definekey ("date_s", "date_e", "date_r") ;          
    h.definedata ("t_x") ;                                
    h.definedone () ;                                     
  end ;                                                   
  set have ;                                              
  h.find() ;                                              
run ;                                                     

Kind regards

Paul D.

LoPez_Diaz
Obsidian | Level 7

Hello Sir, 

Thank you very much for your detailled response , I really appreciate your help .
Thank you !

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 2530 views
  • 2 likes
  • 3 in conversation