BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LoPez_Diaz
Obsidian | Level 7

Hello , 

I have a huge dataset , with over 20k rows , and 12 columns 

I have some missing values for a column T_X

My dataset has the following format : 

 

DATE_S   | DATE_E  |  DATE_R   | MA  |       T_X         

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |          .  

01/02/19  | 01/08/19  | 01/02/34     |  2    |          .

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |          .

01/02/19  | 01/08/19  | 01/02/34     |  2    |          .

01/02/19  | 01/06/19  | 01/07/22     |  7    |     M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |     M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |          .

01/02/19  | 01/06/19  | 01/07/22     |  7    |     M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |          .

 

So what I want to do , is to replace the missing values in column T_X with the value from the same column having the same row values as other rows that have a value in the column T_X;

 

The data I want to get should look like this : 

 

DATE_S   | DATE_E  |  DATE_R   | MA  |       T_X         

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/08/19  | 01/02/34     |  2    |    M66501L

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

01/02/19  | 01/06/19  | 01/07/22     |  7    |    M12408P

 

Thank you in advance for your responses.

 

1 ACCEPTED SOLUTION

Accepted Solutions
hashman
Ammonite | Level 13

@LoPez_Diaz:

Assuming that T_X is always not missing in the first record of each DATE_: group, it doesn't get simpler than this: 

data have ;                                              
  infile cards dlm = "|" ;                               
  informat date_s date_e date_r mmddyy08. ma 8. t_x $8. ;
  format   date_s date_e date_r yymmdd10. ;              
  input (_all_) (:) ;                                    
  cards ;                                                
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
;                                                        
run ;                                                    
                                                         
data want (drop = _:) ;                                  
  set have ;                                             
  retain _t ;                                            
  if cmiss (t_x) then t_x = _t ;                         
  else _t = t_x ;                                        
run ;                                                    

However, if you may have a missing T_X anywhere in the DATE_: group and want to fill the whole group with whatever non-missing value is present in the group (if it's present):

data want ;                                                                   
  merge have (drop=t_x in=h1) have (keep=date_: t_x where=(t_x is not null)) ;
  by date_s descending date_e descending date_r ;                             
  if h1 ;                                                                     
run ;                                                                         

OTOH, if your input data set is not sorted or grouped by DATE_: and/or you don't want to rely on any kind of DATE_: order, the hash object comes in handy:

data want ;                                               
  if _n_ = 1 then do ;                                    
    dcl hash h (dataset:"have(where=(t_x is not null))") ;
    h.definekey ("date_s", "date_e", "date_r") ;          
    h.definedata ("t_x") ;                                
    h.definedone () ;                                     
  end ;                                                   
  set have ;                                              
  h.find() ;                                              
run ;                                                     

Kind regards

Paul D.

View solution in original post

3 REPLIES 3
Jagadishkatam
Amethyst | Level 16

Please try the below untested code

 

 

proc sort data=have;
by date_s date_e date_r ma descending t_x;
run;

data want;
set have(rename=(t_x=_t_x_));
by date_s date_e date_r ma descending t_x;
retain t_x;
if first.ma then t_x='';
if _t_x_ ne '' then t_x=_t_x_;
run;

Thanks,
Jag
hashman
Ammonite | Level 13

@LoPez_Diaz:

Assuming that T_X is always not missing in the first record of each DATE_: group, it doesn't get simpler than this: 

data have ;                                              
  infile cards dlm = "|" ;                               
  informat date_s date_e date_r mmddyy08. ma 8. t_x $8. ;
  format   date_s date_e date_r yymmdd10. ;              
  input (_all_) (:) ;                                    
  cards ;                                                
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
;                                                        
run ;                                                    
                                                         
data want (drop = _:) ;                                  
  set have ;                                             
  retain _t ;                                            
  if cmiss (t_x) then t_x = _t ;                         
  else _t = t_x ;                                        
run ;                                                    

However, if you may have a missing T_X anywhere in the DATE_: group and want to fill the whole group with whatever non-missing value is present in the group (if it's present):

data want ;                                                                   
  merge have (drop=t_x in=h1) have (keep=date_: t_x where=(t_x is not null)) ;
  by date_s descending date_e descending date_r ;                             
  if h1 ;                                                                     
run ;                                                                         

OTOH, if your input data set is not sorted or grouped by DATE_: and/or you don't want to rely on any kind of DATE_: order, the hash object comes in handy:

data want ;                                               
  if _n_ = 1 then do ;                                    
    dcl hash h (dataset:"have(where=(t_x is not null))") ;
    h.definekey ("date_s", "date_e", "date_r") ;          
    h.definedata ("t_x") ;                                
    h.definedone () ;                                     
  end ;                                                   
  set have ;                                              
  h.find() ;                                              
run ;                                                     

Kind regards

Paul D.

LoPez_Diaz
Obsidian | Level 7

Hello Sir, 

Thank you very much for your detailled response , I really appreciate your help .
Thank you !

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 2163 views
  • 2 likes
  • 3 in conversation