Solved: Fill a column missing values depending on other columns and rows cond...

LoPez_Diaz · Posted 08-30-2019 12:06 PM

Hello ,

I have a huge dataset , with over 20k rows , and 12 columns

I have some missing values for a column T_X

My dataset has the following format :

DATE_S | DATE_E | DATE_R | MA | T_X

01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L

01/02/19 | 01/08/19 | 01/02/34 | 2 | .

01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L

01/02/19 | 01/08/19 | 01/02/34 | 2 | .

01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P

01/02/19 | 01/06/19 | 01/07/22 | 7 | .

01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P

01/02/19 | 01/06/19 | 01/07/22 | 7 | .

So what I want to do , is to replace the missing values in column T_X with the value from the same column having the same row values as other rows that have a value in the column T_X;

The data I want to get should look like this :

DATE_S | DATE_E | DATE_R | MA | T_X

01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L

01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P

Thank you in advance for your responses.

hashman · Posted 08-30-2019 02:29 PM

@LoPez_Diaz:

Assuming that T_X is always not missing in the first record of each DATE_: group, it doesn't get simpler than this:

data have ;                                              
  infile cards dlm = "|" ;                               
  informat date_s date_e date_r mmddyy08. ma 8. t_x $8. ;
  format   date_s date_e date_r yymmdd10. ;              
  input (_all_) (:) ;                                    
  cards ;                                                
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
;                                                        
run ;                                                    
                                                         
data want (drop = _:) ;                                  
  set have ;                                             
  retain _t ;                                            
  if cmiss (t_x) then t_x = _t ;                         
  else _t = t_x ;                                        
run ;

However, if you may have a missing T_X anywhere in the DATE_: group and want to fill the whole group with whatever non-missing value is present in the group (if it's present):

data want ;                                                                   
  merge have (drop=t_x in=h1) have (keep=date_: t_x where=(t_x is not null)) ;
  by date_s descending date_e descending date_r ;                             
  if h1 ;                                                                     
run ;

OTOH, if your input data set is not sorted or grouped by DATE_: and/or you don't want to rely on any kind of DATE_: order, the hash object comes in handy:

data want ;                                               
  if _n_ = 1 then do ;                                    
    dcl hash h (dataset:"have(where=(t_x is not null))") ;
    h.definekey ("date_s", "date_e", "date_r") ;          
    h.definedata ("t_x") ;                                
    h.definedone () ;                                     
  end ;                                                   
  set have ;                                              
  h.find() ;                                              
run ;

Kind regards

Paul D.

View solution in original post

Jagadishkatam · Posted 08-30-2019 12:29 PM

Please try the below untested code

proc sort data=have;
by date_s date_e date_r ma descending t_x;
run;

data want;
set have(rename=(t_x=_t_x_));
by date_s date_e date_r ma descending t_x;
retain t_x;
if first.ma then t_x='';
if _t_x_ ne '' then t_x=_t_x_;
run;

Thanks,
Jag

hashman · Posted 08-30-2019 02:29 PM

@LoPez_Diaz:

Assuming that T_X is always not missing in the first record of each DATE_: group, it doesn't get simpler than this:

data have ;                                              
  infile cards dlm = "|" ;                               
  informat date_s date_e date_r mmddyy08. ma 8. t_x $8. ;
  format   date_s date_e date_r yymmdd10. ;              
  input (_all_) (:) ;                                    
  cards ;                                                
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L             
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/08/19 | 01/02/34 | 2 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P             
01/02/19 | 01/06/19 | 01/07/22 | 7 | .                   
;                                                        
run ;                                                    
                                                         
data want (drop = _:) ;                                  
  set have ;                                             
  retain _t ;                                            
  if cmiss (t_x) then t_x = _t ;                         
  else _t = t_x ;                                        
run ;

However, if you may have a missing T_X anywhere in the DATE_: group and want to fill the whole group with whatever non-missing value is present in the group (if it's present):

data want ;                                                                   
  merge have (drop=t_x in=h1) have (keep=date_: t_x where=(t_x is not null)) ;
  by date_s descending date_e descending date_r ;                             
  if h1 ;                                                                     
run ;

OTOH, if your input data set is not sorted or grouped by DATE_: and/or you don't want to rely on any kind of DATE_: order, the hash object comes in handy:

data want ;                                               
  if _n_ = 1 then do ;                                    
    dcl hash h (dataset:"have(where=(t_x is not null))") ;
    h.definekey ("date_s", "date_e", "date_r") ;          
    h.definedata ("t_x") ;                                
    h.definedone () ;                                     
  end ;                                                   
  set have ;                                              
  h.find() ;                                              
run ;

Kind regards

Paul D.

LoPez_Diaz · Posted 09-02-2019 08:11 AM

Hello Sir,

Thank you very much for your detailled response , I really appreciate your help .
Thank you !

Fill a column missing values depending on other columns and rows conditions

Re: Fill a column missing values depending on other columns and rows conditions

Re: Fill a column missing values depending on other columns and rows conditions

Re: Fill a column missing values depending on other columns and rows conditions

Re: Fill a column missing values depending on other columns and rows conditions

Fill a column missing values depending on other columns and rows conditions

Re: Fill a column missing values depending on other columns and rows conditions

Re: Fill a column missing values depending on other columns and rows conditions

Re: Fill a column missing values depending on other columns and rows conditions

Re: Fill a column missing values depending on other columns and rows conditions

Click image to register for webinar

Classroom Training Available!