Hello ,
I have a huge dataset , with over 20k rows , and 12 columns
I have some missing values for a column T_X
My dataset has the following format :
DATE_S | DATE_E | DATE_R | MA | T_X
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | .
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | .
So what I want to do , is to replace the missing values in column T_X with the value from the same column having the same row values as other rows that have a value in the column T_X;
The data I want to get should look like this :
DATE_S | DATE_E | DATE_R | MA | T_X
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
Thank you in advance for your responses.
Assuming that T_X is always not missing in the first record of each DATE_: group, it doesn't get simpler than this:
data have ;
infile cards dlm = "|" ;
informat date_s date_e date_r mmddyy08. ma 8. t_x $8. ;
format date_s date_e date_r yymmdd10. ;
input (_all_) (:) ;
cards ;
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | .
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | .
;
run ;
data want (drop = _:) ;
set have ;
retain _t ;
if cmiss (t_x) then t_x = _t ;
else _t = t_x ;
run ;
However, if you may have a missing T_X anywhere in the DATE_: group and want to fill the whole group with whatever non-missing value is present in the group (if it's present):
data want ;
merge have (drop=t_x in=h1) have (keep=date_: t_x where=(t_x is not null)) ;
by date_s descending date_e descending date_r ;
if h1 ;
run ;
OTOH, if your input data set is not sorted or grouped by DATE_: and/or you don't want to rely on any kind of DATE_: order, the hash object comes in handy:
data want ;
if _n_ = 1 then do ;
dcl hash h (dataset:"have(where=(t_x is not null))") ;
h.definekey ("date_s", "date_e", "date_r") ;
h.definedata ("t_x") ;
h.definedone () ;
end ;
set have ;
h.find() ;
run ;
Kind regards
Paul D.
Please try the below untested code
proc sort data=have;
by date_s date_e date_r ma descending t_x;
run;
data want;
set have(rename=(t_x=_t_x_));
by date_s date_e date_r ma descending t_x;
retain t_x;
if first.ma then t_x='';
if _t_x_ ne '' then t_x=_t_x_;
run;
Assuming that T_X is always not missing in the first record of each DATE_: group, it doesn't get simpler than this:
data have ;
infile cards dlm = "|" ;
informat date_s date_e date_r mmddyy08. ma 8. t_x $8. ;
format date_s date_e date_r yymmdd10. ;
input (_all_) (:) ;
cards ;
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | .
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | .
;
run ;
data want (drop = _:) ;
set have ;
retain _t ;
if cmiss (t_x) then t_x = _t ;
else _t = t_x ;
run ;
However, if you may have a missing T_X anywhere in the DATE_: group and want to fill the whole group with whatever non-missing value is present in the group (if it's present):
data want ;
merge have (drop=t_x in=h1) have (keep=date_: t_x where=(t_x is not null)) ;
by date_s descending date_e descending date_r ;
if h1 ;
run ;
OTOH, if your input data set is not sorted or grouped by DATE_: and/or you don't want to rely on any kind of DATE_: order, the hash object comes in handy:
data want ;
if _n_ = 1 then do ;
dcl hash h (dataset:"have(where=(t_x is not null))") ;
h.definekey ("date_s", "date_e", "date_r") ;
h.definedata ("t_x") ;
h.definedone () ;
end ;
set have ;
h.find() ;
run ;
Kind regards
Paul D.
Hello Sir,
Thank you very much for your detailled response , I really appreciate your help .
Thank you !
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.