Hello ,
I have a huge dataset , with over 20k rows , and 12 columns
I have some missing values for a column T_X
My dataset has the following format :
DATE_S | DATE_E | DATE_R | MA | T_X
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | .
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | .
So what I want to do , is to replace the missing values in column T_X with the value from the same column having the same row values as other rows that have a value in the column T_X;
The data I want to get should look like this :
DATE_S | DATE_E | DATE_R | MA | T_X
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
Thank you in advance for your responses.
Assuming that T_X is always not missing in the first record of each DATE_: group, it doesn't get simpler than this:
data have ;
infile cards dlm = "|" ;
informat date_s date_e date_r mmddyy08. ma 8. t_x $8. ;
format date_s date_e date_r yymmdd10. ;
input (_all_) (:) ;
cards ;
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | .
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | .
;
run ;
data want (drop = _:) ;
set have ;
retain _t ;
if cmiss (t_x) then t_x = _t ;
else _t = t_x ;
run ;
However, if you may have a missing T_X anywhere in the DATE_: group and want to fill the whole group with whatever non-missing value is present in the group (if it's present):
data want ;
merge have (drop=t_x in=h1) have (keep=date_: t_x where=(t_x is not null)) ;
by date_s descending date_e descending date_r ;
if h1 ;
run ;
OTOH, if your input data set is not sorted or grouped by DATE_: and/or you don't want to rely on any kind of DATE_: order, the hash object comes in handy:
data want ;
if _n_ = 1 then do ;
dcl hash h (dataset:"have(where=(t_x is not null))") ;
h.definekey ("date_s", "date_e", "date_r") ;
h.definedata ("t_x") ;
h.definedone () ;
end ;
set have ;
h.find() ;
run ;
Kind regards
Paul D.
Please try the below untested code
proc sort data=have;
by date_s date_e date_r ma descending t_x;
run;
data want;
set have(rename=(t_x=_t_x_));
by date_s date_e date_r ma descending t_x;
retain t_x;
if first.ma then t_x='';
if _t_x_ ne '' then t_x=_t_x_;
run;
Assuming that T_X is always not missing in the first record of each DATE_: group, it doesn't get simpler than this:
data have ;
infile cards dlm = "|" ;
informat date_s date_e date_r mmddyy08. ma 8. t_x $8. ;
format date_s date_e date_r yymmdd10. ;
input (_all_) (:) ;
cards ;
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | M66501L
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/08/19 | 01/02/34 | 2 | .
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | .
01/02/19 | 01/06/19 | 01/07/22 | 7 | M12408P
01/02/19 | 01/06/19 | 01/07/22 | 7 | .
;
run ;
data want (drop = _:) ;
set have ;
retain _t ;
if cmiss (t_x) then t_x = _t ;
else _t = t_x ;
run ;
However, if you may have a missing T_X anywhere in the DATE_: group and want to fill the whole group with whatever non-missing value is present in the group (if it's present):
data want ;
merge have (drop=t_x in=h1) have (keep=date_: t_x where=(t_x is not null)) ;
by date_s descending date_e descending date_r ;
if h1 ;
run ;
OTOH, if your input data set is not sorted or grouped by DATE_: and/or you don't want to rely on any kind of DATE_: order, the hash object comes in handy:
data want ;
if _n_ = 1 then do ;
dcl hash h (dataset:"have(where=(t_x is not null))") ;
h.definekey ("date_s", "date_e", "date_r") ;
h.definedata ("t_x") ;
h.definedone () ;
end ;
set have ;
h.find() ;
run ;
Kind regards
Paul D.
Hello Sir,
Thank you very much for your detailled response , I really appreciate your help .
Thank you !
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.