Hi guys,
suppose to have the following dataset:
data DB;
input ID :$200. Event :$200. Date :date09.;
format Date date9.;
cards;
0001 Event1 01SEP2024
0001 Event1 01SEP2024
0001 . 01SEP2024
0001 Event1 .
0002 Measure 08DEC2025
0002 Measure 08DEC2025
0002 . 08DEC2025
0002 Measure .
run;
For some reasons There are missing values. Is there a way to uniform values by ID?
Desired output:
data DB1;
input ID :$200. Event :$200. Date :date09.;
format Date date9.;
cards;
0001 Event1 01SEP2024
0001 Event1 01SEP2024
0001 Event1 01SEP2024
0001 Event1 01SEP2024
0002 Measure 08DEC2025
0002 Measure 08DEC2025
0002 Measure 08DEC2025
0002 Measure 08DEC2025
run;
The two variables are not related and missing values come from previous computations.
Additionally there are no multiple variable values for each ID.
Thank you in advance.
Use RETAIN statement.
(does not work if missing values occur in the starting rows for an ID)
data DB;
input ID :$200. Event :$200. Date :date09.;
format Date date9.;
cards;
0001 Event1 01SEP2024
0001 Event1 01SEP2024
0001 . 01SEP2024
0001 Event1 .
0002 Measure 08DEC2025
0002 Measure 08DEC2025
0002 . 08DEC2025
0002 Measure .
run;
data DB_want;
LENGTH Event_retain $ 200 Date_retain 8;
set DB;
retain Event_retain Date_retain;
by ID;
if first.ID then do; Event_retain=''; Date_retain=.; end;
if Event NE '' then Event_retain = Event;
if Date NE . then Date_retain = Date;
if Event EQ '' then Event = Event_retain;
if Date EQ . then Date = Date_retain;
DROP Event_retain Date_retain;
run;
/* end of program */
Koen
If you are asking the missing values be replaced by the previous value from the same ID then you can use the UPDATE statement to handle this.
The UPDATE statement is intended to apply transactions to the current data, so missing values leave the current value unchanged.
But by using an empty version of the dataset as the base dataset and using all of the values as transactions you can implement Last Observation Carried Forward. Just add an OUTPUT statement to force the data step to write all of the observations instead of just the last per BY group.
data want;
update have(obs=0) have;
by id;
output;
run;
If there are other variables you did not want to have the missing values replaced then you can read them in again by adding a SET statement using the KEEP= (or DROP=) dataset option. For example if you only want the EVENT and DATE variables to have the LOCF applied then you could do this:
data want;
update have(obs=0) have;
by id;
set have(drop=event date);
output;
run;
Base on your data, another working solution is :
data DB;
input ID :$200. Event :$200. Date :date09.;
format Date date9.;
cards;
0001 Event1 01SEP2024
0001 Event1 01SEP2024
0001 . 01SEP2024
0001 Event1 .
0002 Measure 08DEC2025
0002 Measure 08DEC2025
0002 . 08DEC2025
0002 Measure .
;
proc sql;
create table want(drop=_Event _Date) as
select *,max(_Event) as Event,max(_Date) as Date format=date9.
from DB(rename=(Event=_Event Date=_Date))
group by ID;
quit;
Nearly 200 sessions are now available on demand with the SAS Innovate Digital Pass.
Explore Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.