I want to use LOCF to impute missing values, but the missing visits are not present in the dataset.
In this case, how do I check the missing visits and set the value to missing value first, and then use retain to do LOCF?
data visit;
input ID $ avisitn result;
datalines;
A 0 4.1
A 1 3.9
A 2 4.3
A 4 5.0
A 8 4.5
A 12 5.1
A 16 5.3
A 18 4.9
A 20 4.7
B 0 4.2
B 1 3.7
B 2 4.4
B 4 5.2
B 12 4.5
B 16 5.2
B 18 4.8
B 20 4.9
C 0 4.6
C 1 4.9
C 2 5.3
C 8 5.5
C 16 5.6
C 18 5.9
C 20 5.7
;
run;
/* avisitn is 0, 1, 2, 4, 8, 12, 16, 18, 20* /
/* A has all visits, B misses avisitn 8, C misses avisit 4 & 12 */
I think this takes care of it. The `PROC FREQ` with the `SPARSE` option expands the table to include the missing `avisitn` records.
The rest just is some PDV manipulation.
proc freq
data = visit noprint;
tables id * avisitn / out = visit_1 (drop = percent count) sparse;
run;
data visit_2;
merge
visit
visit_1;
by id avisitn;
retain result2;
if first.id then result2 = .;
if result ~= . then result2 = result;
run;
ID avisitn result result2 A 0 4.1 4.1 A 1 3.9 3.9 A 2 4.3 4.3 A 4 5.0 5.0 A 8 4.5 4.5 A 12 5.1 5.1 A 16 5.3 5.3 A 18 4.9 4.9 A 20 4.7 4.7 B 0 4.2 4.2 B 1 3.7 3.7 B 2 4.4 4.4 B 4 5.2 5.2 B 8 . 5.2 B 12 4.5 4.5 B 16 5.2 5.2 B 18 4.8 4.8 B 20 4.9 4.9 C 0 4.6 4.6 C 1 4.9 4.9 C 2 5.3 5.3 C 4 . 5.3 C 8 5.5 5.5 C 12 . 5.5 C 16 5.6 5.6 C 18 5.9 5.9 C 20 5.7 5.7
Build up a dataset with every visit for every subject.
proc sql noprint;
create table skeleton as
select id,avisitn
from (select distinct id from visit)
, (select distinct avisitn from visit)
order by id,avisitn
;
quit;
data full ;
merge skeleton visit (in=in1);
by id avisitn ;
realdata=in1;
run;
Then you can use LOCF technique to replace the missing values.
data locf ;
update full(obs=0) full;
by id;
output;
run;
Results:
Obs ID avisitn result realdata 1 A 0 4.1 1 2 A 1 3.9 1 3 A 2 4.3 1 4 A 4 5.0 1 5 A 8 4.5 1 6 A 12 5.1 1 7 A 16 5.3 1 8 A 18 4.9 1 9 A 20 4.7 1 10 B 0 4.2 1 11 B 1 3.7 1 12 B 2 4.4 1 13 B 4 5.2 1 14 B 8 5.2 0 15 B 12 4.5 1 16 B 16 5.2 1 17 B 18 4.8 1 18 B 20 4.9 1 19 C 0 4.6 1 20 C 1 4.9 1 21 C 2 5.3 1 22 C 4 5.3 0 23 C 8 5.5 1 24 C 12 5.5 0 25 C 16 5.6 1 26 C 18 5.9 1 27 C 20 5.7 1
Thank you very much, Maguiremg and Tom!
I learned from both of you. very nice ideas.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.