I want to use LOCF to impute missing values, but the missing visits are not present in the dataset.
In this case, how do I check the missing visits and set the value to missing value first, and then use retain to do LOCF?
data visit;
input ID $ avisitn result;
datalines;
A 0 4.1
A 1 3.9
A 2 4.3
A 4 5.0
A 8 4.5
A 12 5.1
A 16 5.3
A 18 4.9
A 20 4.7
B 0 4.2
B 1 3.7
B 2 4.4
B 4 5.2
B 12 4.5
B 16 5.2
B 18 4.8
B 20 4.9
C 0 4.6
C 1 4.9
C 2 5.3
C 8 5.5
C 16 5.6
C 18 5.9
C 20 5.7
;
run;
/* avisitn is 0, 1, 2, 4, 8, 12, 16, 18, 20* /
/* A has all visits, B misses avisitn 8, C misses avisit 4 & 12 */
I think this takes care of it. The `PROC FREQ` with the `SPARSE` option expands the table to include the missing `avisitn` records.
The rest just is some PDV manipulation.
proc freq
data = visit noprint;
tables id * avisitn / out = visit_1 (drop = percent count) sparse;
run;
data visit_2;
merge
visit
visit_1;
by id avisitn;
retain result2;
if first.id then result2 = .;
if result ~= . then result2 = result;
run;
ID avisitn result result2 A 0 4.1 4.1 A 1 3.9 3.9 A 2 4.3 4.3 A 4 5.0 5.0 A 8 4.5 4.5 A 12 5.1 5.1 A 16 5.3 5.3 A 18 4.9 4.9 A 20 4.7 4.7 B 0 4.2 4.2 B 1 3.7 3.7 B 2 4.4 4.4 B 4 5.2 5.2 B 8 . 5.2 B 12 4.5 4.5 B 16 5.2 5.2 B 18 4.8 4.8 B 20 4.9 4.9 C 0 4.6 4.6 C 1 4.9 4.9 C 2 5.3 5.3 C 4 . 5.3 C 8 5.5 5.5 C 12 . 5.5 C 16 5.6 5.6 C 18 5.9 5.9 C 20 5.7 5.7
Build up a dataset with every visit for every subject.
proc sql noprint;
create table skeleton as
select id,avisitn
from (select distinct id from visit)
, (select distinct avisitn from visit)
order by id,avisitn
;
quit;
data full ;
merge skeleton visit (in=in1);
by id avisitn ;
realdata=in1;
run;
Then you can use LOCF technique to replace the missing values.
data locf ;
update full(obs=0) full;
by id;
output;
run;
Results:
Obs ID avisitn result realdata 1 A 0 4.1 1 2 A 1 3.9 1 3 A 2 4.3 1 4 A 4 5.0 1 5 A 8 4.5 1 6 A 12 5.1 1 7 A 16 5.3 1 8 A 18 4.9 1 9 A 20 4.7 1 10 B 0 4.2 1 11 B 1 3.7 1 12 B 2 4.4 1 13 B 4 5.2 1 14 B 8 5.2 0 15 B 12 4.5 1 16 B 16 5.2 1 17 B 18 4.8 1 18 B 20 4.9 1 19 C 0 4.6 1 20 C 1 4.9 1 21 C 2 5.3 1 22 C 4 5.3 0 23 C 8 5.5 1 24 C 12 5.5 0 25 C 16 5.6 1 26 C 18 5.9 1 27 C 20 5.7 1
Thank you very much, Maguiremg and Tom!
I learned from both of you. very nice ideas.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.