Dear community,
I have a problem and I hope you can help me. Image I have the following table
id Var1 Var2 Var3 Var4 Var5 Var6
1 10 . . 0.25 . .
1 10 . . 0.25 . .
1 . 11 . . 0.3 .
1 . 11 . . 0.3 .
1 . . 41 . . 0.03
1 . . 41 . . 0.03
1 . . 41 . . 0.03
2 46 . . 0.02 . .
3 28 . . 0.03 . .
3 . 29 . . 0.02
I want to replace the missing values (.) with the previous or next values by considering the id-variable. That means I want to have the following table
id Var1 Var2 Var3 Var4 Var5 Var6
1 10 11 41 0.25 0.3 0.03
1 10 11 41 0.25 0.3 0.03
1 10 11 41 0.25 0.3 0.03
1 10 11 41 0.25 0.3 0.03
1 10 11 41 0.25 0.3 0.03
1 10 11 41 0.25 0.3 0.03
1 10 11 41 0.25 0.3 0.03
2 46 11 41 0.02 . .
3 28 . 29 0.03 . 0.02
3 28 . 29 0.03 . 0.02
How I can I do this?
Thank you very much and kind regards
or use an UPDATE by ID to fill, and loop the filled for the count of occurrences of ID-
data have;
input id Var1 Var2 Var3 Var4 Var5 Var6;
cards;
1 10 . . 0.25 . .
1 10 . . 0.25 . .
1 . 11 . . 0.3 .
1 . 11 . . 0.3 .
1 . . 41 . . 0.03
1 . . 41 . . 0.03
1 . . 41 . . 0.03
2 46 . . 0.02 . .
3 28 . . 0.03 . .
3 . . 29 . . 0.02
;
data want ;
do _n_ = 1 by 1 until (last.id ) ;
update have(obs=0) have ;
by id ;
end ;
do _n_ = 1 to _n_ ;
output ;
end ;
run ;
From the data you show, a solution which uses the mean value in place of the missings ought to work, and is very simple to do using PROC STDIZE. Would that work for you?
or use an UPDATE by ID to fill, and loop the filled for the count of occurrences of ID-
data have;
input id Var1 Var2 Var3 Var4 Var5 Var6;
cards;
1 10 . . 0.25 . .
1 10 . . 0.25 . .
1 . 11 . . 0.3 .
1 . 11 . . 0.3 .
1 . . 41 . . 0.03
1 . . 41 . . 0.03
1 . . 41 . . 0.03
2 46 . . 0.02 . .
3 28 . . 0.03 . .
3 . . 29 . . 0.02
;
data want ;
do _n_ = 1 by 1 until (last.id ) ;
update have(obs=0) have ;
by id ;
end ;
do _n_ = 1 to _n_ ;
output ;
end ;
run ;
I like the concept of using UPDATE, followed by a do loop of SET/OUTPUT statements. But it can be cleaned a bit:
data want ;
update have(obs=0) have ;
by id ;
if last.id then do until (last.id);
set have (keep=id);
by id;
output;
end ;
run ;
I should add that this provides a way to specify a variable list in the "keep=" parameter that allows original values to prevail.
You example seems to result in all observations in a group being exactly the same.
What do you want to happen when the non-missing values are not all the same?
Is there another variable that can be used to insure that the observations are in the right order? A sequence number or date perhaps?
If your these variables have the same value within the same group ID .
proc sql;
create table want(drop=dummy) as
select id,var1 as dummy,
max(var1) as var1 ,
max(var2) as var2 ,
max(var3) as var3 ,
max(var4) as var4 ,
max(var5) as var5
from have
group by id;
quit;
Thank you very much for the suggested solutions.
Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.
Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.