Hi everyone
I recently posted a question on filling cells with values from cells above
this was kindly answered by both @Tom and @Jagadishkatam
I am reposting the question with a change...
I have a dataset with a column containing missing values.
I am trying to fill these missing values from the cells above provided that the two rows belong to the same group.
Here is the dataset:
data a;
input N Group Var;
datalines;
1 1 3
2 1 6
3 1 2
4 1 .
5 2 .
6 2 6
7 3 3
8 3 .
9 3 5
10 4 .
11 3 .
;
run;
The solusion suggested is
data want ;
set a ;
by group ;
if first.group then want=var;
else want=coalesce(var,want);
retain want;
run;
The problem is that this solution assumes that you can sort the database by group so that you can use the Data...Set...By and then first.group.
i can not sort the data by group because it will mess the order of the rows, and the whole idea of filling the cells with valuse above is based on the current order.
So, is there any other way of achieving the same thing, ie filling an empty cell with the value above it provided that they both belong to the same group
Thanks
Since your GROUP values contain integers, it should be possible to set up a temporary array holding the most recent VAR value for each group:
data want;
array groups {100} _temporary_;
set have;
if var=. then var = groups{group};
else groups{group}=var;
run;
This solution does assume that you have no more than 100 groups (that part is easily adjustable) and that GROUP is always populated. That last part can be worked around pretty easily as well.
Why not sort by GROUP, use the logic that you know works, then sort again by N to put the observations back in their original order?
Sorting by GROUP does not change the order of the observations within each GROUP.
If the data is not sorted by GROUP then why are you using the BY statement?
If you want to treat the data as if it was sorted by GROUP, without actually sorting it by GROUP, then you could use the NOTSORTED option on the BY statement. Then SAS won't complain if the values of GROUP are not monotonically increasing but it will still set the FIRST.GROUP flag varaible whenever GROUP changes.
Here is some data that shows the difference better than your example.
data a;
N+1;
input Group Var Expected;
datalines;
1 3 3
1 . 3
1 2 2
1 . 2
2 . .
2 6 6
2 . 6
1 . .
1 5 5
;
data want ;
set a ;
by group notsorted;
if first.group then want=var;
else want=coalesce(var,want);
retain want;
run;
Note that this is not going to generate the same result as you would get if you sorted by GROUP and N , calculated the new varaible, and then sorted again by N to put the data back into the original order.
For example in the last group of data above the first observation has a missing value for VAR so the new WANT variable is missing also since this is the first for the block. But if you first sorted by GROUP then this record would get 2 instead since the earlier block of data with GROUP=1 had VAR=2 as the last non-missing value.
Since your GROUP values contain integers, it should be possible to set up a temporary array holding the most recent VAR value for each group:
data want;
array groups {100} _temporary_;
set have;
if var=. then var = groups{group};
else groups{group}=var;
run;
This solution does assume that you have no more than 100 groups (that part is easily adjustable) and that GROUP is always populated. That last part can be worked around pretty easily as well.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.