I'm trying to propagate a value vertically, but only in case for every value of k there is at least one non-missing data. I inserted the proc sort to bring the non-missing values to the beginning. This is the example:
data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;
proc sort data=ds_test;
by k descending value;
run;
data d;
set ds_test;
retain _value;
if not missing(value) then _value=value;
else value=_value;
drop _value;
run;
With the code I wrote I don't get the desired result because in the case of k=3 value is always missing.
Furthermore, in the case of k=4 I would like the value of 150 to propagate before 200, according to the vertical order in which it occurs.
What I would like to achieve is this result:
data d;
input k value;
datalines;
1 100
1 100
1 100
2 100
2 200
2 200
3 .
3 .
4 150
4 150
4 200
;
This sounds like it is standard LOCF (last observation carry forward) within each ID, but with a look-ahead to find the first non-missing value for each ID. I would use a double-DoW loop for this. This reads through each ID by-group twice. The first time to find the first non-missing value. The second time to do the LOCF. The code inside the LOCF DO loop is similar to yours. Note you do not want a RETAIN statement for this solution, because the DATA step appropriately sets _value to missing at the top of the DATA step loop (i.e. at the beginning of each by-group).
data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;
data d;
do until(last.k) ;
set ds_test;
by k ;
if missing(_value) then _value=value ;
end ;
do until(last.k) ;
set ds_test;
by k ;
if not missing(value) then _value=value;
else value=_value;
output ;
end ;
drop _: ;
run;
proc print ;
run ;
data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;
run;
PROC EXPAND data=ds_test
out=ds_test_out
method=step;
run;
proc means data=ds_test nway noprint;
CLASS k;
VAR value;
output out=all_missing(drop=_:) sum= / autoname;
run;
data d(drop=time value_sum);
merge ds_test_out all_missing;
by k;
if value_sum=. then value=.;
run;
/* end of program */
Koen
Thanks for the solution but unfortunately I don't have the sas package which includes the proc expand
Not sure why you sorted that data as the results aren't going to match your desired description.
Try:
/* DO not SORT DS_TEST*/
data d; set ds_test; by k notsorted; retain _value; if first.k then _value=value; if not missing(value) then _value=value; else value=_value; drop _value; run;
The BY statement means that SAS creates automatic variables first.(variablename) and last.(variablename) for each variable on the by statement. These are numeric 1/0 for true/false indicating that the current observation is the first or last of a combination of the variables. So you can use that to reset the retained variable(s) when the group membership changes.
Then the data set is processed in order. The option NOTSORTED on the By statement allows the By variables to be treated as grouped but does not require them to be in sort order.
Thanks for the solution. I would also like to include your post as a solution but I don't think it's possible to put more than one solution
This sounds like it is standard LOCF (last observation carry forward) within each ID, but with a look-ahead to find the first non-missing value for each ID. I would use a double-DoW loop for this. This reads through each ID by-group twice. The first time to find the first non-missing value. The second time to do the LOCF. The code inside the LOCF DO loop is similar to yours. Note you do not want a RETAIN statement for this solution, because the DATA step appropriately sets _value to missing at the top of the DATA step loop (i.e. at the beginning of each by-group).
data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;
data d;
do until(last.k) ;
set ds_test;
by k ;
if missing(_value) then _value=value ;
end ;
do until(last.k) ;
set ds_test;
by k ;
if not missing(value) then _value=value;
else value=_value;
output ;
end ;
drop _: ;
run;
proc print ;
run ;
data ds_test;
input k value @@;
datalines;
1 100 1 . 1 .
2 . 2 200 2 .
3 . 3 .
4 150 4 . 4 200
5 . 5 . 5 150 5 . 5 200
;;;;
proc print;
run;
options msglevel=I;
data want;
update ds_test(obs=0) ds_test(in=in1);
by k;
merge ds_test(keep=k) ds_test(rename=(value=value0) where=(not missing(value0)));
by k;
if in1;
value = coalesce(value,value0);
output;
drop value0;
run;
proc print;
run;
You don't really need to read each K-group twice, but you do need to know if the k-group in hand has at least one nonmissing VALUE. In such a case, carry forward the most recent nonmissing VALUE. If not then set the carry forward value to missing:
data want (drop=_:);
merge ds_test (where=(value^=.) in=nonmissing_value_found)
ds_test ;
by k;
if first.k and nonmissing_value_found=0 then call missing(_most_recent_value);
retain _most_recent_value;
value=coalesce(value,_most_recent_value);
_most_recent_value=value;
run;
It's vital to this program that the two arguments of the MERGE has the non-missing subset listed first, and the entire set second, so that any each observation preserves all its values (values from rightmost common variable prevails). The important element is generating the NONMISSING_VALUE_FOUND dummy for each K.
Then, if a given K-group has NO non-missing VALUEs, the IF ... statement sets the carry forward VALUE to missing.
BTW, this program assumes you want to carry-forward non-missing VALUEs across K-groups, as your sample shows for the first obs of K=2, per @Astounding's comment.
Also, your original data needs no further sorting.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.