Solved: Re: propagate value vertically

mariopellegrini · Posted 07-12-2023 03:58 AM

I'm trying to propagate a value vertically, but only in case for every value of k there is at least one non-missing data. I inserted the proc sort to bring the non-missing values to the beginning. This is the example:

data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;
proc sort data=ds_test;
by k descending value;
run;

data d;
set ds_test;
retain _value;
if not missing(value) then _value=value;
else value=_value;
drop _value;
run;

With the code I wrote I don't get the desired result because in the case of k=3 value is always missing.

Furthermore, in the case of k=4 I would like the value of 150 to propagate before 200, according to the vertical order in which it occurs.

What I would like to achieve is this result:

data d;
input k value;
datalines;
1 100
1 100
1 100
2 100
2 200
2 200
3 .
3 .
4 150
4 150
4 200
;

Quentin · Posted 07-12-2023 10:39 AM

This sounds like it is standard LOCF (last observation carry forward) within each ID, but with a look-ahead to find the first non-missing value for each ID. I would use a double-DoW loop for this. This reads through each ID by-group twice. The first time to find the first non-missing value. The second time to do the LOCF. The code inside the LOCF DO loop is similar to yours. Note you do not want a RETAIN statement for this solution, because the DATA step appropriately sets _value to missing at the top of the DATA step loop (i.e. at the beginning of each by-group).

data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;

data d;
  do until(last.k) ;
    set ds_test;
    by k ;
    if missing(_value) then _value=value ;
  end ;

  do until(last.k) ;
    set ds_test;
    by k ;

    if not missing(value) then _value=value;
    else value=_value;

    output ;
  end ;

  drop _: ;
run;

proc print ;
run ;

The Boston Area SAS Users Group is hosting free webinars!

Register now at https://www.basug.org/events.

View solution in original post

sbxkoenk · Posted 07-12-2023 09:16 AM

data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;
run;

PROC EXPAND data=ds_test
            out=ds_test_out
            method=step;
run;

proc means data=ds_test nway noprint;
 CLASS k;
 VAR   value;
 output out=all_missing(drop=_:) sum= / autoname;
run;

data d(drop=time value_sum);
 merge ds_test_out all_missing;
 by k;
 if value_sum=. then value=.;
run;
/* end of program */

Koen

mariopellegrini · Posted 07-12-2023 09:59 AM

Thanks for the solution but unfortunately I don't have the sas package which includes the proc expand

ballardw · Posted 07-12-2023 10:22 AM

Not sure why you sorted that data as the results aren't going to match your desired description.

Try:

/* DO not SORT DS_TEST*/
data d;
set ds_test;
by k notsorted;
retain _value;
if first.k then _value=value;
if not missing(value) then _value=value;
else value=_value;
drop _value;
run;

The BY statement means that SAS creates automatic variables first.(variablename) and last.(variablename) for each variable on the by statement. These are numeric 1/0 for true/false indicating that the current observation is the first or last of a combination of the variables. So you can use that to reset the retained variable(s) when the group membership changes.

Then the data set is processed in order. The option NOTSORTED on the By statement allows the By variables to be treated as grouped but does not require them to be in sort order.

mariopellegrini · Posted 07-13-2023 06:50 AM

Thanks for the solution. I would also like to include your post as a solution but I don't think it's possible to put more than one solution

Quentin · Posted 07-12-2023 10:39 AM

This sounds like it is standard LOCF (last observation carry forward) within each ID, but with a look-ahead to find the first non-missing value for each ID. I would use a double-DoW loop for this. This reads through each ID by-group twice. The first time to find the first non-missing value. The second time to do the LOCF. The code inside the LOCF DO loop is similar to yours. Note you do not want a RETAIN statement for this solution, because the DATA step appropriately sets _value to missing at the top of the DATA step loop (i.e. at the beginning of each by-group).

data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;

data d;
  do until(last.k) ;
    set ds_test;
    by k ;
    if missing(_value) then _value=value ;
  end ;

  do until(last.k) ;
    set ds_test;
    by k ;

    if not missing(value) then _value=value;
    else value=_value;

    output ;
  end ;

  drop _: ;
run;

proc print ;
run ;

The Boston Area SAS Users Group is hosting free webinars!

Register now at https://www.basug.org/events.

data_null__ · Posted 07-12-2023 01:09 PM

data ds_test;
input k value @@;
datalines;
1 100 1 .   1 .   
2 .   2 200 2 .
3 .   3 .
4 150 4 . 4 200
5 .   5 . 5 150 5 . 5 200
;;;;

proc print;
   run;
options msglevel=I;
data want;
   update ds_test(obs=0) ds_test(in=in1);
   by k;
   merge ds_test(keep=k) ds_test(rename=(value=value0) where=(not missing(value0)));
   by k;
   if in1;
   value = coalesce(value,value0);
   output;
   drop value0;
   run;
proc print;
   run;

Astounding · Posted 07-12-2023 01:46 PM

For the first observation with k=2, did you mean for the result to be 100 (continued from k=1)?
It makes a difference in the list of possible solutions.

mkeintz · Posted 07-12-2023 09:00 PM

You don't really need to read each K-group twice, but you do need to know if the k-group in hand has at least one nonmissing VALUE. In such a case, carry forward the most recent nonmissing VALUE. If not then set the carry forward value to missing:

data want (drop=_:);
  merge ds_test (where=(value^=.) in=nonmissing_value_found)
        ds_test ;
  by k;

  if first.k and nonmissing_value_found=0 then call missing(_most_recent_value);
  retain _most_recent_value;
  value=coalesce(value,_most_recent_value);
  _most_recent_value=value;
run;

It's vital to this program that the two arguments of the MERGE has the non-missing subset listed first, and the entire set second, so that ~~any~~ each observation preserves all its values (values from rightmost common variable prevails). The important element is generating the NONMISSING_VALUE_FOUND dummy for each K.

Then, if a given K-group has NO non-missing VALUEs, the IF ... statement sets the carry forward VALUE to missing.

BTW, this program assumes you want to carry-forward non-missing VALUEs across K-groups, as your sample shows for the first obs of K=2, per @Astounding's comment.

Also, your original data needs no further sorting.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away