BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.

I'm trying to propagate a value vertically, but only in case for every value of k there is at least one non-missing data. I inserted the proc sort to bring the non-missing values to the beginning. This is the example:

 

data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;
proc sort data=ds_test;
by k descending value;
run;

data d;
set ds_test;
retain _value;
if not missing(value) then _value=value;
else value=_value;
drop _value;
run;

With the code I wrote I don't get the desired result because in the case of k=3 value is always missing.

Furthermore, in the case of k=4 I would like the value of 150 to propagate before 200, according to the vertical order in which it occurs.

What I would like to achieve is this result:

 

data d;
input k value;
datalines;
1 100
1 100
1 100
2 100
2 200
2 200
3 .
3 .
4 150
4 150
4 200
;

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

This sounds like it is standard LOCF (last observation carry forward) within each ID, but with a look-ahead to find the first non-missing value for each ID.  I would use a double-DoW loop for this.  This reads through each ID by-group twice.  The first time to find the first non-missing value.  The second time to do the LOCF.  The code inside the LOCF DO loop is similar to yours.  Note you do not want a RETAIN statement for this solution, because the DATA step appropriately sets _value to missing at the top of the DATA step loop (i.e. at the beginning of each by-group).

 

data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;

data d;
  do until(last.k) ;
    set ds_test;
    by k ;
    if missing(_value) then _value=value ;
  end ;

  do until(last.k) ;
    set ds_test;
    by k ;

    if not missing(value) then _value=value;
    else value=_value;

    output ;
  end ;

  drop _: ;
run;

proc print ;
run ;
BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

8 REPLIES 8
sbxkoenk
SAS Super FREQ
data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;
run;

PROC EXPAND data=ds_test
            out=ds_test_out
            method=step;
run;

proc means data=ds_test nway noprint;
 CLASS k;
 VAR   value;
 output out=all_missing(drop=_:) sum= / autoname;
run;

data d(drop=time value_sum);
 merge ds_test_out all_missing;
 by k;
 if value_sum=. then value=.;
run;
/* end of program */


Koen

mariopellegrini
Pyrite | Level 9

Thanks for the solution but unfortunately I don't have the sas package which includes the proc expand

ballardw
Super User

Not sure why you sorted that data as the results aren't going to match your desired description.

 

Try:

/* DO not SORT DS_TEST*/
data d; set ds_test; by k notsorted; retain _value; if first.k then _value=value; if not missing(value) then _value=value; else value=_value; drop _value; run;

The BY statement means that SAS creates automatic variables first.(variablename) and last.(variablename) for each variable on the by statement. These are numeric 1/0 for true/false indicating that the current observation is the first or last of a combination of the variables.  So you can use that to reset the retained variable(s) when the group membership changes.

Then the data set is processed in order. The option NOTSORTED on the By statement allows the By variables to be treated as grouped but does not require them to be in sort order.

mariopellegrini
Pyrite | Level 9

Thanks for the solution. I would also like to include your post as a solution but I don't think it's possible to put more than one solution

Quentin
Super User

This sounds like it is standard LOCF (last observation carry forward) within each ID, but with a look-ahead to find the first non-missing value for each ID.  I would use a double-DoW loop for this.  This reads through each ID by-group twice.  The first time to find the first non-missing value.  The second time to do the LOCF.  The code inside the LOCF DO loop is similar to yours.  Note you do not want a RETAIN statement for this solution, because the DATA step appropriately sets _value to missing at the top of the DATA step loop (i.e. at the beginning of each by-group).

 

data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;

data d;
  do until(last.k) ;
    set ds_test;
    by k ;
    if missing(_value) then _value=value ;
  end ;

  do until(last.k) ;
    set ds_test;
    by k ;

    if not missing(value) then _value=value;
    else value=_value;

    output ;
  end ;

  drop _: ;
run;

proc print ;
run ;
BASUG is hosting free webinars Next up: Mark Keintz presenting History Carried Forward, Future Carried Back: Mixing Time Series of Differing Frequencies on May 8. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
data_null__
Jade | Level 19
data ds_test;
input k value @@;
datalines;
1 100 1 .   1 .   
2 .   2 200 2 .
3 .   3 .
4 150 4 . 4 200
5 .   5 . 5 150 5 . 5 200
;;;;

proc print;
   run;
options msglevel=I;
data want;
   update ds_test(obs=0) ds_test(in=in1);
   by k;
   merge ds_test(keep=k) ds_test(rename=(value=value0) where=(not missing(value0)));
   by k;
   if in1;
   value = coalesce(value,value0);
   output;
   drop value0;
   run;
proc print;
   run;

Capture.PNG

Astounding
PROC Star
For the first observation with k=2, did you mean for the result to be 100 (continued from k=1)?
It makes a difference in the list of possible solutions.
mkeintz
PROC Star

You don't really need to read each K-group twice, but you do need to know if the k-group in hand has at least one nonmissing VALUE.  In such a case, carry forward the most recent nonmissing VALUE.  If not then set the carry forward value to missing:

 

data want (drop=_:);
  merge ds_test (where=(value^=.) in=nonmissing_value_found)
        ds_test ;
  by k;

  if first.k and nonmissing_value_found=0 then call missing(_most_recent_value);
  retain _most_recent_value;
  value=coalesce(value,_most_recent_value);
  _most_recent_value=value;
run;

It's vital to this program that the two arguments of the MERGE has the non-missing subset listed first, and the entire set second, so that any each observation preserves all its values (values from rightmost common variable prevails).   The important element is generating the NONMISSING_VALUE_FOUND dummy for each K.

 

Then, if a given K-group has NO non-missing VALUEs, the IF ... statement sets the carry forward VALUE to missing.

 

BTW, this program assumes you want to carry-forward non-missing VALUEs across K-groups, as your sample shows for the first obs of K=2, per @Astounding's comment.

 

Also, your original data needs no further sorting.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 741 views
  • 6 likes
  • 7 in conversation