- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm trying to propagate a value vertically, but only in case for every value of k there is at least one non-missing data. I inserted the proc sort to bring the non-missing values to the beginning. This is the example:
data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;
proc sort data=ds_test;
by k descending value;
run;
data d;
set ds_test;
retain _value;
if not missing(value) then _value=value;
else value=_value;
drop _value;
run;
With the code I wrote I don't get the desired result because in the case of k=3 value is always missing.
Furthermore, in the case of k=4 I would like the value of 150 to propagate before 200, according to the vertical order in which it occurs.
What I would like to achieve is this result:
data d;
input k value;
datalines;
1 100
1 100
1 100
2 100
2 200
2 200
3 .
3 .
4 150
4 150
4 200
;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This sounds like it is standard LOCF (last observation carry forward) within each ID, but with a look-ahead to find the first non-missing value for each ID. I would use a double-DoW loop for this. This reads through each ID by-group twice. The first time to find the first non-missing value. The second time to do the LOCF. The code inside the LOCF DO loop is similar to yours. Note you do not want a RETAIN statement for this solution, because the DATA step appropriately sets _value to missing at the top of the DATA step loop (i.e. at the beginning of each by-group).
data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;
data d;
do until(last.k) ;
set ds_test;
by k ;
if missing(_value) then _value=value ;
end ;
do until(last.k) ;
set ds_test;
by k ;
if not missing(value) then _value=value;
else value=_value;
output ;
end ;
drop _: ;
run;
proc print ;
run ;
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;
run;
PROC EXPAND data=ds_test
out=ds_test_out
method=step;
run;
proc means data=ds_test nway noprint;
CLASS k;
VAR value;
output out=all_missing(drop=_:) sum= / autoname;
run;
data d(drop=time value_sum);
merge ds_test_out all_missing;
by k;
if value_sum=. then value=.;
run;
/* end of program */
Koen
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the solution but unfortunately I don't have the sas package which includes the proc expand
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not sure why you sorted that data as the results aren't going to match your desired description.
Try:
/* DO not SORT DS_TEST*/
data d; set ds_test; by k notsorted; retain _value; if first.k then _value=value; if not missing(value) then _value=value; else value=_value; drop _value; run;
The BY statement means that SAS creates automatic variables first.(variablename) and last.(variablename) for each variable on the by statement. These are numeric 1/0 for true/false indicating that the current observation is the first or last of a combination of the variables. So you can use that to reset the retained variable(s) when the group membership changes.
Then the data set is processed in order. The option NOTSORTED on the By statement allows the By variables to be treated as grouped but does not require them to be in sort order.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the solution. I would also like to include your post as a solution but I don't think it's possible to put more than one solution
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This sounds like it is standard LOCF (last observation carry forward) within each ID, but with a look-ahead to find the first non-missing value for each ID. I would use a double-DoW loop for this. This reads through each ID by-group twice. The first time to find the first non-missing value. The second time to do the LOCF. The code inside the LOCF DO loop is similar to yours. Note you do not want a RETAIN statement for this solution, because the DATA step appropriately sets _value to missing at the top of the DATA step loop (i.e. at the beginning of each by-group).
data ds_test;
input k value;
datalines;
1 100
1 .
1 .
2 .
2 200
2 .
3 .
3 .
4 150
4 .
4 200
;
data d;
do until(last.k) ;
set ds_test;
by k ;
if missing(_value) then _value=value ;
end ;
do until(last.k) ;
set ds_test;
by k ;
if not missing(value) then _value=value;
else value=_value;
output ;
end ;
drop _: ;
run;
proc print ;
run ;
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data ds_test;
input k value @@;
datalines;
1 100 1 . 1 .
2 . 2 200 2 .
3 . 3 .
4 150 4 . 4 200
5 . 5 . 5 150 5 . 5 200
;;;;
proc print;
run;
options msglevel=I;
data want;
update ds_test(obs=0) ds_test(in=in1);
by k;
merge ds_test(keep=k) ds_test(rename=(value=value0) where=(not missing(value0)));
by k;
if in1;
value = coalesce(value,value0);
output;
drop value0;
run;
proc print;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It makes a difference in the list of possible solutions.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You don't really need to read each K-group twice, but you do need to know if the k-group in hand has at least one nonmissing VALUE. In such a case, carry forward the most recent nonmissing VALUE. If not then set the carry forward value to missing:
data want (drop=_:);
merge ds_test (where=(value^=.) in=nonmissing_value_found)
ds_test ;
by k;
if first.k and nonmissing_value_found=0 then call missing(_most_recent_value);
retain _most_recent_value;
value=coalesce(value,_most_recent_value);
_most_recent_value=value;
run;
It's vital to this program that the two arguments of the MERGE has the non-missing subset listed first, and the entire set second, so that any each observation preserves all its values (values from rightmost common variable prevails). The important element is generating the NONMISSING_VALUE_FOUND dummy for each K.
Then, if a given K-group has NO non-missing VALUEs, the IF ... statement sets the carry forward VALUE to missing.
BTW, this program assumes you want to carry-forward non-missing VALUEs across K-groups, as your sample shows for the first obs of K=2, per @Astounding's comment.
Also, your original data needs no further sorting.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------