Re: Stop computing last few values generated by Proc expand

DSAS_er · Posted 05-21-2021 08:04 AM

I'm using proc expand to calculate the 7 day midpoint average for a number of variables by date, and would like for it to stop calculating past the last raw value entered for each variable, where the last date of entry for variables are all different. This is my code:

proc expand data=before out=after;

convert variable1 = variable1 _avg variable2=variable2_avg/ method=none transformout=(CMOVAVE 7);

run;

This is the output:

Date	Variable1	Variable1_avg	Variable2	Variable2_avg
…	…	…	…	…
10-Jan-21		2.333333	5	5
11-Jan-21	2	2.333333		5.2
12-Jan-21	1	2.25	7	5.2
13-Jan-21		1.666667	5	5.2
14-Jan-21		2.25	6	5.25
15-Jan-21	2	2.333333		4.6
16-Jan-21		3	3	4.2
17-Jan-21	4	3		4
18-Jan-21		3	2	3.333333
19-Jan-21		4	5	3.333333
20-Jan-21		4		3.5
21-Jan-21				3.5
22-Jan-21				5
23-Jan-21
24-Jan-21

And I would like it to stop computing past the date of the last raw values, i.e. exclude the bolded values. Would I able to do this by modifying the proc expand statement? Or an array/another method for the large number of variables?

EDIT: for clarification, there were no data entered for variable1 after January 17 and for variable2 after January 19.

Thanks!

Oligolas · Posted 05-21-2021 09:15 AM

Hi,

this is an amazing procedure. Unfortunately I do not have SAS/ETS to test and use it.

I do not know if the procedure is able to proceed your request, but merging back the original values to your output dataset may be a suitable workaround

________________________
- Cheers -

ballardw · Posted 05-21-2021 10:45 AM

The basic approach would likely be to filter the resulting data and not try to get Proc Expand not to do what it is designed to do.

It is not obvious what values of what variable(s) would be used to determine the filter though.

And I am not sure if you want to have the bolded values set to missing or all the data from 18JAN and later removed.

BTW, I recommend using dates with 4 years for clarity.

DSAS_er · Posted 05-21-2021 11:22 AM

Thanks for responding. The dataset is merely an example, so assuming that all date are in 2021, by day, and sorted.

Yes, I would like the bolded values set to missing as of the subsequent row for the last raw observation.

ballardw · Posted 05-21-2021 01:47 PM

Define "last raw observation".

There is no way for use to tell what that is.

DSAS_er · Posted 05-21-2021 02:59 PM

Oh, as in there are no subsequent values following that date, so the last date of data for variable1 is January 17 and for variable2 January 19, nothing more since.

ballardw · Posted 05-21-2021 06:30 PM

@DSAS_er wrote:

Oh, as in there are no subsequent values following that date, so the last date of data for variable1 is January 17 and for variable2 January 19, nothing more since.

Finally something that almost resembles are rule.

So you use the OUTPUT from Proc expand:

Data want;
   set after;
   if date > '17JAN2021'd then call missing(Variable1_avg);
   if date > '19JAN2021'd then call missing(Variable2_avg);
run;

If you expect to do something like this "automagically" you would need to process the input data set for proc expand, find the last valid row and find a way to provide those dates to the data step. This could involve macro language, summarizing the data to get data and merge it with the "after" data or possibly hash objects. There are several ways depending on complexity of data.

If the data is ugly enough and not too long it may be easiest to just look at the results in the AFTER and write a data step like I did.

Stop computing last few values generated by Proc expand