I'm using proc expand to calculate the 7 day midpoint average for a number of variables by date, and would like for it to stop calculating past the last raw value entered for each variable, where the last date of entry for variables are all different. This is my code:
proc expand data=before out=after;
convert variable1 = variable1 _avg variable2=variable2_avg/ method=none transformout=(CMOVAVE 7);
run;
This is the output:
Date | Variable1 | Variable1_avg | Variable2 | Variable2_avg |
… | … | … | … | … |
10-Jan-21 | 2.333333 | 5 | 5 | |
11-Jan-21 | 2 | 2.333333 | 5.2 | |
12-Jan-21 | 1 | 2.25 | 7 | 5.2 |
13-Jan-21 | 1.666667 | 5 | 5.2 | |
14-Jan-21 | 2.25 | 6 | 5.25 | |
15-Jan-21 | 2 | 2.333333 | 4.6 | |
16-Jan-21 | 3 | 3 | 4.2 | |
17-Jan-21 | 4 | 3 | 4 | |
18-Jan-21 | 3 | 2 | 3.333333 | |
19-Jan-21 | 4 | 5 | 3.333333 | |
20-Jan-21 | 4 | 3.5 | ||
21-Jan-21 | 3.5 | |||
22-Jan-21 | 5 | |||
23-Jan-21 | ||||
24-Jan-21 |
And I would like it to stop computing past the date of the last raw values, i.e. exclude the bolded values. Would I able to do this by modifying the proc expand statement? Or an array/another method for the large number of variables?
EDIT: for clarification, there were no data entered for variable1 after January 17 and for variable2 after January 19.
Thanks!
Hi,
this is an amazing procedure. Unfortunately I do not have SAS/ETS to test and use it.
I do not know if the procedure is able to proceed your request, but merging back the original values to your output dataset may be a suitable workaround
- Cheers -
The basic approach would likely be to filter the resulting data and not try to get Proc Expand not to do what it is designed to do.
It is not obvious what values of what variable(s) would be used to determine the filter though.
And I am not sure if you want to have the bolded values set to missing or all the data from 18JAN and later removed.
BTW, I recommend using dates with 4 years for clarity.
Thanks for responding. The dataset is merely an example, so assuming that all date are in 2021, by day, and sorted.
Yes, I would like the bolded values set to missing as of the subsequent row for the last raw observation.
Define "last raw observation".
There is no way for use to tell what that is.
Oh, as in there are no subsequent values following that date, so the last date of data for variable1 is January 17 and for variable2 January 19, nothing more since.
@DSAS_er wrote:
Oh, as in there are no subsequent values following that date, so the last date of data for variable1 is January 17 and for variable2 January 19, nothing more since.
Finally something that almost resembles are rule.
So you use the OUTPUT from Proc expand:
Data want; set after; if date > '17JAN2021'd then call missing(Variable1_avg); if date > '19JAN2021'd then call missing(Variable2_avg); run;
If you expect to do something like this "automagically" you would need to process the input data set for proc expand, find the last valid row and find a way to provide those dates to the data step. This could involve macro language, summarizing the data to get data and merge it with the "after" data or possibly hash objects. There are several ways depending on complexity of data.
If the data is ugly enough and not too long it may be easiest to just look at the results in the AFTER and write a data step like I did.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.