BookmarkSubscribeRSS Feed
DSAS_er
Calcite | Level 5

I'm using proc expand to calculate the 7 day midpoint average for a number of variables by date, and would like for it to stop calculating past the last raw value entered for each variable, where the last date of entry for variables are all different. This is my code: 

 

proc expand data=before out=after;

convert variable1 = variable1 _avg variable2=variable2_avg/ method=none transformout=(CMOVAVE 7);

run;

 

This is the output:

DateVariable1Variable1_avgVariable2Variable2_avg
10-Jan-21 2.33333355
11-Jan-2122.333333 5.2
12-Jan-2112.2575.2
13-Jan-21 1.66666755.2
14-Jan-21 2.2565.25
15-Jan-2122.333333 4.6
16-Jan-21 334.2
17-Jan-2143 4
18-Jan-21 323.333333
19-Jan-21 453.333333
20-Jan-21 4 3.5
21-Jan-21   3.5
22-Jan-21   5
23-Jan-21    
24-Jan-21    

And I would like it to stop computing past the date of the last raw values, i.e. exclude the bolded values. Would I able to do this by modifying the proc expand statement? Or an array/another method for the large number of variables?

 

EDIT: for clarification, there were no data entered for variable1 after January 17 and for variable2  after January 19.

 

Thanks!

6 REPLIES 6
Oligolas
Barite | Level 11

Hi,

 

this is an amazing procedure. Unfortunately I do not have SAS/ETS to test and use it.

I do not know if the procedure is able to proceed your request, but merging back the original values to your output dataset may be a suitable workaround

________________________

- Cheers -

ballardw
Super User

The basic approach would likely be to filter the resulting data and not try to get Proc Expand not to do what it is designed to do.

 

It is not obvious what values of what variable(s) would be used to determine the filter though.

And I am not sure if you want to have the bolded values set to missing or all the data from 18JAN and later removed.

 

BTW, I recommend using dates with 4 years for clarity.

DSAS_er
Calcite | Level 5

Thanks for responding. The dataset is merely an example, so assuming that all date are in 2021, by day, and sorted.

 

Yes, I would like the bolded values set to missing as of the subsequent row for the last raw observation.

ballardw
Super User

Define "last raw observation".

There is no way for use to tell what that is.

DSAS_er
Calcite | Level 5

Oh, as in there are no subsequent values following that date, so the last date of data for variable1 is January 17 and for variable2 January 19, nothing more since. 

ballardw
Super User

@DSAS_er wrote:

Oh, as in there are no subsequent values following that date, so the last date of data for variable1 is January 17 and for variable2 January 19, nothing more since. 


Finally something that almost resembles are rule.

So you use the OUTPUT from Proc expand:

Data want;
   set after;
   if date > '17JAN2021'd then call missing(Variable1_avg);
   if date > '19JAN2021'd then call missing(Variable2_avg);
run;

If you expect to do something like this "automagically" you would need to process the input data set for proc expand, find the last valid row and find a way to provide those dates to the data step. This could involve macro language, summarizing the data to get data and merge it with the "after" data or possibly hash objects. There are several ways depending on complexity of data.

 

If the data is ugly enough and not too long it may be easiest to just look at the results in the AFTER and write a data step like I did.

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 613 views
  • 0 likes
  • 3 in conversation