BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
braam
Quartz | Level 8

I would like to calculate standard deviation of a variable for the recent five years. Proc Expand works pretty well, but not in a perfect way I expect it would.

 

data prdsale;
	set sashelp.PrdSale;
	run;

proc sort data= prdsale nodupkey;
	by region division year quarter;
	run;

proc expand data= prdsale
	out= prdsale;
	by region division;
	convert actual= StdDev_actual/transformout= (MovStd 5); 
	run;
	
proc print ;
	var region division actual StdDev_Actual;
	run;

The above code calculates the standard deviation of the variable, actual, using the most recent five data point. But, it also calculates Std.Dev. when there are less than 5 data point. For example, the third observation in the data has the data point in time t, t-1, and t-2 (but not in time t-3 and t-4). It still calculates the Std.Dev. of three data points. Can I just calculate it only when there are strictly five data point?

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Add the TRIM and NOMISS options to your CONVERT statement.

 

See the explanation in Trick 4 in this paper or check the documentation. 

https://support.sas.com/resources/papers/proceedings10/093-2010.pdf

 

https://documentation.sas.com/?docsetId=etsug&docsetTarget=etsug_expand_details19.htm&docsetVersion=...

 

delete_summary.JPG

 

NOMISS options, from documentation link above

 

The NOMISS operator does not perform any calculations, but serves to modify the operation of the moving time window operator that follows it. The NOMISS operator has no effect unless it is followed by a moving time window operator.

For example, the following statement computes a five-period moving average of the variable X but produces a missing value when any of the five values are missing:

convert x=y / transformout=( nomiss movave 5 );

The following statement computes the cumulative sum of the variable X but produces a missing value for all periods after the first missing X value:

convert x=y / transformout=( nomiss cusum );

 EDITed to add NOMISS option.

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

Just eliminate the first 4 records from the output data set. In that case, you would be much better off giving the output data set a different name than the input data set.

--
Paige Miller
braam
Quartz | Level 8
Thanks for your suggestion. But my data includes some missing observations here and there, which makes this issue severer. So I was wondering if there is a general solution (or option) for this.
PaigeMiller
Diamond | Level 26

So what result do you want when there is not 5 data point (i.e. at least one of the previous 5 are missing)?

--
Paige Miller
braam
Quartz | Level 8
Then I would like to have a missing because data is not sufficient to calculate StdDev.

The way I did (bypassed) is to generate another variable, which is equal to one if none of the five data point is missing. When this variable is equal to 1, I keep StdDev. But I believe that this is not straightforward, so I wanted to have an easier solution.
Reeza
Super User

Add the TRIM and NOMISS options to your CONVERT statement.

 

See the explanation in Trick 4 in this paper or check the documentation. 

https://support.sas.com/resources/papers/proceedings10/093-2010.pdf

 

https://documentation.sas.com/?docsetId=etsug&docsetTarget=etsug_expand_details19.htm&docsetVersion=...

 

delete_summary.JPG

 

NOMISS options, from documentation link above

 

The NOMISS operator does not perform any calculations, but serves to modify the operation of the moving time window operator that follows it. The NOMISS operator has no effect unless it is followed by a moving time window operator.

For example, the following statement computes a five-period moving average of the variable X but produces a missing value when any of the five values are missing:

convert x=y / transformout=( nomiss movave 5 );

The following statement computes the cumulative sum of the variable X but produces a missing value for all periods after the first missing X value:

convert x=y / transformout=( nomiss cusum );

 EDITed to add NOMISS option.

braam
Quartz | Level 8
Thanks a lot for your help. TRIMLEFT worked for me well!
braam
Quartz | Level 8

For those who will see this post:

 

1. trimleft doesn' handle missing values well.

2. nomiss should be used for handling missing values.

Ksharp
Super User

SQL can solve it easily .

 

data air;
 set sashelp.air;
t+1;
run;
proc sql;
create table want as
select *,case
when (select count(*) from air where t between a.t-4 and a.t)<5 then .
else (select std(air) from air where t between a.t-4 and a.t) 
end as rolling_std
 from air as a;
quit;
Ksharp
Super User

data prdsale;
	set sashelp.PrdSale;
	run;

proc sort data= prdsale nodupkey;
	by region division year quarter;
	run;
data air;
 set prdsale;
 by region division;
 if first.division then t=0;
 t+1;
run;

proc sql;
create table want as
select *,case
when (select count(*) from air
where region=a.region and division=a.division and t between a.t-4 and a.t)<5 then .
else (select std(actual) from air where
region=a.region and division=a.division and t between a.t-4 and a.t) 
end as rolling_std
 from air as a;
quit;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1776 views
  • 0 likes
  • 4 in conversation