hellohere
Pyrite | Level 9

I am trying to calculate the EMA of a variable within a dataset, by group. In addition, a simple twist is applied: the previous n EMA values are averaged to smooth the result further.

 

The full code is below. Somehow the lagged values always come out wrong. Can anyone help?

 

data _temp;
do grp=1 to 4;
	do i=1 to 100;
		x=sin(i/10.0); output;
	end;
end;
run;quit;

%macro ema_fs_by(ds, var, byvar, long, avgback);
	%let alpha1=(2.0/(&long.+1)); 

	data temp;
	    do i=1 to &avgback.;
		lagline_long="lag"||COMPRESS(put(i,$4.))||"(&var._fs&avgback._long)"; 
	    output;
	    end;
    run;quit;
	proc sql noprint;
		select lagline_long into: avgbackline_long separated by ","
    	from temp where i<=&avgback.;
	quit;

	%put "avgbackline=&avgbackline_long.";

	data &ds.;
	set &ds.(where=(missing(&var.)=0));
	by &byvar.;
	retain _ind_ &var._fs&avgback._long ;
	if first.&byvar. then do; 
		_ind_=1; &var._fs&avgback._long= &var.; 
	end;
	else _ind_=_ind_+1;

	if _ind_<=&avgback. then do;
		&var._fs&avgback._long= &var.; 
	end;
	else do;
		&var._fs&avgback._long= (&alpha1.)*&var.+(1-&alpha1.)* (mean(&avgbackline_long.)); 
	end;
	run;quit;
%mend;

%let avgback=2;
%ema_fs_by(_temp, x, grp, 0.1,&avgback.);


				title "ema_fs_x outcome";
				ods layout gridded columns=2 rows=2 advance=table;
				ods graphics /width=480px height=300px;
				proc sgplot data=_temp;
					by grp;
					series x=_ind_ y=x/  lineattrs=( color=blue thickness=2 pattern=solid);  
					series x=_ind_ y=x_fs&avgback._long/ y2axis   lineattrs=( color=red thickness=2 pattern=solid);  
				run;quit;
				ods layout end;	

ema_pic.jpg

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Try something like this:

data _temp;
do grp=1 to 4;
  do i=1 to 100;
    x=sin(i/10.0); output;
  end;
end;
run;

%let ds=_temp;
%let var=x;
%let byvar=grp;
%let long=0.1;
%let avgback=3;

data step1;
  retain alpha %sysevalf((2.0/(&long.+1))) ;
  do _ind_=1 by 1 until(last.&byvar);
    set &ds;
    by &byvar.;
    where not missing(&var);
    retain _initial_&var;
    array _lag [0:%eval(&avgback-1)] ;
    if first.&byvar. then _initial_&var=&var;
    if _ind_<=&avgback then ema_fs_&var=&var;
    else ema_fs_&var = alpha*&var + (1-alpha)*mean(of _lag[*]);
    output;
    _lag[mod(_ind_,&avgback)]=&var;
  end;
  *drop alpha _lag: ;
run;

title "ema_fs_&var lag &avgback outcome";
ods layout gridded columns=2 rows=2 advance=table;
ods graphics /width=480px height=300px;
proc sgplot data=step1;
  by grp;
  series x=_ind_ y=&var/  lineattrs=( color=blue thickness=2 pattern=solid);  
  series x=_ind_ y=ema_fs_&var/ y2axis   lineattrs=( color=red thickness=2 pattern=solid);  
run;quit;
ods layout end; 

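The ARRAY uses permanent (not _TEMPORARY_) variables that are not retained, and the SET is inside the DO loop, so each BY group is processed within a single iteration of the DATA step. This means that the arrayed variables are automatically reset to missing at the start of each BY group.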

Using an array that is indexed from 0 to N-1 instead of from 1 to N makes it easier to use the MOD() function to indicate which index in the array needs to be replaced with the current value.
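As a minimal sketch of that indexing idea: MOD() cycles through the slots 0 to N-1, so the newest value always overwrites the oldest one. The window size of 3, the loop bound of 7, and the variable name slot are illustrative assumptions, and the array is declared _TEMPORARY_ here only to keep the demo free of extra variables.

```sas
data _null_;
  /* circular buffer with 3 slots, indexed 0..2 (illustrative size) */
  array _lag [0:2] _temporary_;
  do _ind_ = 1 to 7;
    slot = mod(_ind_, 3);   /* slot that receives the current value */
    _lag[slot] = _ind_;     /* overwrite the oldest entry */
    put _ind_= slot=;
  end;
run;
```

The log should show slot taking the values 1, 2, 0, 1, 2, 0, 1.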

 

Note: I don't think you want to exclude the missing values.  That would mess up the _IND_ variable you later use as your X axis.

Screenshot 2025-12-10 at 9.04.22 AM.png


11 REPLIES 11
sbxkoenk
SAS Super FREQ

Hello,

 

I do not understand the simple twist.

What exactly are you trying to achieve?

 

Can't you start from the regular EWMA / EMA (in SAS/ETS PROC EXPAND) and do some post-processing?

 

Ciao,

Koen

hellohere
Pyrite | Level 9

EMA(X)=wt*X + (1-wt)*lag1(EMA(X))

when i=1, EMA(X)=X

 

with twist:

 

EMA(X) = wt*X + (1-wt) * mean(lag1(EMA(X)), lag2(EMA(X)), ..., lagn(EMA(X)))

n = avgback count

when i=1,2,...n, EMA(X)=X

Tom
Super User Tom
Super User

Turn on the MPRINT option and you can see that you are making a common mistake in your use of the LAG() functions.

You are running them conditionally, in particular in this block of code:

else do;
  &var._fs&avgback._long= (&alpha1.)*&var.+(1-&alpha1.)* (mean(&avgbackline_long.)); 
end;

The macro variable AvgBackLine_Long appears to have been constructed as LAG1(..),LAG2(..),....LAGn(..).

 

For the LAG() function to know what the previous value was, you have to pass it that value on the previous observation. Each LAG() call keeps its own queue, which is updated only when the call actually executes. If the call is skipped on some observations, those values never enter the queue and so cannot be returned from it later.

 

You might be able to fix your code by just changing that IF/THEN/ELSE block into something like this:

&var._fs&avgback._long= (&alpha1.)*&var.+(1-&alpha1.)* (mean(&avgbackline_long.)); 
if _ind_<=&avgback. then &var._fs&avgback._long= &var.; 

But there is probably a much simpler way to code your algorithm using arrays.
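To see the effect of a conditional LAG() call in isolation, here is a small sketch. The dataset name demo, the variable names lag_always and lag_cond, and the even/odd condition are all made up for illustration.

```sas
data demo;
  input x @@;
  /* unconditional call: the queue is updated on every observation */
  lag_always = lag(x);
  /* conditional call: the queue is updated only when x is even */
  if mod(x, 2) = 0 then lag_cond = lag(x);
  datalines;
1 2 3 4 5 6
;
run;

proc print data=demo;
run;
```

Here lag_always is the value from the previous observation (., 1, 2, 3, 4, 5), while lag_cond returns the value from the previous time that call executed: it is missing for x=2, then 2 for x=4 and 4 for x=6, and missing on the odd rows where the call is skipped.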

hellohere
Pyrite | Level 9

Thanks. wt=0.1 corresponds to long=19 (or 9; it does not matter here).

 

I had coded it with reverse-transpose logic to sidestep the issue caused by the BY variable. It works.

 

Your code comes out "close to perfect" and is much neater, so I am taking yours.

ema_pic.jpg

Tom
Super User Tom
Super User

I do not understand what you mean by "reverse transpose".

I do not understand what you mean about the relationship between WT and LONG. 

Your SAS code appears to calculate WT as 2/(LONG+1)  which would imply setting LONG to 10 would result in WT being  2/11.
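As a quick check of that arithmetic, the weight can be printed to the log for a few values of LONG. The values 19 and 10 come from the posts above, and 0.1 is what the original macro call passed.

```sas
/* wt = 2/(long+1), evaluated by the macro processor */
%put NOTE: long=19  gives wt=%sysevalf(2/(19+1));
%put NOTE: long=10  gives wt=%sysevalf(2/(10+1));
%put NOTE: long=0.1 gives wt=%sysevalf(2/(0.1+1));
```

LONG=19 yields a weight of 0.1, LONG=10 yields 2/11, and LONG=0.1 yields a weight of about 1.82, which is larger than 1.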

 

Note that by using the circular array you really only need a special case for the FIRST observation in each group.  When there are fewer than N values in the array, the MEAN() function will ignore the missing values.

    if first.&byvar then ema_fs_&var=&var;
    else ema_fs_&var = alpha*&var + (1-alpha)*mean(of _lag[*]);
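A small sketch of how MEAN() treats a partly filled array. The array size and the values 10 and 20 are arbitrary illustrations.

```sas
data _null_;
  array _lag [0:2];        /* three slots, all missing at the start */
  _lag[0] = 10;            /* only one slot filled so far */
  m1 = mean(of _lag[*]);   /* missing slots are ignored */
  _lag[1] = 20;
  m2 = mean(of _lag[*]);   /* average of the two filled slots */
  put m1= m2=;
run;
```

The log should show m1=10 and m2=15.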

 

 

 

 

hellohere
Pyrite | Level 9

Tom:

 

Thanks very much. I looked closely and found one issue: the lag is taken on X rather than on EMA(X).

 

EMA(X) = wt*X + (1-wt) * mean(lag1(EMA(X)), lag2(EMA(X)), ..., lagn(EMA(X)))

n = avgback count

when i=1,2,...n, EMA(X)=X

 

dataset.png

 

By "reverse transpose logic" I mean transpose, then ema(), then transpose back (restoring the grp info). The code works but is clumsy. Forget it.

 

Tom
Super User Tom
Super User

To use the previously calculated ema_fs values just change this line from:

 _lag[mod(_ind_,&avgback)]=&var;

to

 _lag[mod(_ind_,&avgback)]=ema_fs_&var;
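For reference, a sketch of the whole step with that one-line change applied. It assumes the _temp dataset from earlier in the thread (sorted by grp), uses a hypothetical output name step2, and sets long=19 only so that alpha comes out as 0.1.

```sas
%let ds=_temp;
%let var=x;
%let byvar=grp;
%let long=19;      /* assumption: gives alpha = 2/(19+1) = 0.1 */
%let avgback=3;

data step2;        /* hypothetical output dataset name */
  retain alpha %sysevalf(2/(&long+1));
  do _ind_=1 by 1 until(last.&byvar);
    set &ds;
    by &byvar;
    array _lag [0:%eval(&avgback-1)];
    /* first N observations of each group: EMA is the raw value */
    if _ind_<=&avgback then ema_fs_&var=&var;
    else ema_fs_&var = alpha*&var + (1-alpha)*mean(of _lag[*]);
    output;
    /* store the EMA just computed, not the raw value */
    _lag[mod(_ind_,&avgback)]=ema_fs_&var;
  end;
  drop alpha _lag: ;
run;
```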

Do you mean a double transpose?  Transpose from long to wide, then try to run your algorithm using the wide dataset, then transpose again from wide to long?  Sounds like that would just make the problem harder to solve.

hellohere
Pyrite | Level 9

Tom: 

 

Thanks.

 

With my SAS experience I do not yet understand the line below (_initial_&var is not referenced anywhere else?).

if first.&byvar. then _initial_&var=&var;

Yes, double transpose. It is dull and tedious, but mathematically correct. I would not care much about it.

 

Logically, the issue arises from the BY variable. Transposing is a detour that avoids it.

 

 

data _temp;
do grp=1 to 4;
	do i=1 to 100;
		x=sin(i/10.0)+grp; output;
	end;
end;
run;quit;

proc sort data=_temp; by i;
run;quit;

proc transpose data=_temp out=_temp_t(drop=_name_) prefix=x_;
by i;
var x;
id grp;
run;quit;
%ema(_temp_t, x_1,0.1, 1);
%ema(_temp_t, x_2,0.2, 2);
%ema(_temp_t, x_3,0.3, 3);
%ema(_temp_t, x_4,0.4, 4);

data _temp_t; set _temp_t(rename=(x_1_ema_1=x_ema_1 x_2_ema_2=x_ema_2 x_3_ema_3=x_ema_3 x_4_ema_4=x_ema_4));
run;quit;

proc transpose data=_temp_t out=_temp_t_r;
by i;
var x_ema_:;
run;

data _temp_t_r; set _temp_t_r; 
_grp_=input(substr(_name_,7,1),2.);
run;quit;

proc sort data= _temp_t_r; 
by _grp_ i;
run;quit; 

data _temp_t_r(rename=(col1=x_ema)); set _temp_t_r(drop=_name_); 
run;quit;

proc sql;
create table  _temp_t_r as
select 	a.*, 
		b.x
from  _temp_t_r as a
left join  _temp as b
on a.i=b.i and a._grp_=b.grp
order by a._grp_, a.i;
quit;

/*just take the reverse transpose*/
				title "ema_x outcome";
				ods layout gridded columns=2 rows=2 advance=table;
				ods graphics /width=480px height=300px;
				proc sgplot data=_temp_t_r;
					by _grp_;
					series x=i y=x/  lineattrs=( color=blue thickness=2 pattern=solid);  
					series x=i y=x_ema/ y2axis   lineattrs=( color=red thickness=2 pattern=solid);  
				run;quit;
				ods layout end;

 

 

Tom
Super User Tom
Super User

Your original code was for some reason remembering the first value per by group, so I had  copied that.

But it does not actually appear to be needed.

 

If you want to run the macro multiple times using different weights or lags just write the macro so that it makes a unique variable name. Or have it write unique output datasets.  

 

If you do the former then to get all of the new variables in the same dataset you can chain the calls.  Have the second call use the output of the first call as the input.

If you do the latter then just remerge the datasets.
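A rough sketch of the first option, chaining calls that each add a uniquely named variable. The macro name ema_by, its keyword parameters, the loop counter _k, and the output names step1 and step2 are all hypothetical. It also assumes long is an integer so that it can be embedded in a variable name, and that the input is sorted by the BY variable.

```sas
%macro ema_by(in=, out=, var=, byvar=, long=, avgback=);
  /* new variable name encodes the parameters, e.g. x_ema_19_2 */
  data &out;
    retain _alpha %sysevalf(2/(&long+1));
    do _k=1 by 1 until(last.&byvar);
      set &in;
      by &byvar;
      array _lag [0:%eval(&avgback-1)];
      if first.&byvar then &var._ema_&long._&avgback=&var;
      else &var._ema_&long._&avgback = _alpha*&var + (1-_alpha)*mean(of _lag[*]);
      output;
      /* keep the last N computed EMA values in the circular array */
      _lag[mod(_k,&avgback)]=&var._ema_&long._&avgback;
    end;
    drop _alpha _lag: _k;
  run;
%mend ema_by;

/* the second call reads the output of the first */
%ema_by(in=_temp, out=step1, var=x, byvar=grp, long=19, avgback=2);
%ema_by(in=step1, out=step2, var=x, byvar=grp, long=9,  avgback=3);
```

After both calls, step2 should contain x together with x_ema_19_2 and x_ema_9_3. The helper variables are dropped on output so that a later call does not read them back in from its input.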

 

hellohere
Pyrite | Level 9

Yes.

The beauty of your code is that the BY processing happens within the DATA step, where the lagged values are reset for each group, so the issue does not arise.
