I'd like to develop a SAS function using PROC FCMP to calculate the circular mean of a set of values, but before I go to the effort I wonder if anyone has already done this.
Here's some background on the circular mean:
https://en.wikipedia.org/wiki/Mean_of_circular_quantities
So, I'd like to develop a user function for circular mean--let's call it CMEAN--similar to the MEAN and MEDIAN function:
mean_days = mean(of rb1-rb200);
median_days = median(of rb1-rb200);
avg_day_of_week = cmean(of wd1-wd200);
avg_month_of_year = cmean(of m1-m200);
I did find some appropriate SAS code referenced by Ulric Lund--see the last link on this page:
http://statweb.calpoly.edu/ulund/
But it requires an input and output dataset, and I would like to turn this code (or something similar) into a SAS function as described above.
Or is there a better approach?
Thanks much,
David Oesper
This should be close...
options cmplib=sasuser.fcmp;
/* Circular mean. Returns a value between -cycleLength/2 and +cycleLength/2 */
proc fcmp outlib=sasuser.fcmp.test; function cmean (cycleLength, x[*]) varargs;
factor = 2 * constant("PI") / cycleLength;
cosTot = 0;
sinTot = 0;
n = 0;
do i=1 to dim(x);
if not missing(x{i}) then do;
cosTot = cosTot + cos(factor*x[i]);
sinTot = sinTot + sin(factor*x[i]);
n = n + 1;
end;
end;
if n > 0 and fuzz(sinTot) ne 0 then return(atan2(sinTot/n, cosTot/n)/factor);
else return (.);
endsub;
run;
data _null_;
array x{2}; /* Argument list must be an array */
x1 = 100;
x2 = 270;
mean = cmean(360, x);
put mean=;
run;
This even closer
options cmplib=sasuser.fcmp;
/* Circular mean. Returns a value between -cycleLength/2 and +cycleLength/2 */
proc fcmp outlib=sasuser.fcmp.test; function cmean (cycleLength, x[*]) varargs;
factor = 2 * constant("PI") / cycleLength;
cosTot = 0;
sinTot = 0;
n = 0;
do i=1 to dim(x);
if not missing(x{i}) then do;
cosTot = cosTot + cos(factor*x[i]);
sinTot = sinTot + sin(factor*x[i]);
n = n + 1;
end;
end;
if n > 0 then return(atan2(sinTot/n, cosTot/n)/factor);
else return (.);
endsub;
run;
data _null_;
array x{2}; /* Argument list must be an array */
x1 = 100;
x2 = 270;
mean = cmean(360, x);
put mean=;
x1 = 350;
x2 = 380;
mean = cmean(360, x);
put mean=;
x1 = 260;
x2 = 280;
mean = cmean(360, x);
put mean=;
run;
PGStat's second attempt does not handle the case where the Cartesian average of the points is the origin. You need to handle data for which
FUZZ(sinTot)=0 and FUZZ(cosTot)=0
or else you get the wrong answer for the following examples:
data _null_;
array x{4}; /* Argument list must be an array */
x1 = 45; x3 = 225; /* rotational symmetry by 180 degrees */
mean = cmean(360, x);
put mean=;
x2 = 135; x4 = 315; /* rotational symmetry by 90 degrees */
mean = cmean(360, x);
put mean=;
run;
I suggest the using the statement
if n = 0 OR (fuzz(sinTot)=0 AND fuzz(sinTot)=0) then return(.);
else return(atan2(sinTot/n, cosTot/n)/factor);
I suspected I was missing something. Thank you @Rick_SAS!
Thank you, PGStats and Rick_SAS! I took what you had and made some additional changes. Since the SAS MONTH function returns the numbers 1 through 12 rather than 0 through 11, and the WEEKDAY function returns the numbers 1 through 7 rather than 0 through 6, I took the liberty of detecting a cycleLength of either 7 or 12 and subtracting 1 accordingly (and adding it back in to the result).
options cmplib=_null_;
/* Circular mean. Returns a value between -cycleLength/2 and +cycleLength/2 */
proc fcmp outlib=sasuser.fcmp.test;
function cmean (cycleLength,x[*]) varargs;
if cycleLength in(7,12) then s = 1;
else s = 0;
factor = 2 * constant("PI") / cycleLength;
cosTot = 0;
sinTot = 0;
n = 0;
do i = 1 to dim(x);
if not missing(x{i}) then do;
cosTot = cosTot + cos(factor*(x[i]-s));
sinTot = sinTot + sin(factor*(x[i]-s));
n = n + 1;
end;
end;
if n = 0 or (fuzz(sinTot) = 0 and fuzz(cosTot) = 0) then return(.);
else do;
rslt = atan2(sinTot/n,cosTot/n)/factor + s;
if rslt < 0 then rslt = rslt + cycleLength;
if rslt >= cycleLength then rslt = rslt - cycleLength;
return(rslt);
end;
endsub;
run;
options cmplib=sasuser.fcmp;
data cmeans;
array x{4}; /* Argument list must be an array */
call missing(of x{*});
x1 = 100;
x2 = 270;
cmean = cmean(360,x);
output;
call missing(of x{*});
x1 = 1.5707963265;
x2 = 3.1415926536;
cmean = cmean(2*constant("pi"),x);
output;
call missing(of x{*});
x1 = 1;
x2 = 3;
cmean = cmean(7,x); /* <-- days of the week, function subtracts 1 from all x before calculating */
output;
call missing(of x{*});
x1 = 5;
x2 = 7;
cmean = cmean(12,x); /* <-- months of the year, function subtracts 1 from all x before calculating */
output;
run;
proc print data=cmeans;
title 'cmeans';
run;
Also, keep in mind that the array you are passing to PROC FCMP has to be called x (x1, x2, ...). The way I handled passing arrays not called x is shown in the following code snippet:
%let ae = 300; /* number of array elements */
.
.
.
array om{&ae} om1-om&ae;
array od{&ae} od1-od&ae;
array x{&ae} _temporary_;
.
.
.
call missing(of x{*});
do i = 1 to dim(om);
x{i} = om{i};
end;
mean_order_month = cmean(12,x);
call missing(of x{*});
do i = 1 to dim(od);
x{i} = od{i};
end;
mean_order_dow = cmean(7,x);
Many thanks!
Dave
Here's the code snippet properly formatted. Sorry about that!
%let ae = 300; /* number of array elements */
.
.
.
array om{&ae} om1-om&ae;
array od{&ae} od1-od&ae;
array x{&ae} _temporary_;
.
.
.
call missing(of x{*});
do i = 1 to dim(om);
x{i} = om{i};
end;
mean_order_month = cmean(12,x);
call missing(of x{*});
do i = 1 to dim(od);
x{i} = od{i};
end;
mean_order_dow = cmean(7,x);
No need to make a special case of some cycle length. I think you could get the desired results by shifting the data by half a cycle to calculate the mean and then to shift it back to report the result. Note: the array passed to the function can have any valid name.
/* Circular mean. Returns a value > 0 and <= cycleLength. */
proc fcmp outlib=sasuser.fcmp.test;
function cmean (cycleLength, x[*]) varargs;
factor = 2 * constant("PI") / cycleLength;
offset = cycleLength/2;
cosTot = 0;
sinTot = 0;
n = 0;
do i = 1 to dim(x);
if not missing(x{i}) then do;
cosTot = cosTot + cos(factor*(x[i]-offset));
sinTot = sinTot + sin(factor*(x[i]-offset));
n = n + 1;
end;
end;
if n = 0 then return(.);
if fuzz(sinTot) = 0 and fuzz(cosTot) = 0 then return(.);
return ( atan2(sinTot/n,cosTot/n) / factor + offset );
endsub;
run;
options cmplib=sasuser.fcmp;
data cmeans;
array y{4}; /* cmean argument list must be an array */
call missing(of y{*});
cycleLength = 7;
y1 = 1;
do y2 = 1 to cycleLength;
cmean = cmean(cycleLength, y);
output;
end;
call missing(of y{*});
cycleLength = 12;
y1 = 1;
do y2 = 1 to cycleLength;
cmean = cmean(cycleLength, y);
output;
end;
run;
proc print data=cmeans;
title 'cmeans';
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.