I'd like to develop a SAS function using PROC FCMP to calculate the circular mean of a set of values, but before I go to the effort I wonder if anyone has already done this.
Here's some background on the circular mean:
https://en.wikipedia.org/wiki/Mean_of_circular_quantities
So, I'd like to develop a user function for circular mean--let's call it CMEAN--similar to the MEAN and MEDIAN function:
mean_days = mean(of rb1-rb200);
median_days = median(of rb1-rb200);
avg_day_of_week = cmean(of wd1-wd200);
avg_month_of_year = cmean(of m1-m200);
I did find some appropriate SAS code referenced by Ulric Lund--see the last link on this page:
http://statweb.calpoly.edu/ulund/
But it requires an input and output dataset, and I would like to turn this code (or something similar) into a SAS function as described above.
Or is there a better approach?
Thanks much,
David Oesper
This should be close...
options cmplib=sasuser.fcmp;
/* Circular mean. Returns a value between -cycleLength/2 and +cycleLength/2 */
proc fcmp outlib=sasuser.fcmp.test; function cmean (cycleLength, x[*]) varargs;
factor = 2 * constant("PI") / cycleLength;
cosTot = 0;
sinTot = 0;
n = 0;
do i=1 to dim(x);
if not missing(x{i}) then do;
cosTot = cosTot + cos(factor*x[i]);
sinTot = sinTot + sin(factor*x[i]);
n = n + 1;
end;
end;
if n > 0 and fuzz(sinTot) ne 0 then return(atan2(sinTot/n, cosTot/n)/factor);
else return (.);
endsub;
run;
data _null_;
array x{2}; /* Argument list must be an array */
x1 = 100;
x2 = 270;
mean = cmean(360, x);
put mean=;
run;
This even closer
options cmplib=sasuser.fcmp;
/* Circular mean. Returns a value between -cycleLength/2 and +cycleLength/2 */
proc fcmp outlib=sasuser.fcmp.test; function cmean (cycleLength, x[*]) varargs;
factor = 2 * constant("PI") / cycleLength;
cosTot = 0;
sinTot = 0;
n = 0;
do i=1 to dim(x);
if not missing(x{i}) then do;
cosTot = cosTot + cos(factor*x[i]);
sinTot = sinTot + sin(factor*x[i]);
n = n + 1;
end;
end;
if n > 0 then return(atan2(sinTot/n, cosTot/n)/factor);
else return (.);
endsub;
run;
data _null_;
array x{2}; /* Argument list must be an array */
x1 = 100;
x2 = 270;
mean = cmean(360, x);
put mean=;
x1 = 350;
x2 = 380;
mean = cmean(360, x);
put mean=;
x1 = 260;
x2 = 280;
mean = cmean(360, x);
put mean=;
run;
PGStat's second attempt does not handle the case where the Cartesian average of the points is the origin. You need to handle data for which
FUZZ(sinTot)=0 and FUZZ(cosTot)=0
or else you get the wrong answer for the following examples:
data _null_;
array x{4}; /* Argument list must be an array */
x1 = 45; x3 = 225; /* rotational symmetry by 180 degrees */
mean = cmean(360, x);
put mean=;
x2 = 135; x4 = 315; /* rotational symmetry by 90 degrees */
mean = cmean(360, x);
put mean=;
run;
I suggest the using the statement
if n = 0 OR (fuzz(sinTot)=0 AND fuzz(sinTot)=0) then return(.);
else return(atan2(sinTot/n, cosTot/n)/factor);
I suspected I was missing something. Thank you @Rick_SAS!
Thank you, PGStats and Rick_SAS! I took what you had and made some additional changes. Since the SAS MONTH function returns the numbers 1 through 12 rather than 0 through 11, and the WEEKDAY function returns the numbers 1 through 7 rather than 0 through 6, I took the liberty of detecting a cycleLength of either 7 or 12 and subtracting 1 accordingly (and adding it back in to the result).
options cmplib=_null_;
/* Circular mean. Returns a value between -cycleLength/2 and +cycleLength/2 */
proc fcmp outlib=sasuser.fcmp.test;
function cmean (cycleLength,x[*]) varargs;
if cycleLength in(7,12) then s = 1;
else s = 0;
factor = 2 * constant("PI") / cycleLength;
cosTot = 0;
sinTot = 0;
n = 0;
do i = 1 to dim(x);
if not missing(x{i}) then do;
cosTot = cosTot + cos(factor*(x[i]-s));
sinTot = sinTot + sin(factor*(x[i]-s));
n = n + 1;
end;
end;
if n = 0 or (fuzz(sinTot) = 0 and fuzz(cosTot) = 0) then return(.);
else do;
rslt = atan2(sinTot/n,cosTot/n)/factor + s;
if rslt < 0 then rslt = rslt + cycleLength;
if rslt >= cycleLength then rslt = rslt - cycleLength;
return(rslt);
end;
endsub;
run;
options cmplib=sasuser.fcmp;
data cmeans;
array x{4}; /* Argument list must be an array */
call missing(of x{*});
x1 = 100;
x2 = 270;
cmean = cmean(360,x);
output;
call missing(of x{*});
x1 = 1.5707963265;
x2 = 3.1415926536;
cmean = cmean(2*constant("pi"),x);
output;
call missing(of x{*});
x1 = 1;
x2 = 3;
cmean = cmean(7,x); /* <-- days of the week, function subtracts 1 from all x before calculating */
output;
call missing(of x{*});
x1 = 5;
x2 = 7;
cmean = cmean(12,x); /* <-- months of the year, function subtracts 1 from all x before calculating */
output;
run;
proc print data=cmeans;
title 'cmeans';
run;
Also, keep in mind that the array you are passing to PROC FCMP has to be called x (x1, x2, ...). The way I handled passing arrays not called x is shown in the following code snippet:
%let ae = 300; /* number of array elements */
.
.
.
array om{&ae} om1-om&ae;
array od{&ae} od1-od&ae;
array x{&ae} _temporary_;
.
.
.
call missing(of x{*});
do i = 1 to dim(om);
x{i} = om{i};
end;
mean_order_month = cmean(12,x);
call missing(of x{*});
do i = 1 to dim(od);
x{i} = od{i};
end;
mean_order_dow = cmean(7,x);
Many thanks!
Dave
Here's the code snippet properly formatted. Sorry about that!
%let ae = 300; /* number of array elements */
.
.
.
array om{&ae} om1-om&ae;
array od{&ae} od1-od&ae;
array x{&ae} _temporary_;
.
.
.
call missing(of x{*});
do i = 1 to dim(om);
x{i} = om{i};
end;
mean_order_month = cmean(12,x);
call missing(of x{*});
do i = 1 to dim(od);
x{i} = od{i};
end;
mean_order_dow = cmean(7,x);
No need to make a special case of some cycle length. I think you could get the desired results by shifting the data by half a cycle to calculate the mean and then to shift it back to report the result. Note: the array passed to the function can have any valid name.
/* Circular mean. Returns a value > 0 and <= cycleLength. */
proc fcmp outlib=sasuser.fcmp.test;
function cmean (cycleLength, x[*]) varargs;
factor = 2 * constant("PI") / cycleLength;
offset = cycleLength/2;
cosTot = 0;
sinTot = 0;
n = 0;
do i = 1 to dim(x);
if not missing(x{i}) then do;
cosTot = cosTot + cos(factor*(x[i]-offset));
sinTot = sinTot + sin(factor*(x[i]-offset));
n = n + 1;
end;
end;
if n = 0 then return(.);
if fuzz(sinTot) = 0 and fuzz(cosTot) = 0 then return(.);
return ( atan2(sinTot/n,cosTot/n) / factor + offset );
endsub;
run;
options cmplib=sasuser.fcmp;
data cmeans;
array y{4}; /* cmean argument list must be an array */
call missing(of y{*});
cycleLength = 7;
y1 = 1;
do y2 = 1 to cycleLength;
cmean = cmean(cycleLength, y);
output;
end;
call missing(of y{*});
cycleLength = 12;
y1 = 1;
do y2 = 1 to cycleLength;
cmean = cmean(cycleLength, y);
output;
end;
run;
proc print data=cmeans;
title 'cmeans';
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.