Contributor
Posts: 22

# Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

I've set up the below code to demonstrate the challenge that I'm having. I would like to have a user-defined function that will return the average value of the top quintile of values passed in. The number of passed in values should be allowed to vary (so, passing in an array) and the array shouldn't have to be sorted prior to passing it in. Unfortunately, I can't seem to sort the array within the user-defined function. I've tried CALL SORTN and it doesn't seem to sort the values at all, but with no error message. When I try to sort the array in place within proc fcmp, I get an error that I can't assign values to the array, that they are read-only. When I try to copy the values into a new array, the function seems to require that I know the number of values in the array at compile time. Any assistance would be greatly appreciated. Also, if it is possible to do what I'm trying to do using built-in functions within SAS, that would be ideal.

PROC FCMP OUTLIB=SASUSER.FUNCS.TRIAL;

FUNCTION TOP_QUINTILE_MEAN(VALU

• ) VARARGS;
•    TQA = 0;

SUM=0;

C = 0;

COUNT = HBOUND(VALU);

CUTOFF = INT(COUNT/5);

IF CUTOFF < 1 THEN

DO;

CUTOFF = 1;

END;

/* SORT THE ARRAY */

/*   DO UNTIL (SORTED);*/

/*    SORTED=1;*/

/*     DO I = 1 TO COUNT - 1;*/

/*       IF VALU{I} < VALU {I+1} THEN*/

/*         DO;*/

/*            TEMP = VALU{I+1};*/

/*            VALU{I+1} = VALU{I};*/

/*            VALU{I} = TEMP;*/

/*          SORTED = 0;*/

/*            END;*/

/*     END;*/

/*   END;*/

DO J = CUTOFF TO 1 BY -1;

C=C+1;

SUM = SUM + VALU{J};

END;

TQA = SUM/C;

RETURN(TQA);

ENDSUB;

RUN;

OPTIONS CMPLIB = SASUSER.FUNCS;

data one;

input temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10;

datalines;

10 20 30 40 50 60 70 80 90 100

100 90 80 70 60 50 40 30 20 10

;

run;

data two;

set one;

array vals{10} _temporary_ ;

array temps{10} temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10;

do J = 1 to dim(vals);

vals(J) = temps(J);

end;

drop J;

DO UNTIL (SORTED);

SORTED=1;

DO I = 1 TO 9;

IF VALS{I} < VALS {I+1} THEN

DO;

TEMP = VALS{I+1};

VALS{I+1} = VALS{I};

VALS{I} = TEMP;

SORTED = 0;

END;

END;

END;

DROP I SORTED temp;

ans = top_quintile_mean(vals);

run;

Super User
Posts: 13,498

## Re: Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

Might try using the PCTL function to find the appropriate starting value and select the values from the array larger than that value. Or possibly the LARGEST function.

Contributor
Posts: 22

## Re: Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

I had the same thought, but the largest function with an array as an argument seems to work only outside of PROC FCMP for some reason.

SAS Super FREQ
Posts: 4,237

## Re: Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

What about computing the P80 value outside of the function and then pass it in as an argument? Then the function just needs to sum and divide by the number of elements greater than the P80 argument.

You don't mention if being in the DATA step is a requirement, but here is the function in the SAS/IML language:

proc iml;
start MeanP80(x);
call qntl(q, colvec(x), 0.8);  /* find 80th pctl */
return( mean( x[loc(x>=q)] )); /* return mean of all elements >= P80 */
finish;

mp20 = MeanP80( 1:100 );
print mp20;

Posts: 1,318

## Re: Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

The following obviously ignores your interest in performing this with a user-defined function, but it is a straight forward and intuitive method for obtaining the end result

```data have;
array temp[10];
input id @;
do _n_=1 to dim(temp);
input temp[_n_] @@;
end;
cards;
1 10 20 30 40 50 60 70 80 90 100
2 100 90 80 70 60 50 40 30 20 10
;
run;

proc transpose data=have out=have_t(drop=_name_ rename=(col1=temp));
by id;
var temp:;
run;

proc rank data=have_t out=have_t_r groups=5 ties=mean;
by id;
var temp;
ranks q;
run;

proc means data=have_t_r  noprint;
by id;
var temp;
where q=4;
output out=want(drop=_type_ _freq_) mean=tqm_temp;
run;

```

One way to go about using FCMP to do this would be to use a hash object.  This will alleviate the problems with sorting the values.  Note that you will really need to do something better than simply dividing the number of dimensions by 5 to determine the cutoff for the quintuples, to solve issues such as ties...

```data have;
array temp[10];
input id @;
do _n_=1 to dim(temp);
input temp[_n_] @@;
end;
cards;
1 10 20 30 40 50 60 70 80 90 100
2 100 90 80 70 60 50 40 30 20 10
;
run;

proc fcmp outlib=work.func.math;

function tqm(in);
length itm r 8;
declare hash _x(ordered:'d');
rc=_x.definekey('itm', 'r');
rc=_x.definedone();
do r=1 to dim(in);
itm=in;
end;
declare hiter _xi('_x');
rc=_xi.first();
do i=1 to int(dim(in)/5); *should be modified;
tot=sum(tot, itm);
rc=_xi.next();
end;
rc=_x.clear(); *this is important;
return(tot/(i-1));
endsub;

quit;

options cmplib=(work.func);

data want;
array temp[10];
array _temp[10] _temporary_;
set have;
x=tqm(_temp);
put x=;
run

```
Super User
Posts: 13,498

## Re: Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

Here is a code section that will create a local to the function temporary array, copy values and sort the temp array.

length = dim(VALU);

array temp[1] /nosymbols;

call dynamic_array(temp, length);

/* copy values from passed array to local temporary array*/

do i=1 to length; temp=valu;end;

/* bubble sort since CALL SORTN (of temp

• ) doesn't seem to want to work*/
•    do i=1 to (length-1);

do j= (i+1) to length;

if temp>temp then do;

t= temp; temp=temp;temp=t;

end;

end;

end;

The above sorts the array from low to high, might want to consider high to low as a solution for missing values and such.

Super Contributor
Posts: 320

## Re: Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

It can be simply done by a data step. I am not sure whether this meets your requirement. Since the values are discrete, the top quintile may need approximation. If N= 12, then the top quintile may take 3 values. If this is acceptable, then call sortn() gives the sorted array to work with and the data step is:

I am borrowing the input data set from FriedEgg.

data have;

array temp[10];

input id @;

do _n_=1 to dim(temp);

input temp[_n_] @@;

end;

cards;

1 10 20 30 40 50 60 70 80 90 100

2 100 90 80 70 60 50 40 30 20 10

run;

data want;

set have;

array t[10] temp1 - temp10;

call sortn(of t

• );
•    num = ceil(dim(t) / 5);

do i = dim(t) - num + 1 to dim(t);

sum = sum(sum,  t);

end;

mean = sum / num;

keep id mean;

run;

Discussion stats
• 6 replies
• 451 views
• 0 likes
• 5 in conversation