Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

Reply
Contributor
Posts: 22

Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

I've set up the below code to demonstrate the challenge that I'm having. I would like to have a user-defined function that will return the average value of the top quintile of values passed in. The number of passed in values should be allowed to vary (so, passing in an array) and the array shouldn't have to be sorted prior to passing it in. Unfortunately, I can't seem to sort the array within the user-defined function. I've tried CALL SORTN and it doesn't seem to sort the values at all, but with no error message. When I try to sort the array in place within proc fcmp, I get an error that I can't assign values to the array, that they are read-only. When I try to copy the values into a new array, the function seems to require that I know the number of values in the array at compile time. Any assistance would be greatly appreciated. Also, if it is possible to do what I'm trying to do using built-in functions within SAS, that would be ideal.

PROC FCMP OUTLIB=SASUSER.FUNCS.TRIAL;

FUNCTION TOP_QUINTILE_MEAN(VALU

  • ) VARARGS;
  •    TQA = 0;

       SUM=0;

       C = 0;

       COUNT = HBOUND(VALU);

       CUTOFF = INT(COUNT/5);

       IF CUTOFF < 1 THEN

        DO;

           CUTOFF = 1;

           END;

    /* SORT THE ARRAY */

    /*   DO UNTIL (SORTED);*/

    /*    SORTED=1;*/

    /*     DO I = 1 TO COUNT - 1;*/

    /*       IF VALU{I} < VALU {I+1} THEN*/

    /*         DO;*/

    /*            TEMP = VALU{I+1};*/

    /*            VALU{I+1} = VALU{I};*/

    /*            VALU{I} = TEMP;*/

    /*          SORTED = 0;*/

    /*            END;*/

    /*     END;*/

    /*   END;*/

       DO J = CUTOFF TO 1 BY -1;

        C=C+1;

        SUM = SUM + VALU{J};

      

       END;

       TQA = SUM/C;

       RETURN(TQA);

      ENDSUB;

    RUN;

    OPTIONS CMPLIB = SASUSER.FUNCS;

    data one;

      input temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10;

      datalines;

      10 20 30 40 50 60 70 80 90 100

      100 90 80 70 60 50 40 30 20 10

      ;

      run;

    data two;

      set one;

    array vals{10} _temporary_ ;

    array temps{10} temp1 temp2 temp3 temp4 temp5 temp6 temp7 temp8 temp9 temp10;

    do J = 1 to dim(vals);

      vals(J) = temps(J);

    end;

    drop J;

       DO UNTIL (SORTED);

        SORTED=1;

           DO I = 1 TO 9;

             IF VALS{I} < VALS {I+1} THEN

               DO;

                  TEMP = VALS{I+1};

                  VALS{I+1} = VALS{I};

                  VALS{I} = TEMP;

                SORTED = 0;

                  END;

           END;

       END;

      DROP I SORTED temp;

    ans = top_quintile_mean(vals);

    run;

    Grand Advisor
    Posts: 10,251

    Re: Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

    Might try using the PCTL function to find the appropriate starting value and select the values from the array larger than that value. Or possibly the LARGEST function.

    Contributor
    Posts: 22

    Re: Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

    I had the same thought, but the largest function with an array as an argument seems to work only outside of PROC FCMP for some reason.

    SAS Super FREQ
    Posts: 3,420

    Re: Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

    What about computing the P80 value outside of the function and then pass it in as an argument? Then the function just needs to sum and divide by the number of elements greater than the P80 argument.

    You don't mention if being in the DATA step is a requirement, but here is the function in the SAS/IML language:

    proc iml;
    start MeanP80(x);
       call qntl(q, colvec(x), 0.8);  /* find 80th pctl */
       return( mean( x[loc(x>=q)] )); /* return mean of all elements >= P80 */
    finish;

    mp20 = MeanP80( 1:100 );
    print mp20;

    Trusted Advisor
    Posts: 1,300

    Re: Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

    The following obviously ignores your interest in performing this with a user-defined function, but it is a straight forward and intuitive method for obtaining the end result

    data have;

    array temp[10];

    input id @;

    do _n_=1 to dim(temp);

       input temp[_n_] @@;

       end;

    cards;

    1 10 20 30 40 50 60 70 80 90 100

    2 100 90 80 70 60 50 40 30 20 10

    ;

    run;

    proc transpose data=have out=have_t(drop=_name_ rename=(col1=temp));

    by id;

    var temp:;

    run;

    proc rank data=have_t out=have_t_r groups=5 ties=mean;

    by id;

    var temp;

    ranks q;

    run;

    proc means data=have_t_r  noprint;

    by id;

    var temp;

    where q=4;

    output out=want(drop=_type_ _freq_) mean=tqm_temp;

    run;

    One way to go about using FCMP to do this would be to use a hash object.  This will alleviate the problems with sorting the values.  Note that you will really need to do something better than simply dividing the number of dimensions by 5 to determine the cutoff for the quintuples, to solve issues such as ties...

    data have;

    array temp[10];

    input id @;

    do _n_=1 to dim(temp);

       input temp[_n_] @@;

       end;

    cards;

    1 10 20 30 40 50 60 70 80 90 100

    2 100 90 80 70 60 50 40 30 20 10

    ;

    run;

    proc fcmp outlib=work.func.math;

    function tqm(in

  • );
  •    length itm r 8;

       declare hash _x(ordered:'d');

       rc=_x.definekey('itm', 'r');

       rc=_x.definedone();

       do r=1 to dim(in);

          itm=in;

       rc=_x.add();

          end;

       declare hiter _xi('_x');

       rc=_xi.first();

       do i=1 to int(dim(in)/5); *should be modified;

          tot=sum(tot, itm);

       rc=_xi.next();

          end;

       rc=_x.clear(); *this is important;

       return(tot/(i-1));

       endsub;

    quit;

    options cmplib=(work.func);

    data want;

    array temp[10];

    array _temp[10] _temporary_;

    set have;

    call pokelong(peekclong(addrlong(temp[1]), 80), addrlong(_temp[1]), 80);

    x=tqm(_temp);

    put x=;

    run

    Grand Advisor
    Posts: 10,251

    Re: Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

    Here is a code section that will create a local to the function temporary array, copy values and sort the temp array.

       length = dim(VALU);

       array temp[1] /nosymbols;

       call dynamic_array(temp, length);

       /* copy values from passed array to local temporary array*/

       do i=1 to length; temp=valu;end;

       /* bubble sort since CALL SORTN (of temp

  • ) doesn't seem to want to work*/
  •    do i=1 to (length-1);

          do j= (i+1) to length;

             if temp>temp then do;

                t= temp; temp=temp;temp=t;

             end;

          end;

       end;

    The above sorts the array from low to high, might want to consider high to low as a solution for missing values and such.

    Super Contributor
    Posts: 251

    Re: Calculating the Mean of the Top Quintile of an Array in a User-Defined Function

    It can be simply done by a data step. I am not sure whether this meets your requirement. Since the values are discrete, the top quintile may need approximation. If N= 12, then the top quintile may take 3 values. If this is acceptable, then call sortn() gives the sorted array to work with and the data step is:

    I am borrowing the input data set from FriedEgg.

    data have; 

    array temp[10]; 

    input id @; 

    do _n_=1 to dim(temp); 

       input temp[_n_] @@; 

       end; 

    cards; 

    1 10 20 30 40 50 60 70 80 90 100 

    2 100 90 80 70 60 50 40 30 20 10 

    run;

    data want;

       set have;

       array t[10] temp1 - temp10;

       call sortn(of t

  • );
  •    num = ceil(dim(t) / 5);

       do i = dim(t) - num + 1 to dim(t);

          sum = sum(sum,  t);

       end;

       mean = sum / num;

    keep id mean;

    run;

    Ask a Question
    Discussion stats
    • 6 replies
    • 386 views
    • 0 likes
    • 5 in conversation