DATA Step, Macro, Functions and more

Calculating means based on percentiles

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Calculating means based on percentiles

Dear SAS Users,

I have data as follows:

date corr1_2 corr1_3..... corr19_20

for each date, the variable corr1_2  shows the correlation between item1 and item2, ..., corr19_20 shows the correlation between item19 and item20. I have all combinations of pairwise correlations for different dates.

I have macro variables that holds 18,19 and 20 (end numbers). I calculated mean correlation for each day by using a line as follows:

meancorr=mean( %do i=1 %to &lastbmax2; %do j=&i+1 %to &max; corr&i._&j, %end;  %end; corr&lastbmax1._&max);

where lastbmax2, lastbmax1 and max are 18, 19, and 20, respectively.

My question is that how can I calculate the mean of the bivariate correlations which are lower than the 5th and higher than the 95th percentiles for each day? I can identify the percentiles by using the pctl function. But how can I calculate the means as I just described?

Thanks for any help in advance.

J.


Accepted Solutions
Solution
‎10-18-2012 02:39 AM
Valued Guide
Posts: 634

Re: Calculating means based on percentiles

This solution uses the array suggested by ArtT, but loops through the array rather than sorting the array, which would be more elegant.

data have;
input date date9. corr1_2 corr1_3 corr2_3;
datalines;
16oct2012 .1 .9 .2
17oct2012 .5 .7 .3
run;

data want;*(keep=date meancorr umean lmean);
   set have;
   array allcorr {*} corr:;
   meancorr = mean(of corrSmiley Happy;
   lower=pctl(5,of corrSmiley Happy;
   upper=pctl(95,of corrSmiley Happy;
   
   lmean=0; lcnt=0; umean=0; ucnt=0;
   do i = 1 to dim(allcorr);
      if allcorr{i} le lower then do;
         lmean + allcorr{i};
         lcnt + 1;
      end;
      if allcorr{i} ge upper then do;
         umean + allcorr{i};
         ucnt + 1;
      end;
    end;
    lmean=lmean/lcnt;
    umean=umean/ucnt;
    output want;
   run;
proc print data=want;
run;

View solution in original post


All Replies
PROC Star
Posts: 7,492

Re: Calculating means based on percentiles

I think you are complicating the task by including all of the correlations in macro variables.  If you simply loaded then into an array, in a datastep, you might be able to easily accomplish both tasks by using call sortn to sort the array.  Then, you would only have to use the of operator to include all of the correlations in the overall mean calculation and, for the top and bottom percentiles (unless I misunderstand your data), simply include the x percent at the top and bottom of the sorted array.

Solution
‎10-18-2012 02:39 AM
Valued Guide
Posts: 634

Re: Calculating means based on percentiles

This solution uses the array suggested by ArtT, but loops through the array rather than sorting the array, which would be more elegant.

data have;
input date date9. corr1_2 corr1_3 corr2_3;
datalines;
16oct2012 .1 .9 .2
17oct2012 .5 .7 .3
run;

data want;*(keep=date meancorr umean lmean);
   set have;
   array allcorr {*} corr:;
   meancorr = mean(of corrSmiley Happy;
   lower=pctl(5,of corrSmiley Happy;
   upper=pctl(95,of corrSmiley Happy;
   
   lmean=0; lcnt=0; umean=0; ucnt=0;
   do i = 1 to dim(allcorr);
      if allcorr{i} le lower then do;
         lmean + allcorr{i};
         lcnt + 1;
      end;
      if allcorr{i} ge upper then do;
         umean + allcorr{i};
         ucnt + 1;
      end;
    end;
    lmean=lmean/lcnt;
    umean=umean/ucnt;
    output want;
   run;
proc print data=want;
run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 284 views
  • 0 likes
  • 3 in conversation