## Calculate percentiles in distinct groups

Solved
Occasional Contributor
Posts: 5

# Calculate percentiles in distinct groups

Hi there,

I do have a dataset:

VAR_1   VAR_2   VAR_3   IND_1   IND_2   IND_N

A             XX          Z           23         400         2

A             XX          Z           34         405         7

A             XX          F           56         328         8

A             YY          F           11         120         4

A             YY          F           17         185         9

A             YY          Z           18         280         9

A             YY          Z           18         280         9

B             XX          Z           26         394         3

B             XX          Z           37         399         8

B             XX          F           59         322         9

B             YY          F           14         114         5

B             YY          Z           21         274        10

B             YY          Z           21         274        10

Now, in each of the groups constructed from VAR1, VAR2, VAR3: (A   XX   Z), (A  YY  F), (A  YY  Z) etc I need to calculate percentiles in each of the indicators IND_1, IND_2 etc. The thing is that number of indicators is not fixed and can change: IND_1...IND_N.

I understand that first step is to construct all distinct groups from VAR1,VAR2,VAR3 and then in each of these groups calcultae percentiles for each of the indicators IND_1, IND_2...IND_N.

Appreciate if you could you help me to build a code for that. Thanks a lot.

Accepted Solutions
Solution
‎07-26-2017 08:48 AM
Posts: 1,125

## Re: Calculate percentiles in distinct groups

Hi @mbarkauskas,

mbarkauskas wrote:

But it performs percentile calculation in each distinct group of var1 var2 var3 ONLY for paramater ind_1. The other parameters ind_2 and ind_n are not processed.

This happens because in your PctlPre= option only one prefix ("PERC_") is listed, but it should be a list of as many distinct prefixes as there are variables in variable list ind_:.

You could create such a list of prefixes as follows:

proc sql noprint;
select cats(name,'p') into :pctnames separated by ' '
from dictionary.columns
where libname='WORK' & memname='QUERY_FOR_PERC' & upcase(name) eqt 'IND_'
order by varnum;
quit;

Macro variable PCTNAMES will then contain one prefix for each IND_i variable, where the prefix is just the variable name plus a 'p' suffix (but you can modify this by changing cats(name,'p') to something else), e.g. IND_7p for variable IND_7.

Use this macro variable in the OUTPUT statement of PROC UNIVARIATE:

PctlPre=&pctnames

(instead of PctlPre=PERC_). The variables containing the percentiles will then be named, e.g., IND_7p51 (for the 51th percentile of variable IND_7).

Also, I'm wondering if you really want to calculate the 1st, 11th, 21th, ..., 91th percentiles (which is what PctlPts=1 to 100 by 10 generates). PctlPts=0 to 100 by 10 would seem more common to me (i.e. minimum, 10th, 20th, ..., 90th percentile, maximum).

All Replies
Super User
Posts: 10,213

## Re: Calculate percentiles in distinct groups

It is a big advantage fro SAS. You don't need to know how many variables there are in advance .

proc summary data=have;
by var_1 var_2 var_3;
var ind_: ;
output out=want p25= p75= /autoname;
run;
proc print;run;
Occasional Contributor
Posts: 5

## Re: Calculate percentiles in distinct groups

Thanks. I am currently using the code:

Proc Univariate Data=work.query_for_perc NoPrint;
by var_1 var_2 var_3;
var ind_:;
Output Out=Result2 PctlPre=PERC_ PctlPts=1 to 100 by 10;
Run;

But it performs percentile calculation in each distinct group of var1 var2 var3 ONLY for paramater ind_1. The other parameters ind_2 and ind_n are not processed.

Solution
‎07-26-2017 08:48 AM
Posts: 1,125

## Re: Calculate percentiles in distinct groups

Hi @mbarkauskas,

mbarkauskas wrote:

But it performs percentile calculation in each distinct group of var1 var2 var3 ONLY for paramater ind_1. The other parameters ind_2 and ind_n are not processed.

This happens because in your PctlPre= option only one prefix ("PERC_") is listed, but it should be a list of as many distinct prefixes as there are variables in variable list ind_:.

You could create such a list of prefixes as follows:

proc sql noprint;
select cats(name,'p') into :pctnames separated by ' '
from dictionary.columns
where libname='WORK' & memname='QUERY_FOR_PERC' & upcase(name) eqt 'IND_'
order by varnum;
quit;

Macro variable PCTNAMES will then contain one prefix for each IND_i variable, where the prefix is just the variable name plus a 'p' suffix (but you can modify this by changing cats(name,'p') to something else), e.g. IND_7p for variable IND_7.

Use this macro variable in the OUTPUT statement of PROC UNIVARIATE:

PctlPre=&pctnames

(instead of PctlPre=PERC_). The variables containing the percentiles will then be named, e.g., IND_7p51 (for the 51th percentile of variable IND_7).

Also, I'm wondering if you really want to calculate the 1st, 11th, 21th, ..., 91th percentiles (which is what PctlPts=1 to 100 by 10 generates). PctlPts=0 to 100 by 10 would seem more common to me (i.e. minimum, 10th, 20th, ..., 90th percentile, maximum).

Super User
Posts: 10,213

## Re: Calculate percentiles in distinct groups

if you have two more variable you need specify the corresponding prefix in proc univariate :

proc sort data=sashelp.class out=class;by sex;run;

Proc Univariate Data=class noprint;
by sex;
var weight height;
Output Out=Result2 PctlPre=weight_ height_ PctlPts=1 to 100 by 10;
Run;
Super User
Posts: 20,735

## Re: Calculate percentiles in distinct groups

Proc univariate provides some more options for various percentiles.

Occasional Contributor
Posts: 5