Programming the statistical procedures from SAS

Calculate percentiles in distinct groups

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Calculate percentiles in distinct groups

Hi there,

 

I do have a dataset:

 

VAR_1   VAR_2   VAR_3   IND_1   IND_2   IND_N

A             XX          Z           23         400         2

A             XX          Z           34         405         7

A             XX          F           56         328         8

A             YY          F           11         120         4

A             YY          F           17         185         9

A             YY          Z           18         280         9

A             YY          Z           18         280         9

B             XX          Z           26         394         3

B             XX          Z           37         399         8

B             XX          F           59         322         9

B             YY          F           14         114         5

B             YY          Z           21         274        10

B             YY          Z           21         274        10

 

Now, in each of the groups constructed from VAR1, VAR2, VAR3: (A   XX   Z), (A  YY  F), (A  YY  Z) etc I need to calculate percentiles in each of the indicators IND_1, IND_2 etc. The thing is that number of indicators is not fixed and can change: IND_1...IND_N.

 

I understand that first step is to construct all distinct groups from VAR1,VAR2,VAR3 and then in each of these groups calcultae percentiles for each of the indicators IND_1, IND_2...IND_N.

 

Appreciate if you could you help me to build a code for that. Thanks a lot.


Accepted Solutions
Solution
‎07-26-2017 08:48 AM
Trusted Advisor
Posts: 1,115

Re: Calculate percentiles in distinct groups

Hi @mbarkauskas,

mbarkauskas wrote:

 

But it performs percentile calculation in each distinct group of var1 var2 var3 ONLY for paramater ind_1. The other parameters ind_2 and ind_n are not processed.


This happens because in your PctlPre= option only one prefix ("PERC_") is listed, but it should be a list of as many distinct prefixes as there are variables in variable list ind_:.

 

You could create such a list of prefixes as follows:

 

proc sql noprint;
select cats(name,'p') into :pctnames separated by ' '
from dictionary.columns
where libname='WORK' & memname='QUERY_FOR_PERC' & upcase(name) eqt 'IND_'
order by varnum;
quit;

Macro variable PCTNAMES will then contain one prefix for each IND_i variable, where the prefix is just the variable name plus a 'p' suffix (but you can modify this by changing cats(name,'p') to something else), e.g. IND_7p for variable IND_7.

 

 

Use this macro variable in the OUTPUT statement of PROC UNIVARIATE:

 

PctlPre=&pctnames

(instead of PctlPre=PERC_). The variables containing the percentiles will then be named, e.g., IND_7p51 (for the 51th percentile of variable IND_7).

 

 

Also, I'm wondering if you really want to calculate the 1st, 11th, 21th, ..., 91th percentiles (which is what PctlPts=1 to 100 by 10 generates). PctlPts=0 to 100 by 10 would seem more common to me (i.e. minimum, 10th, 20th, ..., 90th percentile, maximum).

 

 

View solution in original post


All Replies
Super User
Posts: 9,769

Re: Calculate percentiles in distinct groups

It is a big advantage fro SAS. You don't need to know how many variables there are in advance .

 

proc summary data=have;
by var_1 var_2 var_3;
var ind_: ;
output out=want p25= p75= /autoname;
run;
proc print;run;
Occasional Contributor
Posts: 5

Re: Calculate percentiles in distinct groups

Thanks. I am currently using the code:

 

Proc Univariate Data=work.query_for_perc NoPrint;
		by var_1 var_2 var_3;
		var ind_:; 
  Output Out=Result2 PctlPre=PERC_ PctlPts=1 to 100 by 10; 
Run;

But it performs percentile calculation in each distinct group of var1 var2 var3 ONLY for paramater ind_1. The other parameters ind_2 and ind_n are not processed.

Solution
‎07-26-2017 08:48 AM
Trusted Advisor
Posts: 1,115

Re: Calculate percentiles in distinct groups

Hi @mbarkauskas,

mbarkauskas wrote:

 

But it performs percentile calculation in each distinct group of var1 var2 var3 ONLY for paramater ind_1. The other parameters ind_2 and ind_n are not processed.


This happens because in your PctlPre= option only one prefix ("PERC_") is listed, but it should be a list of as many distinct prefixes as there are variables in variable list ind_:.

 

You could create such a list of prefixes as follows:

 

proc sql noprint;
select cats(name,'p') into :pctnames separated by ' '
from dictionary.columns
where libname='WORK' & memname='QUERY_FOR_PERC' & upcase(name) eqt 'IND_'
order by varnum;
quit;

Macro variable PCTNAMES will then contain one prefix for each IND_i variable, where the prefix is just the variable name plus a 'p' suffix (but you can modify this by changing cats(name,'p') to something else), e.g. IND_7p for variable IND_7.

 

 

Use this macro variable in the OUTPUT statement of PROC UNIVARIATE:

 

PctlPre=&pctnames

(instead of PctlPre=PERC_). The variables containing the percentiles will then be named, e.g., IND_7p51 (for the 51th percentile of variable IND_7).

 

 

Also, I'm wondering if you really want to calculate the 1st, 11th, 21th, ..., 91th percentiles (which is what PctlPts=1 to 100 by 10 generates). PctlPts=0 to 100 by 10 would seem more common to me (i.e. minimum, 10th, 20th, ..., 90th percentile, maximum).

 

 

Super User
Posts: 9,769

Re: Calculate percentiles in distinct groups

if you have two more variable you need specify the corresponding prefix in proc univariate :

 

proc sort data=sashelp.class out=class;by sex;run;

Proc Univariate Data=class noprint;
		by sex;
		var weight height; 
  Output Out=Result2 PctlPre=weight_ height_ PctlPts=1 to 100 by 10; 
Run;
Super User
Posts: 18,569

Re: Calculate percentiles in distinct groups

Proc univariate provides some more options for various percentiles. 

Occasional Contributor
Posts: 5

Re: Calculate percentiles in distinct groups

Thanks a lot for your input. It helped.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 1009 views
  • 2 likes
  • 4 in conversation