BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mbarkauskas
Calcite | Level 5

Hi there,

 

I do have a dataset:

 

VAR_1   VAR_2   VAR_3   IND_1   IND_2   IND_N

A             XX          Z           23         400         2

A             XX          Z           34         405         7

A             XX          F           56         328         8

A             YY          F           11         120         4

A             YY          F           17         185         9

A             YY          Z           18         280         9

A             YY          Z           18         280         9

B             XX          Z           26         394         3

B             XX          Z           37         399         8

B             XX          F           59         322         9

B             YY          F           14         114         5

B             YY          Z           21         274        10

B             YY          Z           21         274        10

 

Now, in each of the groups constructed from VAR1, VAR2, VAR3: (A   XX   Z), (A  YY  F), (A  YY  Z) etc I need to calculate percentiles in each of the indicators IND_1, IND_2 etc. The thing is that number of indicators is not fixed and can change: IND_1...IND_N.

 

I understand that first step is to construct all distinct groups from VAR1,VAR2,VAR3 and then in each of these groups calcultae percentiles for each of the indicators IND_1, IND_2...IND_N.

 

Appreciate if you could you help me to build a code for that. Thanks a lot.

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @mbarkauskas,

@mbarkauskas wrote:

 

But it performs percentile calculation in each distinct group of var1 var2 var3 ONLY for paramater ind_1. The other parameters ind_2 and ind_n are not processed.


This happens because in your PctlPre= option only one prefix ("PERC_") is listed, but it should be a list of as many distinct prefixes as there are variables in variable list ind_:.

 

You could create such a list of prefixes as follows:

 

proc sql noprint;
select cats(name,'p') into :pctnames separated by ' '
from dictionary.columns
where libname='WORK' & memname='QUERY_FOR_PERC' & upcase(name) eqt 'IND_'
order by varnum;
quit;

Macro variable PCTNAMES will then contain one prefix for each IND_i variable, where the prefix is just the variable name plus a 'p' suffix (but you can modify this by changing cats(name,'p') to something else), e.g. IND_7p for variable IND_7.

 

 

Use this macro variable in the OUTPUT statement of PROC UNIVARIATE:

 

PctlPre=&pctnames

(instead of PctlPre=PERC_). The variables containing the percentiles will then be named, e.g., IND_7p51 (for the 51th percentile of variable IND_7).

 

 

Also, I'm wondering if you really want to calculate the 1st, 11th, 21th, ..., 91th percentiles (which is what PctlPts=1 to 100 by 10 generates). PctlPts=0 to 100 by 10 would seem more common to me (i.e. minimum, 10th, 20th, ..., 90th percentile, maximum).

 

 

View solution in original post

6 REPLIES 6
Ksharp
Super User

It is a big advantage fro SAS. You don't need to know how many variables there are in advance .

 

proc summary data=have;
by var_1 var_2 var_3;
var ind_: ;
output out=want p25= p75= /autoname;
run;
proc print;run;
mbarkauskas
Calcite | Level 5

Thanks. I am currently using the code:

 

Proc Univariate Data=work.query_for_perc NoPrint;
		by var_1 var_2 var_3;
		var ind_:; 
  Output Out=Result2 PctlPre=PERC_ PctlPts=1 to 100 by 10; 
Run;

But it performs percentile calculation in each distinct group of var1 var2 var3 ONLY for paramater ind_1. The other parameters ind_2 and ind_n are not processed.

FreelanceReinh
Jade | Level 19

Hi @mbarkauskas,

@mbarkauskas wrote:

 

But it performs percentile calculation in each distinct group of var1 var2 var3 ONLY for paramater ind_1. The other parameters ind_2 and ind_n are not processed.


This happens because in your PctlPre= option only one prefix ("PERC_") is listed, but it should be a list of as many distinct prefixes as there are variables in variable list ind_:.

 

You could create such a list of prefixes as follows:

 

proc sql noprint;
select cats(name,'p') into :pctnames separated by ' '
from dictionary.columns
where libname='WORK' & memname='QUERY_FOR_PERC' & upcase(name) eqt 'IND_'
order by varnum;
quit;

Macro variable PCTNAMES will then contain one prefix for each IND_i variable, where the prefix is just the variable name plus a 'p' suffix (but you can modify this by changing cats(name,'p') to something else), e.g. IND_7p for variable IND_7.

 

 

Use this macro variable in the OUTPUT statement of PROC UNIVARIATE:

 

PctlPre=&pctnames

(instead of PctlPre=PERC_). The variables containing the percentiles will then be named, e.g., IND_7p51 (for the 51th percentile of variable IND_7).

 

 

Also, I'm wondering if you really want to calculate the 1st, 11th, 21th, ..., 91th percentiles (which is what PctlPts=1 to 100 by 10 generates). PctlPts=0 to 100 by 10 would seem more common to me (i.e. minimum, 10th, 20th, ..., 90th percentile, maximum).

 

 

Ksharp
Super User

if you have two more variable you need specify the corresponding prefix in proc univariate :

 

proc sort data=sashelp.class out=class;by sex;run;

Proc Univariate Data=class noprint;
		by sex;
		var weight height; 
  Output Out=Result2 PctlPre=weight_ height_ PctlPts=1 to 100 by 10; 
Run;
Reeza
Super User

Proc univariate provides some more options for various percentiles. 

mbarkauskas
Calcite | Level 5

Thanks a lot for your input. It helped.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 22733 views
  • 2 likes
  • 4 in conversation