Hi there,
I do have a dataset:
VAR_1 VAR_2 VAR_3 IND_1 IND_2 IND_N
A XX Z 23 400 2
A XX Z 34 405 7
A XX F 56 328 8
A YY F 11 120 4
A YY F 17 185 9
A YY Z 18 280 9
A YY Z 18 280 9
B XX Z 26 394 3
B XX Z 37 399 8
B XX F 59 322 9
B YY F 14 114 5
B YY Z 21 274 10
B YY Z 21 274 10
Now, in each of the groups constructed from VAR1, VAR2, VAR3: (A XX Z), (A YY F), (A YY Z) etc I need to calculate percentiles in each of the indicators IND_1, IND_2 etc. The thing is that number of indicators is not fixed and can change: IND_1...IND_N.
I understand that first step is to construct all distinct groups from VAR1,VAR2,VAR3 and then in each of these groups calcultae percentiles for each of the indicators IND_1, IND_2...IND_N.
Appreciate if you could you help me to build a code for that. Thanks a lot.
Hi @mbarkauskas,
@mbarkauskas wrote:
But it performs percentile calculation in each distinct group of var1 var2 var3 ONLY for paramater ind_1. The other parameters ind_2 and ind_n are not processed.
This happens because in your PctlPre= option only one prefix ("PERC_") is listed, but it should be a list of as many distinct prefixes as there are variables in variable list ind_:.
You could create such a list of prefixes as follows:
proc sql noprint;
select cats(name,'p') into :pctnames separated by ' '
from dictionary.columns
where libname='WORK' & memname='QUERY_FOR_PERC' & upcase(name) eqt 'IND_'
order by varnum;
quit;
Macro variable PCTNAMES will then contain one prefix for each IND_i variable, where the prefix is just the variable name plus a 'p' suffix (but you can modify this by changing cats(name,'p') to something else), e.g. IND_7p for variable IND_7.
Use this macro variable in the OUTPUT statement of PROC UNIVARIATE:
PctlPre=&pctnames
(instead of PctlPre=PERC_). The variables containing the percentiles will then be named, e.g., IND_7p51 (for the 51th percentile of variable IND_7).
Also, I'm wondering if you really want to calculate the 1st, 11th, 21th, ..., 91th percentiles (which is what PctlPts=1 to 100 by 10 generates). PctlPts=0 to 100 by 10 would seem more common to me (i.e. minimum, 10th, 20th, ..., 90th percentile, maximum).
It is a big advantage fro SAS. You don't need to know how many variables there are in advance .
proc summary data=have;
by var_1 var_2 var_3;
var ind_: ;
output out=want p25= p75= /autoname;
run;
proc print;run;
Thanks. I am currently using the code:
Proc Univariate Data=work.query_for_perc NoPrint; by var_1 var_2 var_3; var ind_:; Output Out=Result2 PctlPre=PERC_ PctlPts=1 to 100 by 10; Run;
But it performs percentile calculation in each distinct group of var1 var2 var3 ONLY for paramater ind_1. The other parameters ind_2 and ind_n are not processed.
Hi @mbarkauskas,
@mbarkauskas wrote:
But it performs percentile calculation in each distinct group of var1 var2 var3 ONLY for paramater ind_1. The other parameters ind_2 and ind_n are not processed.
This happens because in your PctlPre= option only one prefix ("PERC_") is listed, but it should be a list of as many distinct prefixes as there are variables in variable list ind_:.
You could create such a list of prefixes as follows:
proc sql noprint;
select cats(name,'p') into :pctnames separated by ' '
from dictionary.columns
where libname='WORK' & memname='QUERY_FOR_PERC' & upcase(name) eqt 'IND_'
order by varnum;
quit;
Macro variable PCTNAMES will then contain one prefix for each IND_i variable, where the prefix is just the variable name plus a 'p' suffix (but you can modify this by changing cats(name,'p') to something else), e.g. IND_7p for variable IND_7.
Use this macro variable in the OUTPUT statement of PROC UNIVARIATE:
PctlPre=&pctnames
(instead of PctlPre=PERC_). The variables containing the percentiles will then be named, e.g., IND_7p51 (for the 51th percentile of variable IND_7).
Also, I'm wondering if you really want to calculate the 1st, 11th, 21th, ..., 91th percentiles (which is what PctlPts=1 to 100 by 10 generates). PctlPts=0 to 100 by 10 would seem more common to me (i.e. minimum, 10th, 20th, ..., 90th percentile, maximum).
if you have two more variable you need specify the corresponding prefix in proc univariate :
proc sort data=sashelp.class out=class;by sex;run;
Proc Univariate Data=class noprint;
by sex;
var weight height;
Output Out=Result2 PctlPre=weight_ height_ PctlPts=1 to 100 by 10;
Run;
Proc univariate provides some more options for various percentiles.
Thanks a lot for your input. It helped.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.