Hello,
I have a multivariable data set and I need to subset it so that, in every row kept, the value of each variable is larger than the 75th percentile of its column. I would appreciate any help.
***********;
/*Calculate 75 percentile*/
proc means data=mydata noprint;
var varb varc varg varh vark ;
output out=p75dataset P75= / autoname;
run;
proc print data=p75dataset;run;
/*Store 75 percentile in a macro variable*/
data _null_;
set p75dataset;
call symputx('p75Mvalue', autoname);
run;
/*Find the subset that values of variables are larger than 75 percentiles*/
data subset;
set mydata ;
array var{5} varb varc varg varh vark;
do i = 1 to 5;
where var(i)>=&p75Mvalue ;
end;
run;
Let's work backwards:
Your last step has this code:
/*Find the subset that values of variables are larger than 75 percentiles*/
data subset;
set mydata ;
array var{5} varb varc varg varh vark;
do i = 1 to 5;
where var(i)>=&p75Mvalue ;
end;
run;
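You have a WHERE statement with a component specifying an array element. This has two problems: a WHERE expression can refer only to variables in the input data set, not to array elements, and WHERE is a declarative statement rather than an executable one, so it is not applied once per iteration of the DO loop.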
So, if you want to use macro variables, you probably want something like
where varb >= &varb_p75 and varc >= &varc_p75 and
varg >= &varg_p75 and varh >= &varh_p75 and
vark >= &vark_p75 ;
That, in turn, means you have to modify your middle step to create those 5 macrovars. It currently creates only a single macrovar, p75Mvalue, and because AUTONAME is a PROC MEANS option rather than a variable in P75DATASET, that macrovar receives a missing value.
So take a look at the output of the proc means, and see how you can loop over the 5 values for the 75th percentiles, writing a single distinctly-named macrovar in each iteration.
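As a rough sketch of that loop, assuming the AUTONAME option has named the output variables varb_P75 through vark_P75 (check the PROC PRINT output to confirm), the middle step could look something like this. The array name p and the choice to reuse each variable name as the macrovar name are illustrative, not the only way to do it:

```sas
/* P75DATASET has one row holding the five 75th percentiles */
data _null_;
  set p75dataset;
  /* assumed names, as generated by AUTONAME */
  array p{*} varb_P75 varc_P75 varg_P75 varh_P75 vark_P75;
  do i = 1 to dim(p);
    /* create a macrovar named after each variable, e.g. &varb_P75 */
    call symputx(vname(p{i}), p{i});
  end;
run;

/* write two of the new macrovars to the log as a check */
%put &=varb_P75 &=vark_P75;
```

Macro variable names are not case-sensitive, so the WHERE statement above can refer to them as &varb_p75 and so on.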
BTW, you could avoid the use of macrovars entirely if you choose to use an IF statement (instead of WHERE) in the last DATA step. You would then need only the PROC MEANS step and the DATA SUBSET step, with an additional "IF _N_=1 then SET P75DATASET;" statement.
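A minimal sketch of that IF-based version follows, again assuming the percentile variables are named varb_P75 through vark_P75. The array names v and p are made up for illustration, and the >= comparison is kept from the original code:

```sas
data subset;
  /* read the single row of percentiles once; variables read by SET are retained */
  if _n_ = 1 then set p75dataset(keep=varb_P75 varc_P75 varg_P75 varh_P75 vark_P75);
  set mydata;
  array v{*} varb varc varg varh vark;
  array p{*} varb_P75 varc_P75 varg_P75 varh_P75 vark_P75;
  do i = 1 to dim(v);
    /* drop the row as soon as one variable falls below its percentile;
       a missing value compares as smaller than any number, so it is dropped too */
    if v{i} < p{i} then delete;
  end;
  drop i varb_P75 varc_P75 varg_P75 varh_P75 vark_P75;
run;
```

Because IF is an executable statement, it can sit inside the DO loop and refer to array elements, which WHERE cannot.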