Can I use the PCTL function this way in a DATA step? The answer appears to be no...I'm trying to remove outliers to look at the distribution of a variable w/o them. My objective in the syntax below is to remove values of age GE the 75th percentile and LE the 25th percentile, so that I can look at the distribution w/o outliers.
data test_2; set data.SMART_a1_a5_final; do timepoint = "A1", "A2", "A3", "A4", "A5"; IF AGE GE PCTL(75,age) THEN DELETE; IF AGE LE PCTL(25,age) THEN DELETE; end; RUN;
The functions work on values within the row, so max finds the max of f1 to f4 for each row. Whereas proc means finds the max of f1, the max of f2, etc.
See a quick example below and compare the output.
data have;
input f1-f4;
cards;
1 4 5 6
3 7 9 10
1 3 12 3
2 34 3 5
;
run;
data want;
set have;
max_row=max(of f1-f4);
pctl_row=pctl(25, of f1-f4);
run;
proc means data=want pct25 max;
var f1-f4;
run;
Functions in a dataset operate on the values within a specific row.
If you want values for a column you'll need to use a proc to get the percentiles first and then filter them.
Proc Univariate and Proc means can calculate percentiles, univariate has more percentiles available, while proc means does the most common values.
@Reeza Thanks. Your explanation makes sense. I could use PROC UNIVARIATE and then output the 3Q and 1Q into new variables. I think that's what you are suggesting...
Could you please explain this is a little more. I'm not sure I totally follow.
<Functions in a dataset operate on the values within a specific row. >
As always, thank you!
The functions work on values within the row, so max finds the max of f1 to f4 for each row. Whereas proc means finds the max of f1, the max of f2, etc.
See a quick example below and compare the output.
data have;
input f1-f4;
cards;
1 4 5 6
3 7 9 10
1 3 12 3
2 34 3 5
;
run;
data want;
set have;
max_row=max(of f1-f4);
pctl_row=pctl(25, of f1-f4);
run;
proc means data=want pct25 max;
var f1-f4;
run;
@Reeza I had no idea that functions only worked across row. Wow. Thanks!
You sound like you came from SQL world. Yes, indeed, most of the data step functions work on ONE obs at a time, some times some functions may appear not, however, they still DO. This is due the foundamental data processing structure in data step, aka, PDV (program data vector).
@Reeza Can you use a function w/o creating a new variable?
I was playing around w/ this syntax w/o success (i.e. "test" is equal to "test_2");
data test; input age; datalines; 23 45 65 43 12 21 34 55 ; run; data test_2; SET test; WHERE age LE PCTL(75,age); WHERE SAME AND age GE PCTL(25,age); RUN;
You are still attempting to use a the function suboptimally. ANY percentile of a given single value is that value. So since you use GE or LE then you are doing this comparison:
Where age le age;
Where age ge age;
See this code:
data _null_;
age = 25;
do pctl= 1, 10, 25,50;
result = pctl(pctl,age);
put 'Percentile'+1 pctl "of " age= "is " result;
end;
run;
But yes, you can use the function without a new variable. I often use code like;
If x is > max(var1,var2,var3) then do ...
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.