BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
_maldini_
Barite | Level 11

Can I use the PCTL function this way in a DATA step? The answer appears to be no...I'm trying to remove outliers to look at the distribution of a variable w/o them. My objective in the syntax below is to remove values of age GE the 75th percentile and LE the 25th percentile, so that I can look at the distribution w/o outliers. 

 

	data test_2; 
	set data.SMART_a1_a5_final;
	
        do timepoint = "A1", "A2", "A3", "A4", "A5";
 	  IF AGE GE PCTL(75,age) THEN DELETE; 
 	  IF AGE LE PCTL(25,age) THEN DELETE; 
	end;
	RUN;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

The functions work on values within the row, so max finds the max of f1 to f4 for each row. Whereas proc means finds the max of f1, the max of f2, etc. 

 

See a quick example below and compare the output. 

 

data have;
input f1-f4;
cards;
1 4 5 6
3 7 9 10
1 3 12 3
2 34 3 5
;
run;

data want;
set have;
max_row=max(of f1-f4);
pctl_row=pctl(25, of f1-f4);
run;

proc means data=want pct25 max;
var f1-f4;
run;

View solution in original post

9 REPLIES 9
Haikuo
Onyx | Level 15
Age always equal to pctl(75,age) or pctl(25,age). What do you want to achieve? There should be more than one elements in the place of 'age' if you are looking at 'percentile', otherwise, it always returns the 'age'. Please refer to SAS docs for details.
Haikuo
Onyx | Level 15
Please show some data. That is usually the best way to illustrate your problem.
Reeza
Super User

Functions in a dataset operate on the values within a specific row. 

 

If you want values for a column you'll need to use a proc to get the percentiles first and then filter them. 

Proc Univariate and Proc means can calculate percentiles, univariate has more percentiles available, while proc means does the most common values.

_maldini_
Barite | Level 11

@Reeza Thanks. Your explanation makes sense. I could use PROC UNIVARIATE and then output the 3Q and 1Q into new variables. I think that's what you are suggesting...

 

Could you please explain this is a little more. I'm not sure I totally follow.

 

<Functions in a dataset operate on the values within a specific row. >

 

As always, thank you!

Reeza
Super User

The functions work on values within the row, so max finds the max of f1 to f4 for each row. Whereas proc means finds the max of f1, the max of f2, etc. 

 

See a quick example below and compare the output. 

 

data have;
input f1-f4;
cards;
1 4 5 6
3 7 9 10
1 3 12 3
2 34 3 5
;
run;

data want;
set have;
max_row=max(of f1-f4);
pctl_row=pctl(25, of f1-f4);
run;

proc means data=want pct25 max;
var f1-f4;
run;
_maldini_
Barite | Level 11

@Reeza I had no idea that functions only worked across row. Wow. Thanks!

Haikuo
Onyx | Level 15

You sound like you came from SQL world. Yes, indeed, most of the data step functions work on ONE obs at a time, some times some functions may appear not, however, they still DO. This is due the foundamental data processing structure in data step, aka, PDV (program data vector).

 

 

_maldini_
Barite | Level 11

@Reeza Can you use a function w/o creating a new variable?

 

I was playing around w/ this syntax w/o success (i.e. "test" is equal to "test_2");

 

data test;
input age;
datalines;
23
45
65
43
12
21
34
55 ;
run; 

data test_2; 
SET test;    
WHERE age LE PCTL(75,age);    
WHERE SAME AND age GE PCTL(25,age);
RUN;
ballardw
Super User

You are still attempting to use a the function suboptimally. ANY percentile of a given single value is that value. So since you use GE or LE then  you are doing this comparison:

Where age le age;

Where age ge age; 

See this code:

data _null_;
   age = 25;
   do pctl= 1, 10, 25,50;
      result = pctl(pctl,age);
      put 'Percentile'+1 pctl "of " age= "is " result;
   end;
run;

 

 

But yes, you can use the function without a new variable. I often use code like;

 

If x is > max(var1,var2,var3) then do ...

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 5449 views
  • 4 likes
  • 4 in conversation