DATA Step, Macro, Functions and more

Using the PCTL function in a DATA step?

Accepted Solution Solved
Reply
Regular Contributor
Posts: 199
Accepted Solution

Using the PCTL function in a DATA step?

[ Edited ]

Can I use the PCTL function this way in a DATA step? The answer appears to be no...I'm trying to remove outliers to look at the distribution of a variable w/o them. My objective in the syntax below is to remove values of age GE the 75th percentile and LE the 25th percentile, so that I can look at the distribution w/o outliers. 

 

	data test_2; 
	set data.SMART_a1_a5_final;
	
        do timepoint = "A1", "A2", "A3", "A4", "A5";
 	  IF AGE GE PCTL(75,age) THEN DELETE; 
 	  IF AGE LE PCTL(25,age) THEN DELETE; 
	end;
	RUN;

 


Accepted Solutions
Solution
‎05-20-2016 12:25 PM
Super User
Posts: 19,791

Re: Using the PCTL function in a DATA step?

The functions work on values within the row, so max finds the max of f1 to f4 for each row. Whereas proc means finds the max of f1, the max of f2, etc. 

 

See a quick example below and compare the output. 

 

data have;
input f1-f4;
cards;
1 4 5 6
3 7 9 10
1 3 12 3
2 34 3 5
;
run;

data want;
set have;
max_row=max(of f1-f4);
pctl_row=pctl(25, of f1-f4);
run;

proc means data=want pct25 max;
var f1-f4;
run;

View solution in original post


All Replies
Respected Advisor
Posts: 3,156

Re: Using the PCTL function in a DATA step?

Age always equal to pctl(75,age) or pctl(25,age). What do you want to achieve? There should be more than one elements in the place of 'age' if you are looking at 'percentile', otherwise, it always returns the 'age'. Please refer to SAS docs for details.
Respected Advisor
Posts: 3,156

Re: Using the PCTL function in a DATA step?

Please show some data. That is usually the best way to illustrate your problem.
Super User
Posts: 19,791

Re: Using the PCTL function in a DATA step?

Functions in a dataset operate on the values within a specific row. 

 

If you want values for a column you'll need to use a proc to get the percentiles first and then filter them. 

Proc Univariate and Proc means can calculate percentiles, univariate has more percentiles available, while proc means does the most common values.

Regular Contributor
Posts: 199

Re: Using the PCTL function in a DATA step?

@Reeza Thanks. Your explanation makes sense. I could use PROC UNIVARIATE and then output the 3Q and 1Q into new variables. I think that's what you are suggesting...

 

Could you please explain this is a little more. I'm not sure I totally follow.

 

<Functions in a dataset operate on the values within a specific row. >

 

As always, thank you!

Solution
‎05-20-2016 12:25 PM
Super User
Posts: 19,791

Re: Using the PCTL function in a DATA step?

The functions work on values within the row, so max finds the max of f1 to f4 for each row. Whereas proc means finds the max of f1, the max of f2, etc. 

 

See a quick example below and compare the output. 

 

data have;
input f1-f4;
cards;
1 4 5 6
3 7 9 10
1 3 12 3
2 34 3 5
;
run;

data want;
set have;
max_row=max(of f1-f4);
pctl_row=pctl(25, of f1-f4);
run;

proc means data=want pct25 max;
var f1-f4;
run;
Regular Contributor
Posts: 199

Re: Using the PCTL function in a DATA step?

@Reeza I had no idea that functions only worked across row. Wow. Thanks!

Respected Advisor
Posts: 3,156

Re: Using the PCTL function in a DATA step?

You sound like you came from SQL world. Yes, indeed, most of the data step functions work on ONE obs at a time, some times some functions may appear not, however, they still DO. This is due the foundamental data processing structure in data step, aka, PDV (program data vector).

 

 

Regular Contributor
Posts: 199

Re: Using the PCTL function in a DATA step?

@Reeza Can you use a function w/o creating a new variable?

 

I was playing around w/ this syntax w/o success (i.e. "test" is equal to "test_2");

 

data test;
input age;
datalines;
23
45
65
43
12
21
34
55 ;
run; 

data test_2; 
SET test;    
WHERE age LE PCTL(75,age);    
WHERE SAME AND age GE PCTL(25,age);
RUN;
Super User
Posts: 11,343

Re: Using the PCTL function in a DATA step?

You are still attempting to use a the function suboptimally. ANY percentile of a given single value is that value. So since you use GE or LE then  you are doing this comparison:

Where age le age;

Where age ge age; 

See this code:

data _null_;
   age = 25;
   do pctl= 1, 10, 25,50;
      result = pctl(pctl,age);
      put 'Percentile'+1 pctl "of " age= "is " result;
   end;
run;

 

 

But yes, you can use the function without a new variable. I often use code like;

 

If x is > max(var1,var2,var3) then do ...

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 1146 views
  • 4 likes
  • 4 in conversation