JackZ295
Pyrite | Level 9

Is it possible or does it make sense to calculate an interquartile range across variables? Below is a sample data set: 

 

data one; 
input response_id $ A1 A2 RF SE1 SE2 SE3 SE4 I1 I2 I3 I4 RPS; 
datalines;
1 77 86 81 84 86 83 76 81 84 83 85 82.4 
2 81 86 85 83 79 78 71 80 83 82 86 81.4 
;
run; 
proc print data=one; 
run; 

I want to find the interquartile range of the 11 values A1 through I4 in each row, and create a new variable next to RPS that holds it. For clarity, RPS was generated by adding the values of A1 through I4 and dividing by the number of values (11). Also, I would like the IQR to be expressed as a range from the first quartile to the third quartile, rather than as the difference between them. Is this possible to do?

Reeza
Super User
Sure, use the percentile function, PCTL.

RPS = mean(of A1--I4);
Range = catx(' - ', pctl(25, of A1--I4), pctl(75, of A1--I4));
ballardw
Super User

You should provide an actual example of what you want when you ask for the IQR "expressed as a range from the first to third quartiles". IQR is DEFINED as a difference.

You can get the percentiles with the PCTL function but you don't get a single number for the range.

data one; 
   input response_id $ A1 A2 RF SE1 SE2 SE3 SE4 I1 I2 I3 I4 RPS; 
   iqr = iqr( A1, A2, RF, SE1, SE2, SE3, SE4, I1, I2, I3, I4);
   p25 = pctl(25, A1, A2, RF, SE1, SE2, SE3, SE4, I1, I2, I3, I4);
   p75 = pctl(75, A1, A2, RF, SE1, SE2, SE3, SE4, I1, I2, I3, I4);
datalines;
1 77 86 81 84 86 83 76 81 84 83 85 82.4 
2 81 86 85 83 79 78 71 80 83 82 86 81.4 
;
run;

If you want to bodge p25 and p75 into a single value you will need to create a character variable as desired.
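
A minimal sketch of that character variable, assuming the sample data set one from the question already exists. The names two and iqr_range and the length of 20 are illustrative choices, not anything from the original post:

```sas
data two;
   set one;
   /* A1--I4 means every variable positioned from A1 through I4 in ONE */
   p25 = pctl(25, of A1--I4);
   p75 = pctl(75, of A1--I4);
   /* hypothetical name and length; CATX converts the numeric */
   /* arguments to text and strips blanks before joining them */
   length iqr_range $ 20;
   iqr_range = catx(' - ', p25, p75);
run;

proc print data=two;
   var response_id RPS p25 p75 iqr_range;
run;
```

Keeping p25 and p75 as numeric variables alongside the text version means you can still sort or filter on them, which the character variable will not do sensibly.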

PaigeMiller
Diamond | Level 26

@ballardw writes:

iqr = iqr( A1, A2, RF, SE1, SE2, SE3, SE4, I1, I2, I3, I4);

or even simpler if appropriate

iqr = iqr( of A1--I4);

 
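A note on "if appropriate": the double-dash list A1--I4 selects variables by their position in the data set, not by name, so it only matches the 11 scores if nothing else sits between A1 and I4. One way to check the order, sketched against the sample data set one from the question:

```sas
/* VARNUM lists the variables in position order rather than */
/* alphabetically, so you can see what A1--I4 will pick up  */
proc contents data=one varnum;
run;
```

If a character variable did fall inside that span, the list A1-numeric-I4 should restrict the range to numeric variables only.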

--
Paige Miller
PaigeMiller
Diamond | Level 26

So the answer is, YES it is possible and others have provided code. But just because it is possible, that does not mean you SHOULD do this. The real question is, does it make any sense to do this given the source and meaning of these variables? Only you understand the problem well enough to know what these values represent, and whether it makes sense to find statistics across all of these variables. In general, I would be skeptical of doing such a calculation across different variables, without clear explanation and justification.

--
Paige Miller
Reeza
Super User
Just a note that 11 values will not generate stable percentiles, so interpretation is questionable. You can calculate it, but your N is too small to generalize from or to have any confidence in the result.
