BookmarkSubscribeRSS Feed
_maldini_
Barite | Level 11

I am trying to create a dataset that contains a set of variables where values greater than or equal to the third quartile and less than or equal to the first quartiles are deleted. I'm inexperienced using arrays. The code below is just my best guess. No observations are written to the new dataset using this code. Any help would be appreciated!

 

data have;
input age weight height Q3_age Q1_age Q3_weight Q1_weight Q3_height Q1_height
;
datalines;
23 165 72 55 44 222 145 72 65
45 222 65 55 44 222 145 72 65
57 165 45 55 44 222 145 72 65
44 156 80 55 44 222 145 72 65
44 109 99 55 44 222 145 72 65
55 118 37 55 44 222 145 72 65
66 145 66 55 44 222 145 72 65
76 444 56 55 44 222 145 72 65
99 234 76 55 44 222 145 72 65
;
run; DATA want; SET have; ARRAY array_one (*) age weight height ; ARRAY array_two (*) Q3_age Q1_age
Q3_weight Q1_weight Q3_height Q1_height ; DO i=1 to DIM(array_one); IF array_one(i)GE array_two(i) THEN DELETE; IF array_one(i)LE array_two(i) THEN DELETE; END; DROP i; RUN;
9 REPLIES 9
ballardw
Super User

where are the values for the Q variables supposed to be? Is that what you are asking for help on?

Since they are missing in your data then in this line of code:

IF array_one(i) GE array_two(i) THEN DELETE;

 

Array_one is ALWAYS > missing so everything is deleted.

 

Since you may be looking to join every record in Have to a summary dataset this might be close to what you want

proc summary data=have;
   var _numeric_;
   output out=HaveSum q1= q3= /autoname;
run; 
	
proc sql;
   create table want as
   select a.*
   from have as a, havesum as b
   where (age between age_q1 and age_q3)
      and (weight between weight_q1 and weight_q3)
      and (height between height_q1 and height_q3)
   ;
quit;
_maldini_
Barite | Level 11

@ballardw 

 

<where are the values for the Q variables supposed to be?>

 

Sorry. I just updated the data...Thanks for the help.

 

 

ballardw
Super User

Why do you want to compare Weight (the second element of Array_one) to the Q1_age (the second element array_two)?

Unless you have a LARGE number of variables like this arrays aren't really helpful and the logic is going to be abysmal.

 

_maldini_
Barite | Level 11

@ballardw 

 

<Why do you want to compare Weight (the second element of Array_one) to the Q1_age (the second element array_two)>

 

I don't. I just wasn't sure how the array would work. I changed the order of the elements. 

 

I have many more variables. The data included in the original post was just for example purposes.

PGStats
Opal | Level 21

If your goal is to compute trimmed means, standard errors, or confidence intervals, consider using proc univariate with option trimmed= 

PG
_maldini_
Barite | Level 11

@PGStats I'm not sure I understand the documentation on this function. If I wanted means w/o outliers (e.g. 1.5 x the Q3 and 1.5 x the Q1, how would I specify these parameters in the option? 

 

TRIMMED=values <(<TYPE=keyword> <ALPHA=>)>

 


TRIM=values <(<TYPE=keyword> <ALPHA=>)>
requests a table of trimmed means, where value specifies the number or the proportion of observations that PROC UNIVARIATE trims. If the value is the number of trimmed observations, must be between 0 and half the number of nonmissing observations. If value is a proportion between 0 and 1/2, the number of observations that PROC UNIVARIATE trims is the smallest integer that is greater than or equal to , where is the number of observations. To include confidence limits for the mean and the Student’s test in the table, you must use the default value of VARDEF=, which is DF. For details concerning the computation of trimmed means, see the section Trimmed Means. The TRIMMED= option does not apply if you use a WEIGHT statement.

ballardw
Super User

Trimmed= 0.05 would be a 5% trim, the Type controls which end (or both) to trim.

Trimmed= 0.25 type=twosided

I think is what you may be looking for.

PGStats
Opal | Level 21

@ballardw, type= option controls the type of confidence interval that is computed, not the trimming. Trimming is always done symmetrically. I had to do a little test to confirm this, as the doc is ambiguous.

PG
ballardw
Super User

@PGStats wrote:

@ballardw, type= option controls the type of confidence interval that is computed, not the trimming. Trimming is always done symmetrically. I had to do a little test to confirm this, as the doc is ambiguous.


Thanks for the heads up. I don't use arbitrarily trimmed data and the documentation got me. I've always been skeptical of trimming since the first stats program I used in class where the professor also wrote the program (that we had to buy...) but that professor loved trimming data.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1749 views
  • 4 likes
  • 3 in conversation