Re: Multiple Array in SAS

_maldini_ · Posted 05-13-2016 04:40 PM

I am trying to create a dataset that contains a set of variables where values greater than or equal to the third quartile and less than or equal to the first quartiles are deleted. I'm inexperienced using arrays. The code below is just my best guess. No observations are written to the new dataset using this code. Any help would be appreciated!

data have;
 input age weight height Q3_age Q1_age Q3_weight Q1_weight Q3_height Q1_height
 ;
 datalines;
 23 165 72 55 44 222 145 72 65
 45 222 65 55 44 222 145 72 65
 57 165 45 55 44 222 145 72 65
 44 156 80 55 44 222 145 72 65
 44 109 99 55 44 222 145 72 65
 55 118 37 55 44 222 145 72 65
 66 145 66 55 44 222 145 72 65
 76 444 56 55 44 222 145 72 65
 99 234 76 55 44 222 145 72 65
 ;
 run;
	
	DATA want; 
	SET have;
	
	ARRAY array_one (*)
	age
	weight
	height
	;
	
	ARRAY array_two (*) 
	Q3_age 
	Q1_age
        Q3_weight
	Q1_weight 
	Q3_height 
	Q1_height 
	;
	
	DO i=1 to DIM(array_one);
	IF array_one(i)GE array_two(i) THEN DELETE;
	IF array_one(i)LE array_two(i) THEN DELETE;
	END;
	DROP i;
	RUN;

ballardw · Posted 05-13-2016 04:50 PM

where are the values for the Q variables supposed to be? Is that what you are asking for help on?

Since they are missing in your data then in this line of code:

IF array_one(i) GE array_two(i) THEN DELETE;

Array_one is ALWAYS > missing so everything is deleted.

Since you may be looking to join every record in Have to a summary dataset this might be close to what you want

proc summary data=have;
   var _numeric_;
   output out=HaveSum q1= q3= /autoname;
run; 
	
proc sql;
   create table want as
   select a.*
   from have as a, havesum as b
   where (age between age_q1 and age_q3)
      and (weight between weight_q1 and weight_q3)
      and (height between height_q1 and height_q3)
   ;
quit;

_maldini_ · Posted 05-13-2016 05:34 PM

@ballardw

Sorry. I just updated the data...Thanks for the help.

ballardw · Posted 05-13-2016 06:25 PM

Why do you want to compare Weight (the second element of Array_one) to the Q1_age (the second element array_two)?

Unless you have a LARGE number of variables like this arrays aren't really helpful and the logic is going to be abysmal.

_maldini_ · Posted 05-15-2016 06:13 PM

@ballardw

I don't. I just wasn't sure how the array would work. I changed the order of the elements.

I have many more variables. The data included in the original post was just for example purposes.

PGStats · Posted 05-13-2016 05:55 PM

If your goal is to compute trimmed means, standard errors, or confidence intervals, consider using proc univariate with option trimmed= .

PG

_maldini_ · Posted 05-16-2016 12:42 PM

@PGStats I'm not sure I understand the documentation on this function. If I wanted means w/o outliers (e.g. 1.5 x the Q3 and 1.5 x the Q1, how would I specify these parameters in the option?

TRIMMED=values <(<TYPE=keyword> <ALPHA=>)>

TRIM=values <(<TYPE=keyword> <ALPHA=>)>
requests a table of trimmed means, where value specifies the number or the proportion of observations that PROC UNIVARIATE trims. If the value is the number of trimmed observations, must be between 0 and half the number of nonmissing observations. If value is a proportion between 0 and 1/2, the number of observations that PROC UNIVARIATE trims is the smallest integer that is greater than or equal to , where is the number of observations. To include confidence limits for the mean and the Student’s test in the table, you must use the default value of VARDEF=, which is DF. For details concerning the computation of trimmed means, see the section Trimmed Means. The TRIMMED= option does not apply if you use a WEIGHT statement.

ballardw · Posted 05-16-2016 12:49 PM

Trimmed= 0.05 would be a 5% trim, the Type controls which end (or both) to trim.

Trimmed= 0.25 type=twosided

I think is what you may be looking for.

PGStats · Posted 05-16-2016 02:15 PM

@ballardw, type= option controls the type of confidence interval that is computed, not the trimming. Trimming is always done symmetrically. I had to do a little test to confirm this, as the doc is ambiguous.

PG

ballardw · Posted 05-16-2016 06:43 PM

@PGStats wrote:

@ballardw, type= option controls the type of confidence interval that is computed, not the trimming. Trimming is always done symmetrically. I had to do a little test to confirm this, as the doc is ambiguous.

Thanks for the heads up. I don't use arbitrarily trimmed data and the documentation got me. I've always been skeptical of trimming since the first stats program I used in class where the professor also wrote the program (that we had to buy...) but that professor loved trimming data.

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away