Multiple Array in SAS


05-13-2016 04:40 PM - edited 05-15-2016 06:12 PM

I am trying to create a dataset that contains a set of variables where values greater than or equal to the third quartile, and values less than or equal to the first quartile, are deleted. I'm inexperienced with arrays, and the code below is just my best guess. When I run it, no observations are written to the new dataset. Any help would be appreciated!

```
data have;
  input age weight height Q3_age Q1_age Q3_weight Q1_weight Q3_height Q1_height;
  datalines;
23 165 72 55 44 222 145 72 65
45 222 65 55 44 222 145 72 65
57 165 45 55 44 222 145 72 65
44 156 80 55 44 222 145 72 65
44 109 99 55 44 222 145 72 65
55 118 37 55 44 222 145 72 65
66 145 66 55 44 222 145 72 65
76 444 56 55 44 222 145 72 65
99 234 76 55 44 222 145 72 65
;
run;

DATA want;
  SET have;
  ARRAY array_one (*) age weight height;
  ARRAY array_two (*) Q3_age Q1_age Q3_weight Q1_weight Q3_height Q1_height;
  DO i=1 to DIM(array_one);
    IF array_one(i) GE array_two(i) THEN DELETE;
    IF array_one(i) LE array_two(i) THEN DELETE;
  END;
  DROP i;
RUN;
```


Posted in reply to jcorroon

05-13-2016 04:50 PM - edited 05-13-2016 04:58 PM

Where are the values for the Q variables supposed to be? Is that what you are asking for help with?

Since they are missing in your data, in this line of code:

IF array_one(i) GE array_two(i) THEN DELETE;

any value in Array_one is ALWAYS GE a missing value, so everything is deleted.
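To see the comparison rule in isolation, here is a minimal sketch (the variable names are made up for illustration). SAS orders a numeric missing value below every number, and a comparison expression evaluates to 1 (true) or 0 (false):

```
/* A numeric missing value (.) sorts below every number */
data _null_;
  value = 23;
  q     = .;                 /* missing, like the Q variables in the original post */
  ge_result = (value ge q);  /* 1: so the first IF is true and the row is deleted */
  le_result = (value le q);  /* 0 */
  put ge_result= le_result=;
run;
```

The log should show `ge_result=1 le_result=0`.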

Since you may be looking to join every record in Have to a summary dataset, this might be close to what you want:

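```
/* One-row summary: Q1 and Q3 of every numeric variable, named e.g. age_Q1, age_Q3 */
proc summary data=have;
  var _numeric_;
  output out=HaveSum q1= q3= /autoname;
run;

/* Join every record to the single summary row and keep the records whose
   values all lie within [Q1, Q3]; BETWEEN includes both endpoints */
proc sql;
  create table want as
  select a.*
  from have as a, havesum as b
  where (age between age_q1 and age_q3)
    and (weight between weight_q1 and weight_q3)
    and (height between height_q1 and height_q3)
  ;
quit;
```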


Posted in reply to ballardw

05-13-2016 05:34 PM

<where are the values for the Q variables supposed to be?>

Sorry, I just updated the data. Thanks for the help.


Posted in reply to jcorroon

05-13-2016 06:25 PM

Why do you want to compare Weight (the second element of Array_one) to Q1_age (the second element of Array_two)?

Unless you have a LARGE number of variables like this, arrays aren't really helpful, and the logic is going to be abysmal.


Posted in reply to ballardw

05-15-2016 06:13 PM - edited 05-16-2016 11:19 AM

<Why do you want to compare Weight (the second element of Array_one) to the Q1_age (the second element array_two)>

I don't. I just wasn't sure how the array would work. I changed the order of the elements.

I have many more variables. The data included in the original post was just for example purposes.
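In the loop as posted, both IF statements compare against the same array_two element, and every value is either GE or LE any given number, so every observation is deleted no matter how the elements are ordered. Below is a minimal sketch of one way to restructure it. It assumes the Q1_/Q3_ naming from the example data and that the goal is to drop whole observations. The array names vals, q1s and q3s are hypothetical. The idea is to use one array per role, so element i always refers to the same measure, and to combine both bounds in a single OR test:

```
data have;
  input age weight height Q3_age Q1_age Q3_weight Q1_weight Q3_height Q1_height;
  datalines;
23 165 72 55 44 222 145 72 65
45 222 65 55 44 222 145 72 65
57 165 45 55 44 222 145 72 65
44 156 80 55 44 222 145 72 65
44 109 99 55 44 222 145 72 65
55 118 37 55 44 222 145 72 65
66 145 66 55 44 222 145 72 65
76 444 56 55 44 222 145 72 65
99 234 76 55 44 222 145 72 65
;
run;

data want;
  set have;
  /* Parallel arrays: element i of each array refers to the same measure */
  array vals {*} age    weight    height;
  array q1s  {*} Q1_age Q1_weight Q1_height;
  array q3s  {*} Q3_age Q3_weight Q3_height;
  do i = 1 to dim(vals);
    /* Drop the whole observation if any value is at or beyond either quartile.
       A missing value is LE any number, so it also triggers the delete. */
    if vals{i} le q1s{i} or vals{i} ge q3s{i} then delete;
  end;
  drop i;
run;
```

With these nine example rows, every observation has at least one value at or beyond a bound (row 2, for instance, has weight = 222 = Q3_weight), so WANT ends up empty. If the intent is to blank out only the offending values instead, replacing `then delete;` with `then call missing(vals{i});` keeps all nine rows.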


Posted in reply to jcorroon

05-13-2016 05:55 PM

If your goal is to compute trimmed means, with their standard errors or confidence intervals, consider using **proc univariate** with the option **trimmed=**.

PG
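A minimal sketch of that suggestion. It uses the SASHELP.CLASS sample data set so it runs as-is; substitute your own data set and variables. Per the documentation excerpt quoted further down the thread, a proportion p trims the smallest integer greater than or equal to np observations. With the 19 rows of SASHELP.CLASS and p = 0.1, that is 2, and the trimming is done symmetrically at both ends:

```
/* Show only the trimmed-means table */
ods select TrimmedMeans;

/* TRIMMED=0.1: trim ceil(0.1*n) observations from each end of each variable */
proc univariate data=sashelp.class trimmed=0.1;
  var age height weight;
run;
```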


Posted in reply to PGStats

05-16-2016 12:42 PM

@PGStats I'm not sure I understand the documentation on this option. If I wanted means without outliers (e.g., 1.5 x the Q3 and 1.5 x the Q1), how would I specify these parameters in the option?

TRIMMED=values <(<TYPE=keyword> <ALPHA=>)>

TRIM=values <(<TYPE=keyword> <ALPHA=>)>

requests a table of trimmed means, where value specifies the number or the proportion of observations that PROC UNIVARIATE trims. If the value is the number k of trimmed observations, k must be between 0 and half the number of nonmissing observations. If value is a proportion p between 0 and 1/2, the number of observations that PROC UNIVARIATE trims is the smallest integer that is greater than or equal to np, where n is the number of observations. To include confidence limits for the mean and the Student’s t test in the table, you must use the default value of VARDEF=, which is DF. For details concerning the computation of trimmed means, see the section Trimmed Means. The TRIMMED= option does not apply if you use a WEIGHT statement.
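As the excerpt says, TRIMMED= removes a fixed count or proportion of observations; it does not look at how far a value lies from the quartiles. If the intent is the common 1.5 x IQR rule (Tukey's fences) rather than literally multiplying the quartiles by 1.5, a value is flagged when it is below Q1 - 1.5*IQR or above Q3 + 1.5*IQR, where IQR = Q3 - Q1. Below is a minimal sketch that sets such values to missing before averaging. It uses SASHELP.CLASS so it runs as-is, and the data set names quarts and no_outliers are hypothetical:

```
/* One-row data set with Q1 and Q3 of each variable: age_Q1, age_Q3, ... */
proc summary data=sashelp.class;
  var age height weight;
  output out=quarts (drop=_type_ _freq_) q1= q3= / autoname;
run;

data no_outliers;
  if _n_ = 1 then set quarts;   /* read once; SET variables are retained */
  set sashelp.class;
  array vals {*} age    height    weight;
  array q1s  {*} age_Q1 height_Q1 weight_Q1;
  array q3s  {*} age_Q3 height_Q3 weight_Q3;
  do i = 1 to dim(vals);
    iqr = q3s{i} - q1s{i};
    /* Tukey fences: blank out values more than 1.5*IQR outside the quartiles */
    if vals{i} < q1s{i} - 1.5*iqr or vals{i} > q3s{i} + 1.5*iqr then
      call missing(vals{i});
  end;
  drop i iqr age_Q1 height_Q1 weight_Q1 age_Q3 height_Q3 weight_Q3;
run;

/* MEAN ignores missing values, so each mean excludes that variable's outliers */
proc means data=no_outliers n mean;
  var age height weight;
run;
```

Setting values to missing, rather than deleting rows, means an outlier in one variable does not remove that observation from the means of the other variables.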


Posted in reply to jcorroon

05-16-2016 12:49 PM

Trimmed=0.05 would be a 5% trim; TYPE= controls which end (or both) to trim.

Trimmed=0.25 (type=twosided)

I think that is what you may be looking for.


Posted in reply to ballardw

05-16-2016 02:15 PM

@ballardw, the TYPE= option controls the type of confidence interval that is computed, not the trimming. Trimming is always done symmetrically. I had to do a little test to confirm this, as the doc is ambiguous.

PG
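One way to repeat that check is to run the same 25% trim with two TYPE= settings and compare the two tables. Per PG's test, the trimmed mean should be identical in both, and only the confidence limits should change. This is a sketch using SASHELP.CLASS so it runs as-is:

```
/* Two-sided confidence limits for the 25% trimmed mean */
ods select TrimmedMeans;
proc univariate data=sashelp.class trimmed=0.25 (type=twosided alpha=0.05);
  var weight;
run;

/* Same trim, lower one-sided confidence limit only */
ods select TrimmedMeans;
proc univariate data=sashelp.class trimmed=0.25 (type=lower alpha=0.05);
  var weight;
run;
```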


Posted in reply to PGStats

05-16-2016 06:43 PM

PGStats wrote:

@ballardw, type= option controls the type of confidence interval that is computed, not the trimming. Trimming is always done symmetrically. I had to do a little test to confirm this, as the doc is ambiguous.

Thanks for the heads-up. I don't use arbitrarily trimmed data, and the documentation got me. I've been skeptical of trimming ever since the first stats program I used in class; the professor had also written the program (which we had to buy...), and that professor loved trimming data.