Subset observations based on standard deviation


01-10-2013 04:02 PM

Given a single-column data set (one continuous variable, 10,000 rows).

Create a subset that meets the following conditions:

--- Elements greater than one standard deviation above the mean.

--- But less than two standard deviations above the mean.

Prefer to use basic SAS functions.

data nicholas.between_1sd2sd;
   set nicholas.combined;
   where ...;
run;

Thanks!

Accepted Solutions

Solution


01-10-2013 07:54 PM

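Is this example helpful?

proc sql noprint;
   /* FORMAT=BEST32. keeps full precision in the macro variables; */
   /* without a format, INTO stores numbers using BEST8.          */
   select mean(age) format=best32., std(age) format=best32.
      into :mean, :std
      from sashelp.class;
quit;

data class;
   set sashelp.class;
   where age > &mean + &std and age < &mean + 2*&std;
run;

proc print;
run;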

Obs    Name       Sex    Age    Height    Weight
  1    Janet       F      15     62.5      112.5
  2    Mary        F      15     66.5      112.0
  3    Philip      M      16     72.0      150.0
  4    Ronald      M      15     67.0      133.0
  5    William     M      15     66.5      112.0

All Replies


01-10-2013 04:07 PM

This is almost identical to your previous question, with standard deviations in place of percentiles.

You can use the same code, substituting the standard deviation for the percentiles.

The DATA step processes data one row at a time, so you need the standard deviation calculated beforehand in order to use it in a WHERE clause.

A SQL procedure operates on the entire column at once, so it can do this in one step, or at least one coded step, though it may take multiple passes behind the scenes.

proc sql;
   create table want as
   select weight, std(weight) as std_weight, avg(weight) as avg_weight
   from sashelp.class
   having abs(weight - calculated avg_weight) / calculated std_weight between 1 and 2;
quit;
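A DATA step can still do this without a procedure if it reads the data twice: one pass to accumulate the statistics, a second pass to filter. Below is a minimal sketch, assuming the same sashelp.class weight example; the variable names (avg_weight, sd_weight, m2, delta, n) are illustrative, and the running mean/variance update (Welford's method) is a swapped-in technique, not something proposed in this thread. Unlike the SQL above, it keeps only the upper side (strictly between 1 and 2 standard deviations above the mean).

```sas
/* Sketch: two passes over the data inside one DATA step.               */
/* Variable names avg_weight, sd_weight, m2, delta, n are illustrative. */
data want;
   /* Pass 1: running mean and sum of squared deviations (Welford).     */
   /* Sum statements (var + expr;) start at 0 and are retained.          */
   do until (eof1);
      set sashelp.class(keep=weight) end=eof1;
      if not missing(weight) then do;
         n + 1;
         delta = weight - avg_weight;
         avg_weight + delta / n;
         m2 + delta * (weight - avg_weight);
      end;
   end;
   /* Sample standard deviation (n - 1 divisor), like the STD function. */
   sd_weight = sqrt(m2 / (n - 1));

   /* Pass 2: reread every row; keep those 1 to 2 SDs above the mean.   */
   do until (eof2);
      set sashelp.class end=eof2;
      if avg_weight + sd_weight < weight < avg_weight + 2 * sd_weight then output;
   end;
   drop n delta m2;
   stop;   /* both passes are done; do not start another iteration */
run;
```

The STOP statement matters here: without it the DATA step would begin a second iteration and try to read past the end of the data in pass 1.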


01-10-2013 04:16 PM

Hmmm. So the standard deviation function is a row-only one as well?

Bummer.


01-10-2013 04:18 PM

The DATA step operates row by row. All DATA step functions work within a single row; procedures operate across a data set.

However, you can use SQL summary functions for this one (see above). This implementation of SQL doesn't have the median or other order statistics.
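For comparison, here is a sketch of the procedure route described in that reply: PROC MEANS summarizes across the whole data set, and a DATA step then attaches those summary values to every row. The output data set name stats and the variable names avg_weight and std_weight are illustrative. Because the DATA step reads sashelp.class in full, all of the original columns are kept.

```sas
/* Step 1: a procedure computes the statistics across the whole data set. */
proc means data=sashelp.class noprint;
   var weight;
   output out=stats(keep=avg_weight std_weight) mean=avg_weight std=std_weight;
run;

/* Step 2: read the one-row STATS data set once; values read by SET are  */
/* retained, so every row of SASHELP.CLASS is compared against them.     */
data want;
   if _n_ = 1 then set stats;
   set sashelp.class;
   /* Subsetting IF: keep rows strictly between 1 and 2 SDs above the mean. */
   if avg_weight + std_weight < weight < avg_weight + 2 * std_weight;
run;
```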


01-10-2013 04:22 PM

Reeza, hate to bother, but would you please post the SQL code for achieving the above task? Maybe others will find it here too and appreciate having access to it.


01-10-2013 04:28 PM

See above

Edited the original response.


01-10-2013 07:08 PM

Sorta fits the bill, Reeza. The problem is that we want only the upside here:

Create a subset of the data set that meets the following conditions:

--- Elements **greater than** one standard deviation above the mean.

--- But less than two standard deviations **above** the mean.

Your code gives both sides of the curve.

Additionally, and here's the larger problem, while we want to focus on one particular column ('50501'n), we want to **retain all the additional columns in the original data set** (nicholas._21603_). (Sorry for not making this clear at the outset.)

Here's the code I used so far. Please edit it as opposed to your example.

proc sql;
   create table nicholas.between_plus_1sd2sd as
   select '50501'n, std('50501'n) as std_50501, avg('50501'n) as avg_50501
   from nicholas._21603_
   having abs('50501'n - calculated avg_50501) / calculated std_50501 between 1 and 2;
quit;


01-11-2013 03:02 AM

Nice form, Linlin. Thanks much!


01-10-2013 08:27 PM

Adding an * brings in all of the variables rather than just your variable.

You can modify the HAVING clause to get the upside (above the mean only) by dropping the ABS() so that only positive distances qualify; you're basically looking for z-scores between 1 and 2.

proc sql;
   create table want as
   select *, std(weight) as std_weight, avg(weight) as avg_weight
   from sashelp.class
   having abs(weight - calculated avg_weight) / calculated std_weight between 1 and 2;
quit;
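Combining both changes (SELECT * and a signed z-score) for the asker's table gives something like the sketch below. It is untested against that data and assumes the name literal '50501'n is usable in your session (for example after options validvarname=any;); the z_50501 column name is illustrative. It uses strict inequalities instead of BETWEEN, because BETWEEN includes its endpoints.

```sas
proc sql;
   create table nicholas.between_plus_1sd2sd as
   select *,
          avg('50501'n) as avg_50501,
          std('50501'n) as std_50501,
          /* Signed z-score: no ABS(), so values below the mean are negative. */
          ('50501'n - calculated avg_50501) / calculated std_50501 as z_50501
   from nicholas._21603_
   /* Keep rows more than 1 but less than 2 SDs above the mean. */
   having calculated z_50501 > 1 and calculated z_50501 < 2;
quit;
```

Expect a log NOTE that the query requires remerging summary statistics back with the original data; that remerge is what lets each row be compared with the column-wide mean and standard deviation.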