- How to read a dataset for a value range and output...

10-08-2013 05:55 PM

Hello,

Good afternoon-- I am trying to write a program to automate outlier detection. I need to produce a list of values > 3 or < -3, along with each value's variable name and obs number.

I used proc standard to standardize my variables:

data work.prepstandard;
    set work.dataset (drop= id x1 x2 x3 x4 x5);
run;

PROC STANDARD DATA=work.prepstandard MEAN=0 STD=1 OUT=zstandards;
    VAR x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16;
run;

Then I tried running the following array code, but it doesn't work:

DATA work.outliers;
    SET zstandards;
    ARRAY x[*] x6-x16; /* variable list missing from the post; x6-x16 assumed from the VAR statement above */
    DO i=1 TO DIM(x);
        IF x[i] > 3 or x[i] < -3 THEN DO;
            obsNum= _N_;
            OUTPUT;
        END;
    END;
run;

Did I do something wrong in the array, or is this the wrong way to go about it? Thanks for helping me out... I sincerely appreciate it!

-Charles

Accepted Solutions

Solution


10-08-2013 06:15 PM

It should work, but you don't explain how it doesn't work so I can't comment beyond that.

However, it won't identify which variable is the outlier, and if an observation has multiple outliers you can't tell them apart, though I suppose if you're automating then you don't care too much about that.

Here's a sample that does what you're asking using SASHELP.CARS. I didn't drop the lead variables though you could easily.

If you have SAS/STAT licensed you can also look into proc stdize.

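/* Standardize the variables from msrp through length to mean 0, std 1 */
proc standard data=sashelp.cars mean=0 std=1 out=zstandards;
    var msrp--length;
run;

data outliers;
    set zstandards;
    array x(*) _numeric_;             /* all numeric variables read from zstandards */
    do i=1 to dim(x);
        if abs(x(i))-3>0 then do;     /* |z| > 3 covers both tails */
            obsnum=_n_;               /* observation number */
            variable=vname(x(i));     /* name of the variable holding the outlier */
            value=x(i);
            output;                   /* one row per outlying value */
        end;
    end;
    keep obsnum variable value;
run;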
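For the PROC STDIZE route mentioned above, a minimal sketch might look like the following. It assumes SAS/STAT is licensed and reuses the same SASHELP.CARS variables; METHOD=STD centers each variable on its mean and scales by its standard deviation, so it should give the same z-scores as the PROC STANDARD step.

```sas
/* Sketch only: requires SAS/STAT. */
/* METHOD=STD uses the mean as location and the standard deviation as scale. */
proc stdize data=sashelp.cars method=std out=zstandards;
    var msrp--length;
run;
```

The outliers data step can then read zstandards unchanged.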

All Replies

10-08-2013 06:46 PM

It worked, thank you very much. I really like the way you set up the IF-THEN statement with the abs() to capture both the negative and positive values-- nicely done!!

I appreciate your help!

-Charles
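For anyone comparing the two forms of the test, here is a small sketch on made-up values (the dataset name compare and the variables v, flag_or and flag_abs are hypothetical). The two conditions agree on non-missing values but differ on missing ones: in a SAS comparison a missing value sorts below any number, so `v < -3` is true for a missing v, while `abs(v) > 3` is false.

```sas
/* Hypothetical test values, including one missing value (.) */
data compare;
    input v;
    flag_or  = (v > 3 or v < -3);   /* two-sided test as in the original question */
    flag_abs = (abs(v) > 3);        /* abs() form, same cutoff */
    datalines;
-4
-3
0
3
3.5
.
;
run;

proc print data=compare noobs;
run;
```

Only the last row should differ: the missing value is flagged by the two-sided test but not by the abs() form. If missing values can occur in your data, that is one more reason to prefer the abs() version or to test for missing explicitly.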