04-16-2015 05:12 PM
When I look at diagnostic plots to identify outliers in a data set, the plots themselves do not label which data point is the outlier on the chart.
Is there a way to have a Cook's distance plot (from SAS EG regression output) show which data points correspond to the outliers?
I'd like to be able to easily identify which rows in my data set should be considered for removal.
04-16-2015 05:22 PM
Which version of SAS are you running?
A graphic approach to outliers may require a different plotting procedure to give you more control, but the options have changed quite quickly with the latest releases.
An alternate approach might be to output the input data along with the COOKD statistic, or the estimated values and/or residuals, and look at the larger residuals. That could be done with PROC UNIVARIATE, which shows the highest and lowest values and the records they come from.
There may also be other options depending on WHICH regression procedure you are running.
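A minimal sketch of that output-then-inspect approach, assuming the regression is run with PROC REG. SASHELP.CLASS and the model `weight = height age` are only stand-ins for your own table and variables, and `row_id`, `cooks_d`, `resid`, `pred`, `work.class_rows` and `work.reg_diag` are made-up names:

```sas
/* Stand-in data: SASHELP.CLASS ships with SAS. Swap in your own table. */
data work.class_rows;
    set sashelp.class;
    row_id = _n_;   /* keep the original row number as an explicit ID */
run;

/* Fit the model and write the input rows plus diagnostics to a new data set. */
proc reg data=work.class_rows plots=none;
    model weight = height age;
    /* COOKD=, R= and P= are OUTPUT statement keywords;     */
    /* the variable names on the right-hand side are arbitrary */
    output out=work.reg_diag cookd=cooks_d r=resid p=pred;
run;
quit;

/* Show only the extreme-observations table for Cook's D. */
ods select ExtremeObs;
proc univariate data=work.reg_diag;
    var cooks_d;
    id row_id;      /* print row_id next to each extreme value */
run;
```

The extreme-observations table lists the five lowest and five highest values by default; the NEXTROBS= option on the PROC UNIVARIATE statement changes that count.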
04-16-2015 05:34 PM
I'm using SAS Enterprise Guide 6.1. I'm just going to "Tasks", "Linear Regress", and looking at the diagnostic plots in the regression results. In the Cook's Distance plot I see an outlier, but I'd like to know which row in the data table it corresponds to.
04-16-2015 07:15 PM
I don't speak EG much, so I can't list the explicit buttons to push, but the regression task should let you create an output data set with the COOKD statistic. Send that data set to PROC UNIVARIATE and analyze COOKD. As part of its general output, UNIVARIATE should show a list of the 5 largest and smallest values of COOKD WITH the observation number in the data set.
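If editing the code behind the task is an option, the plot itself can also carry the labels. This is a rough sketch that assumes PROC REG with ODS Graphics available; SASHELP.CLASS, its `name` variable and the model are placeholders for your own data:

```sas
ods graphics on;

/* LABEL asks PROC REG to label the points it flags as influential;  */
/* ONLY suppresses the default panel so just the Cook's D plot shows. */
proc reg data=sashelp.class plots(only label)=(cooksd);
    id name;        /* labels use this variable; without an ID       */
                    /* statement the observation number is used      */
    model weight = height age;
run;
quit;
```

Only points above the plot's reference line get a label, so a point that merely looks large may still be unlabeled.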