maroulator
Obsidian | Level 7

To whom it may concern,

 

I am running the following snippet of code and I am trying to capture outliers using the Pearson Residual, Hat Matrix Diagonal, and DfBeta values. This is a very straightforward task if my dataset has fewer than 10,000 observations. My problem, however, is that once I start working on a dataset with observations in the millions, sifting through my output file becomes extremely cumbersome, because the output lists every single observation in the dataset along with its leverage/influence metrics and graphs. Is anyone aware of a way to have my output display only the observations that have a DfBeta, a Pearson Residual, or a Hat Matrix Diagonal value above a certain level?

 

 

proc logistic data=Dataset descending;
model Y=var1 var2 var3 var4/ plcl plrl waldcl waldrl
                           lackfit rsq
                           influence iplots
                           itprint;
run;

ACCEPTED SOLUTION
Reeza
Super User

Put them into a data set instead and then filter as you would any other source.

 

Check the OUTPUT statement and the options available to capture the data.
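As a minimal sketch of that suggestion (not necessarily what the poster had in mind), the OUTPUT statement keywords RESCHI=, H=, and DFBETAS= can write the Pearson residual, the hat matrix diagonal, and the DfBetas to a data set, and a DATA step can then keep only the rows above a cutoff. The data set names work.diag and work.flagged, the new variable names, and the three cutoff values are assumptions chosen for illustration. INFLUENCE and IPLOTS are left off the MODEL statement here because they are the options that print the per-observation listing and plots.

```sas
/* Hypothetical cutoffs: choose values that suit your own model and data */
%let res_cut = 3;
%let h_cut   = 0.05;
%let dfb_cut = 0.2;

proc logistic data=Dataset descending;
   model Y = var1 var2 var3 var4;
   /* Write the diagnostics to a data set instead of the listing.      */
   /* DFBETAS= takes one name for the intercept, then one per variable */
   output out=work.diag
          reschi=pearson_res
          h=leverage
          dfbetas=dfb_int dfb_var1 dfb_var2 dfb_var3 dfb_var4;
run;

data work.flagged;
   set work.diag;
   array dfb{*} dfb_int dfb_var1 dfb_var2 dfb_var3 dfb_var4;
   /* Largest absolute DfBeta across all parameters for this observation */
   max_abs_dfb = 0;
   do i = 1 to dim(dfb);
      max_abs_dfb = max(max_abs_dfb, abs(dfb{i}));
   end;
   drop i;
   /* Keep the row if any one of the three diagnostics exceeds its cutoff */
   if abs(pearson_res) > &res_cut
      or leverage > &h_cut
      or max_abs_dfb > &dfb_cut;
run;
```

Printing work.flagged then shows only the flagged observations, together with the original variables that the OUTPUT statement carries over from the input data.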

 



2 REPLIES 2

ballardw
Super User

Adding an ODS OUTPUT statement to the body of the PROC LOGISTIC step:

 

proc logistic data=Dataset descending;
model Y=var1 var2 var3 var4/ plcl plrl waldcl waldrl
                           lackfit rsq
                           influence iplots
                           itprint;
ods output influence=myinfluencedatasetname;
run;

 

 

will place the contents of the Influence results table into a data set.

Filter or sort it as needed.
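One way the filtering step might look is sketched below. The column names in the ODS data set are not stated in this thread, so PearsonResidual and HatDiagonal are hypothetical placeholders, and the cutoffs are illustrative only. Running PROC CONTENTS first shows the real names to substitute.

```sas
/* List the columns ODS actually wrote, so the filter can use the real names */
proc contents data=myinfluencedatasetname varnum;
run;

/* Hypothetical column names: replace with the names reported by PROC CONTENTS */
%let res_col = PearsonResidual;
%let hat_col = HatDiagonal;

/* Keep only the observations above the (illustrative) cutoffs */
data influence_flagged;
   set myinfluencedatasetname;
   where abs(&res_col) > 3 or &hat_col > 0.05;
run;

/* Put the highest-leverage observations first */
proc sort data=influence_flagged;
   by descending &hat_col;
run;
```

A DfBeta condition can be added to the WHERE clause in the same way once the corresponding column names are known.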
