maroulator
Obsidian | Level 7

To whom it may concern,

 

I am running the following snippet of code to identify outliers using the Pearson residual, hat matrix diagonal, and DFBETA values. This is a very straightforward task if my dataset has fewer than 10,000 observations. My problem, however, is that once I start working on a dataset with millions of observations, sifting through my output file becomes extremely cumbersome, because the output lists every single observation in the dataset along with its leverage/influence metrics and graphs. Is anyone aware of a way to have my output display only the observations that have a DFBETA, Pearson residual, or hat matrix diagonal value above a certain level?

 

 

proc logistic data=Dataset descending;
   model Y = var1 var2 var3 var4 / plcl plrl waldcl waldrl
                                   lackfit rsq
                                   influence iplots
                                   itprint;
run;

ACCEPTED SOLUTION
Reeza
Super User

Put them into a data set instead and then filter as you would any other source.

 

Check the OUTPUT statement and the options available to capture the data.
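
As a rough sketch of this suggestion (not necessarily what the poster had in mind), the OUTPUT statement can write the per-observation diagnostics to a data set, which a DATA step then subsets. The data set names `work.diag` and `work.flagged`, the variable names `pearson_res`, `hat_diag`, and `max_abs_dfbeta`, and all three cutoff values are assumptions chosen for illustration; pick thresholds that suit your model. The INFLUENCE and IPLOTS options are left off the MODEL statement here so the full listing is not printed.

```sas
/* Sketch: capture the diagnostics in a data set instead of the listing. */
proc logistic data=Dataset descending;
   model Y = var1 var2 var3 var4;
   /* RESCHI=  Pearson residual
      H=       hat matrix diagonal
      DFBETAS=_ALL_ creates one variable per parameter, named DFBETA_<parameter> */
   output out=work.diag reschi=pearson_res h=hat_diag dfbetas=_all_;
run;

/* Keep only observations that exceed the (hypothetical) cutoffs. */
data work.flagged;
   set work.diag;
   array dfb {*} dfbeta_:;          /* all DFBETA_ variables read from work.diag */
   max_abs_dfbeta = 0;
   do i = 1 to dim(dfb);
      max_abs_dfbeta = max(max_abs_dfbeta, abs(dfb{i}));
   end;
   drop i;
   /* Subsetting IF: assumed thresholds, adjust as needed */
   if abs(pearson_res) > 3 or hat_diag > 0.05 or max_abs_dfbeta > 0.2;
run;
```

Printing or sorting `work.flagged` then shows only the flagged observations rather than the full dataset.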

 



2 REPLIES

ballardw
Super User

Adding ods output to the body of the Proc logistic code:

 

proc logistic data=Dataset descending;
model Y=var1 var2 var3 var4/ plcl plrl waldcl waldrl
                           lackfit rsq
                           influence iplots
                           itprint;
ods output influence=myinfluencedatasetname;
run;

 

 

will place the content of the influence results table into a dataset.

Filter or sort it as needed.
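
A minimal sketch of that last step follows. The column names in the ODS data set are not given in this thread, so the code first lists them with PROC CONTENTS. The macro variable values `HatDiag` and `0.05` and the data set name `high_leverage` are hypothetical placeholders, not documented names; replace them with the actual column name reported by PROC CONTENTS and a cutoff of your choosing.

```sas
/* List the columns that ODS wrote, in creation order. */
proc contents data=myinfluencedatasetname varnum;
run;

/* Hypothetical placeholders: substitute the real column name and a suitable cutoff. */
%let leverage_col = HatDiag;
%let cutoff       = 0.05;

/* Keep only rows above the cutoff, largest values first. */
proc sort data=myinfluencedatasetname out=high_leverage;
   where &leverage_col > &cutoff;
   by descending &leverage_col;
run;
```

The same WHERE pattern can be repeated for the residual and DFBETA columns once their names are known.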
