To whom it may concern,
I am running the following snippet of code and I am trying to capture outliers using the Pearson residual, hat matrix diagonal, and DFBETA values. This is a very straightforward task when my dataset has fewer than 10,000 observations. My problem, however, is that once I start working on a dataset with millions of observations, sifting through my output file becomes an extremely cumbersome process, because the output lists every single observation in the dataset along with its leverage/influence metrics and graphs. Is anyone aware of a way to have my output display only the observations whose DFBETA, Pearson residual, or hat matrix diagonal value is above a certain level?
proc logistic data=Dataset descending;
model Y=var1 var2 var3 var4/ plcl plrl waldcl waldrl
lackfit rsq
influence iplots
itprint;
run;
Put them into a data set instead and then filter as you would any other source.
Check the OUTPUT statement and the options available to capture the data.
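A minimal sketch of the OUTPUT-statement route, assuming a binary Y and the four predictors from the question. The data set name diag, the variable names pearson_res, hat_diag and dfb_*, and the cutoffs are hypothetical choices for illustration. The cutoffs (|Pearson residual| > 3, leverage > 3p/n, |DFBETAS| > 2/sqrt(n)) are common rules of thumb borrowed from linear-regression diagnostics, not values from this thread; swap in whatever "certain level" you need. INFLUENCE and IPLOTS are dropped from the MODEL statement so the per-observation tables and plots are no longer printed.

```sas
/* Write the diagnostics to a data set instead of printing them.              */
/* INFLUENCE and IPLOTS are removed so nothing is listed per observation.     */
proc logistic data=Dataset descending;
   model Y = var1 var2 var3 var4 / plcl plrl waldcl waldrl lackfit rsq itprint;
   output out=diag                       /* one row per input observation     */
          reschi=pearson_res             /* Pearson residual                  */
          h=hat_diag                     /* hat matrix diagonal (leverage)    */
          dfbetas=dfb_int dfb_v1 dfb_v2 dfb_v3 dfb_v4; /* intercept listed 1st */
run;

/* n = observations actually used in the fit (rows with a nonmissing hat value) */
proc sql noprint;
   select count(hat_diag) into :n trimmed from diag;
quit;

%let p = 5;   /* number of parameters: intercept + var1-var4 */

/* Keep only rows that exceed at least one (hypothetical) cutoff. */
data flagged;
   set diag;
   if abs(pearson_res) > 3
      or hat_diag > 3 * &p / &n
      or max(abs(dfb_int), abs(dfb_v1), abs(dfb_v2),
             abs(dfb_v3), abs(dfb_v4)) > 2 / sqrt(&n);
run;
```

The flagged data set can then be printed, sorted, or exported like any other table, and it should be far smaller than the full diagnostic listing.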
Adding an ODS OUTPUT statement to the body of the PROC LOGISTIC step:
proc logistic data=Dataset descending;
   model Y=var1 var2 var3 var4/ plcl plrl waldcl waldrl
         lackfit rsq
         influence iplots
         itprint;
   ods output influence=myinfluencedatasetname;
run;
will place the contents of the Influence results table into a data set.
Filter or sort it as needed.
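A sketch of the "filter or sort" step on the captured table. The column names PearsonResid and HatDiag are HYPOTHETICAL placeholders; the names ODS assigns may differ by SAS release, so run PROC CONTENTS first and substitute what it lists. The data set name suspects and the cutoff of 3 are also illustrative assumptions.

```sas
/* 1. Show the variable names ODS gave the captured Influence table. */
proc contents data=myinfluencedatasetname varnum;
run;

/* 2. HYPOTHETICAL names: replace PearsonResid and HatDiag with the  */
/*    names reported by PROC CONTENTS above.                         */
%let res_cut = 3;   /* hypothetical |Pearson residual| cutoff */

proc sort data=myinfluencedatasetname out=suspects;
   where abs(PearsonResid) > &res_cut;   /* keep large residuals only  */
   by descending HatDiag;                /* highest leverage first     */
run;

/* 3. Print only the top 50 rows instead of millions. */
proc print data=suspects(obs=50);
run;
```

Note that this route still requires the INFLUENCE option, so the full table is also sent to whatever ODS destinations are open; with millions of rows, the OUTPUT statement approach, which lets you drop INFLUENCE and IPLOTS, may be lighter.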