Programming the statistical procedures from SAS

Winsorization, White standard errors, and area under ROC

Contributor
Posts: 40

Winsorization, White standard errors, and area under ROC

Hello,

Using the attached dataset (in Excel format), I have the following two programs: one uses PROC LOGISTIC and the other uses PROC SURVEYREG.  Both run properly.  For both, I want to know how to get:

1. Winsorized data (between 1% and 99%)

2. White standard errors 

 

For the PROC LOGISTIC model only, I also want to know how to get the area under the ROC curve.  What code should I use to get these?

 

proc logistic data= audit.Combined4_v2c;
class fyear SIC1;
class Weakness / param=ref ref=first;
class Loss / param=ref ref=first;
class GoingConcern / param=ref ref=first;
class Foreign / param=ref ref=first;
class ExtInc / param=ref ref=first;
class Busy / param=ref ref=first;
class AAClient / param=ref ref=first;
class ShortTenure / param=ref ref=first;
class NatEx / param=ref ref=first;
class CityEx / param=ref ref=first;
class Big4 / param=ref ref=first;
model Restatement(event="1") = CulturalImpactCt AuditorOfficeSize Big4 Big4xCulturalImpactCt ClientImportance CityEx NatEx ShortTenure LnAuditFees LnNAF
AAClient LnAssets BM Busy ExtInc Foreign GoingConcern Leverage Loss
REC ROA Segments Weakness fyear SIC1 /link=probit RSQ;
output out= audit.Combined_v2_c1prob1 p=Probability;
run;

 

proc surveyreg data= audit.Combined4_v2c;
class fyear SIC1;
Model ARL = CulturalImpactCt AuditorOfficeSize Big4 Big4xCulturalImpactCt ClientImportance
CityEx NatEx ShortTenure LnAuditFees LnNAF
AAClient LAF AF NAF LnAssets BM Busy ExtInc Foreign GoingConcern Leverage Loss
REC ROA Segments Restatement Weakness fyear SIC1
/solution adjRsq;
run;

 

Thanks so much for your help - I appreciate it!

Jadallah

Super User
Posts: 10,888

Re: Winsorization, White standard errors, and area under ROC


Many users here don't want to download Excel files because of virus potential; others have such downloads blocked by security software. Also, if you give us Excel, we have to create a SAS data set ourselves, and because Excel places no constraints on data cells, the result may not have the same variable types (numeric or character), or even the same values, as your data.

 

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... has instructions for turning your data set into SAS data step code. That gives you text that recreates your data, or a test subset of it, which we can run code against. Either paste it as text into a code box using the "run" icon or attach the data step as a TEXT file.

Contributor
Posts: 40

Re: Winsorization, White standard errors, and area under ROC

Hello,

It didn't let me upload the SAS file.  It says "The contents of the attachment doesn't match its file type."  Do you know how I can fix that?

 

Thanks!

Jadallah

Contributor
Posts: 40

Re: Winsorization, White standard errors, and area under ROC

Let's try this.

Attachment
Super User
Posts: 9,782

Re: Winsorization, White standard errors, and area under ROC

Here is Winsorize code.

 

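/* Example data: 100 rows of random integers between 1 and 100 */
data have;
 do i=1 to 100;
  a=ceil(ranuni(1)*100);
  b=ceil(ranuni(2)*100);
  output;
 end;
 drop i;
run;


/* Lower and upper quantile cutoffs (use 0.01 and 0.99 for 1% and 99%) */
%let low=0.05 ;
%let high=0.95 ;

proc iml;
/* Read all numeric variables into matrix x; keep the names in vname */
use have;
read all var _num_ into x[c=vname];
close have;
/* q has one column per variable: row 1 = lower cutoff, row 2 = upper cutoff */
call qntl(q,x,{&low ,&high});

/* Replace values beyond each cutoff with the cutoff itself */
do i=1 to ncol(x);
 x[loc(x[,i]<q[1,i]),i]=q[1,i];
 x[loc(x[,i]>q[2,i]),i]=q[2,i];
end;

/* Write the Winsorized matrix to the data set WANT */
create want from x[c=vname];
append from x;
close want;

quit;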
Contributor
Posts: 40

Re: Winsorization, White standard errors, and area under ROC

Hello,

Thanks for the response.  I tried the following code:

data audit.combined4_v2c;
do i=1 to 100;
a=ceil(ranuni(1)*100);
b=ceil(ranuni(2)*100);
output;
end;
drop i;
run;


%let low=0.01 ;
%let high=0.99 ;

proc iml;
use audit.combined4_v2c;
read all var _num_ into x[c=vname];
close audit.combined4_v2c;
call qntl(q,x,{&low ,&high});

do i=1 to ncol(x);
x[loc(x[,i]<q[1,i]),i]=q[1,i];
x[loc(x[,i]>q[2,i]),i]=q[2,i];
end;

create audit.combined4_v2car from x[c=vname];
append from x;
close want;

quit;

 

I received an error saying that the matrix has not been set to a value.

 

What do you think I should do?

God bless, best regards, and thanks,

Jadallah

SAS Super FREQ
Posts: 3,559

Re: Winsorization, White standard errors, and area under ROC

The problem is probably the way that the LOC function is being used to index the data.  If the LOC function returns an empty matrix (because no element satisfies the condition), using that result as a subscript causes an error. For an explanation and what to do about it, see "Beware the Naked LOC".

Contributor
Posts: 40

Re: Winsorization, White standard errors, and area under ROC

Thank you, Rick.  But my code doesn't have an IF-THEN statement, so what should I do?

SAS Super FREQ
Posts: 3,559

Re: Winsorization, White standard errors, and area under ROC

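If you use less-than-or-equal and greater-than-or-equal, then this problem won't occur, so that's the simplest solution. For a column with nonmissing values, the minimum is always at or below the lower quantile and the maximum is always at or above the upper quantile, so each LOC call finds at least one element: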

 

x[loc(x[,i]<=q[1,i]),i]=q[1,i];
x[loc(x[,i]>=q[2,i]),i]=q[2,i];

 

If you have missing values in your data, you need to be a little more careful when Winsorizing. The IML code in the article "How to Winsorize data in SAS" handles missing values.
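
For readers who prefer the guarded pattern to the inclusive comparisons, here is a minimal sketch. It is not the code from either article mentioned above. It reuses the HAVE and WANT data set names from the earlier example and takes the 1% and 99% cutoffs from the original question. The explicit test against the missing value is a precaution I have added. Note that `_num_` reads every numeric column, so in real data you would probably list only the continuous variables you want to Winsorize.

```sas
%let low  = 0.01;
%let high = 0.99;

proc iml;
use have;
read all var _num_ into x[colname=vname];
close have;

/* q: row 1 holds the lower cutoffs, row 2 the upper cutoffs */
call qntl(q, x, {&low, &high});

do i = 1 to ncol(x);
   /* Store the LOC result and check that it is nonempty before using it */
   lowIdx = loc(x[,i] < q[1,i] & x[,i] ^= .);   /* leave missing values alone */
   if ncol(lowIdx) > 0 then x[lowIdx, i] = q[1,i];

   highIdx = loc(x[,i] > q[2,i]);
   if ncol(highIdx) > 0 then x[highIdx, i] = q[2,i];
end;

create want from x[colname=vname];
append from x;
close want;
quit;
```

Because each index vector is tested with NCOL before it is used as a subscript, a column with no values beyond a cutoff is simply skipped.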

Contributor
Posts: 40

Re: Winsorization, White standard errors, and area under ROC

Thanks, Rick.  The code ran, but how do I apply it to my original question about using Winsorized data (at 1% and 99%)?  I want to run the PROC LOGISTIC and PROC SURVEYREG steps that I posted at the beginning on the Winsorized data (to limit the effect of the extreme outliers).

 

Thanks again,

Jadallah

SAS Super FREQ
Posts: 3,559

Re: Winsorization, White standard errors, and area under ROC

I will let others comment on your questions. I am skeptical that Winsorizing univariate regressors is a good way to handle outliers in regression.

 

There have been similar questions asked before. It might be useful to read this doc:

http://support.sas.com/kb/30/333.html#white

and these threads:

https://communities.sas.com/t5/SAS-Procedures/White-standard-errors/td-p/129061

https://communities.sas.com/t5/SAS-Statistical-Procedures/NEED-HELP-LOGIT-model-w-Robust-S-E/td-p/37...
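
As a starting point while reading those links, here is a rough sketch of the two remaining requests. It makes several assumptions: `audit.Combined4_w` is a hypothetical name for a Winsorized copy of the data, and the regressor lists are shortened from the original post for readability. In PROC LOGISTIC, the area under the ROC curve is the `c` statistic in the "Association of Predicted Probabilities and Observed Responses" table, and `plots(only)=roc` requests the curve itself. For the linear model, the WHITE option on the MODEL statement of PROC REG requests heteroscedasticity-consistent standard errors. PROC REG has no CLASS statement, so the year and industry effects are left out of this sketch.

```sas
ods graphics on;

/* Area under the ROC curve: look for "c" in the association table */
proc logistic data=audit.Combined4_w plots(only)=roc;   /* hypothetical data set name */
   class fyear SIC1;
   model Restatement(event="1") = CulturalImpactCt LnAssets ROA fyear SIC1
         / link=probit;
run;

/* Heteroscedasticity-consistent (White) standard errors for the linear model */
proc reg data=audit.Combined4_w;
   model ARL = CulturalImpactCt LnAssets ROA / white;
run;
quit;
```

Whether these standard errors match what a particular paper calls "White standard errors" depends on the variant used, so check the results against the linked note before relying on them.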

 
