jjadall1
Quartz | Level 8

Hello,

With the attached dataset (which is in Excel format), I have the following two programs (one uses PROC LOGISTIC and one uses PROC SURVEYREG).  Both run properly.  For both, I want to know how to get:

1. Winsorized data (between 1% and 99%)

2. White standard errors 

 

For PROC LOGISTIC only, I also want to know how to get the area under the ROC curve.  What code should I use to get these?

 

proc logistic data= audit.Combined4_v2c;
class fyear SIC1;
class Weakness / param=ref ref=first;
class Loss / param=ref ref=first;
class GoingConcern / param=ref ref=first;
class Foreign / param=ref ref=first;
class ExtInc / param=ref ref=first;
class Busy / param=ref ref=first;
class AAClient / param=ref ref=first;
class ShortTenure / param=ref ref=first;
class NatEx / param=ref ref=first;
class CityEx / param=ref ref=first;
class Big4 / param=ref ref=first;
model Restatement(event="1") = CulturalImpactCt AuditorOfficeSize Big4 Big4xCulturalImpactCt ClientImportance CityEx NatEx ShortTenure LnAuditFees LnNAF
AAClient LnAssets BM Busy ExtInc Foreign GoingConcern Leverage Loss
REC ROA Segments Weakness fyear SIC1 /link=probit RSQ;
output out= audit.Combined_v2_c1prob1 p=Probability;
run;

 

proc surveyreg data= audit.Combined4_v2c;
class fyear SIC1;
Model ARL = CulturalImpactCt AuditorOfficeSize Big4 Big4xCulturalImpactCt ClientImportance
CityEx NatEx ShortTenure LnAuditFees LnNAF
AAClient LAF AF NAF LnAssets BM Busy ExtInc Foreign GoingConcern Leverage Loss
REC ROA Segments Restatement Weakness fyear SIC1
/solution adjRsq;
run;

 

Thanks so much for your help - I appreciate it!

Jadallah

ballardw
Super User

Many users here don't want to download Excel files because of the potential for viruses; others have such files blocked by security software. Also, if you give us Excel, we have to create a SAS data set ourselves, and because Excel places no constraints on data cells, the result we end up with may not have variables of the same type (numeric or character), or even the same values, as yours.

 

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... has instructions on creating SAS data step code from your data set. That code will allow you to create text that replicates your data, or a test set, which we can use to test code against. Either paste it as text into a code box using the "run" icon or attach the data step in a TEXT file.

jjadall1
Quartz | Level 8

Hello,

It didn't let me upload the SAS file.  It says "The contents of the attachment doesn't match its file type."  Do you know how I can fix that?

 

Thanks!

Jadallah

Ksharp
Super User

Here is code to Winsorize the data.

 

data have;
 do i=1 to 100;
  a=ceil(ranuni(1)*100);
  b=ceil(ranuni(2)*100);
  output;
 end;
 drop i;
run;


%let low=0.05 ;
%let high=0.95 ;

proc iml;
use have;
read all var _num_ into x[c=vname];
close have;
call qntl(q,x,{&low ,&high});

do i=1 to ncol(x);
 x[loc(x[,i]<q[1,i]),i]=q[1,i];
 x[loc(x[,i]>q[2,i]),i]=q[2,i];
end;

create want from x[c=vname];
append from x;
close want;

quit;
jjadall1
Quartz | Level 8

Hello,

Thanks for the response.  I tried the following code:

data audit.combined4_v2c;
do i=1 to 100;
a=ceil(ranuni(1)*100);
b=ceil(ranuni(2)*100);
output;
end;
drop i;
run;


%let low=0.01 ;
%let high=0.99 ;

proc iml;
use audit.combined4_v2c;
read all var _num_ into x[c=vname];
close audit.combined4_v2c;
call qntl(q,x,{&low ,&high});

do i=1 to ncol(x);
x[loc(x[,i]<q[1,i]),i]=q[1,i];
x[loc(x[,i]>q[2,i]),i]=q[2,i];
end;

create audit.combined4_v2car from x[c=vname];
append from x;
close want;

quit;

 

I received an error saying that the matrix has not been set to a value.

 

What do you think I should do?

God bless, best regards, and thanks,

Jadallah

Rick_SAS
SAS Super FREQ

The problem is probably the way that the LOC function is being used to index the data.  If the LOC function returns an empty matrix and that result is used as a subscript, you will get an error. For an explanation and what to do about it, see "Beware the Naked LOC".

jjadall1
Quartz | Level 8

Thank you Rick.  But I don't have an IF-THEN statement in my code.  So what should I do?

Rick_SAS
SAS Super FREQ

If you use less-than-or-equal and greater-than-or-equal then this problem won't occur, so that's the simplest solution:

 

x[loc(x[,i]<=q[1,i]),i]=q[1,i];
x[loc(x[,i]>=q[2,i]),i]=q[2,i];

 

If you have missing values in your data, you need to be a little more careful when Winsorizing. The IML code in the article "How to Winsorize data in SAS" handles missing values.
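
A minimal sketch of a guarded version of the loop, assuming the same `have` data set and the `&low` and `&high` macro variables from the earlier post. This is not the code from the article mentioned above; it is plain percentile clipping that checks each LOC result before using it as a subscript and keeps missing values out of the lower-tail comparison (a missing value can compare as smaller than any number). It also assumes that QNTL computes the cutoffs from the nonmissing values in each column.

```sas
proc iml;
use have;
read all var _num_ into x[colname=vname];
close have;

/* q[1,] holds the lower cutoffs, q[2,] the upper cutoffs, one per column */
call qntl(q, x, {&low, &high});

do i = 1 to ncol(x);
   /* lower tail: exclude missing values so they are not overwritten */
   idx = loc(x[,i] < q[1,i] & x[,i] ^= .);
   if ncol(idx) > 0 then x[idx, i] = q[1,i];   /* skip when nothing qualifies */

   /* upper tail */
   idx = loc(x[,i] > q[2,i]);
   if ncol(idx) > 0 then x[idx, i] = q[2,i];
end;

create want from x[colname=vname];
append from x;
close want;
quit;
```

Because each assignment only happens when `idx` is nonempty, the strict inequalities can stay as they were.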

jjadall1
Quartz | Level 8

Thanks Rick.  The code ran, but how can I apply this to my original question about running the models on a winsorized dataset (at 1% and 99%)?  I want to run the proc logistic and proc surveyreg that I posted at the beginning with winsorized data (limiting the extreme outliers).

 

Thanks again,

Jadallah
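
One possible way to connect the pieces, offered as a sketch rather than a definitive answer. The DATA step with RANUNI in the earlier example only generates test data, so it is not needed for a real data set (and running it against the real data set name would overwrite that data). The list of variables to Winsorize below is an assumption: it is an illustrative subset of the continuous regressors from the original post, and the names `work.wins` and `work.Combined_w` are hypothetical.

```sas
%let low  = 0.01;
%let high = 0.99;

proc iml;
/* assumption: these are continuous regressors; edit the list as needed */
varNames = {"LnAuditFees" "LnNAF" "LnAssets" "BM" "Leverage" "ROA"};
use audit.Combined4_v2c;
read all var varNames into x;
close audit.Combined4_v2c;

call qntl(q, x, {&low, &high});
do i = 1 to ncol(x);
   idx = loc(x[,i] < q[1,i] & x[,i] ^= .);
   if ncol(idx) > 0 then x[idx, i] = q[1,i];
   idx = loc(x[,i] > q[2,i]);
   if ncol(idx) > 0 then x[idx, i] = q[2,i];
end;

create work.wins from x[colname=varNames];
append from x;
close work.wins;
quit;

/* One-to-one merge (no BY statement): relies on both data sets having
   the same rows in the same order. Same-named variables take their
   values from work.wins, all other variables are kept unchanged. */
data work.Combined_w;
   merge audit.Combined4_v2c work.wins;
run;

/* Then point the original procedures at the new data set, e.g.
   proc logistic  data=work.Combined_w; ... run;
   proc surveyreg data=work.Combined_w; ... run;                  */
```

Reading only the chosen variables avoids clipping the response, the indicator variables, and the CLASS variables, and the merge keeps any character variables that a numeric matrix would otherwise drop.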

Rick_SAS
SAS Super FREQ

I will let others comment on your questions. I am skeptical that Winsorizing univariate regressors is a good way to handle outliers in regression.

 

There have been similar questions asked before. It might be useful to read this doc

http://support.sas.com/kb/30/333.html#white

and these threads:

https://communities.sas.com/t5/SAS-Procedures/White-standard-errors/td-p/129061

https://communities.sas.com/t5/SAS-Statistical-Procedures/NEED-HELP-LOGIT-model-w-Robust-S-E/td-p/37...
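
For the remaining parts of the original question, a hedged sketch. As I understand it, PROC SURVEYREG already reports Taylor-linearized (sandwich-type) standard errors, so with no STRATA, CLUSTER, or WEIGHT statement its standard errors are heteroskedasticity-consistent. PROC SURVEYLOGISTIC does the analogous thing for the binary model and accepts LINK=PROBIT. For the ROC curve, PROC LOGISTIC can plot it and save its points, and the "c" statistic in its association table is the area under the curve. The model below is a reduced, illustrative version of the one in the question, and `work.roc_points` is a hypothetical output name.

```sas
/* Probit with sandwich-type (Taylor-linearized) standard errors.
   Reduced model for illustration: extend with the full regressor list. */
proc surveylogistic data=audit.Combined4_v2c;
   class Big4 / param=ref ref=first;
   model Restatement(event="1") = CulturalImpactCt Big4 LnAssets Leverage ROA
         / link=probit;
run;

/* ROC curve and its area from PROC LOGISTIC.
   PLOTS(ONLY)=ROC draws the curve; OUTROC= saves the curve's points. */
ods graphics on;
proc logistic data=audit.Combined4_v2c plots(only)=roc;
   class Big4 / param=ref ref=first;
   model Restatement(event="1") = CulturalImpactCt Big4 LnAssets Leverage ROA
         / link=probit outroc=work.roc_points;
run;
```

The point estimates from the two procedures should agree for the same model; only the standard errors differ.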

 
