outliers

Reply
Contributor
Posts: 50

outliers

hi ,

how to do outliers capping at 95th percentile in this case?

VariableNN MissMeanStd DevMinimum1st Pctl5th Ptcl10th Pctl25th Pctl50th Pctl75th Pctl90th Pctl95th Pctl99th PctlMaximum
BALANCE895001564.472081.5300.0648118.8095423.544808128.25396873.3852312054.374338.585911.519342.1619043.14
BALANCE_FREQUENCY895000.87727070.23690400.0909090.2727270.4545450.888889111111
PURCHASES895001003.22136.63000039.58361.281110.172542.683999.929007.6749039.57
ONEOFF_PURCHASES89500592.43737091659.890000038577.831600.526756701.0840761.25
INSTALLMENTS_PURCHASES89500411.0676447904.33811520000089468.651140.351753.08388722500
CASH_ADVANCE89500978.87111252097.160000001113.873069.794653.699596.3247137.21
PURCHASES_FREQUENCY895000.49035050.401370700000.0833330.50.9166671111
ONEOFF_PURCHASES_FREQUENCY895000.20245770.2983361000000.0833330.30.75111
PURCHASES_INSTALLMENTS_FREQUENCY895000.36443730.3974478000000.1666670.751111
CASH_ADVANCE_FREQUENCY895000.13514420.20012140000000.2222220.4166670.5833330.8333331.5
CASH_ADVANCE_TRX895003.24882686.82464670000004101529123
PURCHASES_TRX8950014.709832424.8576491000017173757117358
CREDIT_LIMIT894914494.453638.8250500100012001600300065009500120001700030000
PAYMENTS895001733.142895.060089.921689179.508581383.273938856.9015461901.283925.296083.4313652.7550721.48
MINIMUM_PAYMENTS8637313864.20654232372.450.01916319.49202173.203221106.028116169.123707312.343947825.4854591781.12767.059170.4376406.21
PRC_FULL_PAYMENT895000.15371460.29249920000000.1428570.6833335111
TENURE8950011.51731841.33833086681012121212121212
Ratio_Balance_Limit894910.38892640.389722300.0000206370.00294040.00779110.04152670.30286960.71758190.91991510.96676061.057153615.9099511
payment_by_min86373139.3500701120.28691460.0007304230.05531290.4544910.6093550.95754932.17049536.260430213.335368121.443113950.60092376840.53
Average_Purchase8950073.8706831160.5249391000011.9812541.393928678.6744828147.639228.5714286763.715981.67
Average_Cash_advance89500208.9051168535.6555169000000247.04197561.180268926.75853551909.6814836.45

 

Super Contributor
Posts: 259

Re: outliers

I do not know whether there is problem on my system or have you pasted everything inappropriately.

Data is flowing outside the intended place holder.

Super User
Posts: 9,865

Re: outliers

What kind of outpu do you want ? you want use 95th percentile to replace any value greater than 95th percentile?

Or you want winsorize data ? Here is IML code:

 

 

data have;
 do i=1 to 100;
  a=ceil(ranuni(1)*100);
  b=ceil(ranuni(2)*100);
  output;
 end;
 drop i;
run;


%let low=0.05 ;
%let high=0.95 ;

proc iml;
use have;
read all var _num_ into x[c=vname];
close have;
call qntl(q,x,{&low ,&high});

do i=1 to ncol(x);
 x[loc(x[,i]<q[1,i]),i]=q[1,i];
 x[loc(x[,i]>q[2,i]),i]=q[2,i];
end;

create want from x[c=vname];
append from x;
close want;

quit;

 

Contributor
Posts: 50

outlier treatment

hi,

how to do outlier capping  at 95th percentile?

this is a code 

data cluster;
set Y.Cd;

if BALANCE > 5911.51 then BALANCE = 5911.51 ;
if PURCHASES > 3999.92 then PURCHASES = 3999.92 ;
if ONEOFF_PURCHASES > 2675 then ONEOFF_PURCHASES = 2675 ;
if INSTALLMENTS_PURCHASES > 1753.08 then INSTALLMENTS_PURCHASES = 1753.08 ;
if CASH_ADVANCE > 4653.69 then CASH_ADVANCE = 4653.69 ;
if CASH_ADVANCE_TRX > 15 then CASH_ADVANCE_TRX = 15 ;

if PURCHASES_TRX > 57 then PURCHASES_TRX = 57 ;

if CREDIT_LIMIT > 12000 then CREDIT_LIMIT = 12000 ;

if PAYMENTS > 6083.43 then PAYMENTS = 6083.43 ;
if CASH_ADVANCE_FREQUENCY > 0.58333 then CASH_ADVANCE_FREQUENCY = 0.58333 ;
if MINIMUM_PAYMENTS > 2767.05 then MINIMUM_PAYMENTS =2767.05;
if Payment_by_min > 21.44311 then Payment_by_min =21.44311;
if Average_Purchase > 228.571 then Average_Purchase = 228.571;
if Average_Cash_Advance > 1365.19 then Average_Cash_Advance= 926.7585;
run;

VariableNN MissMeanStd DevMinimum1st Pctl5th Ptcl10th Pctl25th Pctl50th Pctl75th Pctl90th Pctl95th Pctl99th PctlMaximum
BALANCE895001564.472081.5300.064818.8095423.544808128.25396873.3852312054.374338.585911.519342.1619043.14
BALANCE_FREQUENCY895000.87727070.23690400.090910.2727270.4545450.888889111111
PURCHASES895001003.22136.63000039.58361.281110.172542.683999.929007.6749039.57
ONEOFF_PURCHASES89500592.43737091659.890000038577.831600.526756701.0840761.25
INSTALLMENTS_PURCHASES89500411.0676447904.33810000089468.651140.351753.08388722500
CASH_ADVANCE89500978.87111252097.160000001113.873069.794653.699596.3247137.21
PURCHASES_FREQUENCY895000.49035050.40137100000.0833330.50.9166671111
ONEOFF_PURCHASES_FREQUENCY895000.20245770.298336000000.0833330.30.75111
PURCHASES_INSTALLMENTS_FREQUENCY895000.36443730.397448000000.1666670.751111
CASH_ADVANCE_FREQUENCY895000.13514420.2001210000000.2222220.4166670.5833330.8333331.5
CASH_ADVANCE_TRX895003.24882686.8246470000004101529123
PURCHASES_TRX8950014.709832424.85765000017173757117358
CREDIT_LIMIT894914494.453638.8250500100012001600300065009500120001700030000
PAYMENTS895001733.142895.060089.921689179.508581383.273938856.9015461901.283925.296083.4313652.7550721.48
MINIMUM_PAYMENTS8637313864.20654232372.450.0219.49273.203221106.028116169.123707312.343947825.4854591781.12767.059170.4376406.21
PRC_FULL_PAYMENT895000.15371460.2924990000000.1428570.6833335111
TENURE8950011.51731841.3383316681012121212121212
Ratio_Balance_Limit894910.38892640.38972202.1E-050.00294040.00779110.04152670.30286960.71758190.91991510.96676061.057153615.9099511
payment_by_min86373139.3500701120.286900.055310.4544910.6093550.95754932.17049536.260430213.335368121.443113950.60092376840.53
Average_Purchase8950073.8706831160.5249000011.9812541.393928678.6744828147.639228.5714286763.715981.67
Average_Cash_advance89500208.9051168535.6555000000247.04197561.180268926.75853551909.6814836.45

 

Super User
Posts: 9,865

Re: outlier treatment

So you only want replace any value above 95th percentile  with 95th percentile.

 


%let low=0.05 ;
%let high=0.95 ;

proc iml;
use have;
read all var _num_ into x[c=vname];
close have;
call qntl(q,x,{&low ,&high});

do i=1 to ncol(x);
 *x[loc(x[,i]<q[1,i]),i]=q[1,i];
 x[loc(x[,i]>q[2,i]),i]=q[2,i];
end;

create want from x[c=vname];
append from x;
close want;

quit;
Ask a Question
Discussion stats
  • 4 replies
  • 325 views
  • 0 likes
  • 3 in conversation