BookmarkSubscribeRSS Feed
Prateek1
Obsidian | Level 7

hi ,

how to do outliers capping at 95th percentile in this case?

VariableNN MissMeanStd DevMinimum1st Pctl5th Ptcl10th Pctl25th Pctl50th Pctl75th Pctl90th Pctl95th Pctl99th PctlMaximum
BALANCE895001564.472081.5300.0648118.8095423.544808128.25396873.3852312054.374338.585911.519342.1619043.14
BALANCE_FREQUENCY895000.87727070.23690400.0909090.2727270.4545450.888889111111
PURCHASES895001003.22136.63000039.58361.281110.172542.683999.929007.6749039.57
ONEOFF_PURCHASES89500592.43737091659.890000038577.831600.526756701.0840761.25
INSTALLMENTS_PURCHASES89500411.0676447904.33811520000089468.651140.351753.08388722500
CASH_ADVANCE89500978.87111252097.160000001113.873069.794653.699596.3247137.21
PURCHASES_FREQUENCY895000.49035050.401370700000.0833330.50.9166671111
ONEOFF_PURCHASES_FREQUENCY895000.20245770.2983361000000.0833330.30.75111
PURCHASES_INSTALLMENTS_FREQUENCY895000.36443730.3974478000000.1666670.751111
CASH_ADVANCE_FREQUENCY895000.13514420.20012140000000.2222220.4166670.5833330.8333331.5
CASH_ADVANCE_TRX895003.24882686.82464670000004101529123
PURCHASES_TRX8950014.709832424.8576491000017173757117358
CREDIT_LIMIT894914494.453638.8250500100012001600300065009500120001700030000
PAYMENTS895001733.142895.060089.921689179.508581383.273938856.9015461901.283925.296083.4313652.7550721.48
MINIMUM_PAYMENTS8637313864.20654232372.450.01916319.49202173.203221106.028116169.123707312.343947825.4854591781.12767.059170.4376406.21
PRC_FULL_PAYMENT895000.15371460.29249920000000.1428570.6833335111
TENURE8950011.51731841.33833086681012121212121212
Ratio_Balance_Limit894910.38892640.389722300.0000206370.00294040.00779110.04152670.30286960.71758190.91991510.96676061.057153615.9099511
payment_by_min86373139.3500701120.28691460.0007304230.05531290.4544910.6093550.95754932.17049536.260430213.335368121.443113950.60092376840.53
Average_Purchase8950073.8706831160.5249391000011.9812541.393928678.6744828147.639228.5714286763.715981.67
Average_Cash_advance89500208.9051168535.6555169000000247.04197561.180268926.75853551909.6814836.45

 

4 REPLIES 4
RahulG
Barite | Level 11

I do not know whether there is problem on my system or have you pasted everything inappropriately.

Data is flowing outside the intended place holder.

Prateek1
Obsidian | Level 7

hi,

how to do outlier capping  at 95th percentile?

this is a code 

data cluster;
set Y.Cd;

if BALANCE > 5911.51 then BALANCE = 5911.51 ;
if PURCHASES > 3999.92 then PURCHASES = 3999.92 ;
if ONEOFF_PURCHASES > 2675 then ONEOFF_PURCHASES = 2675 ;
if INSTALLMENTS_PURCHASES > 1753.08 then INSTALLMENTS_PURCHASES = 1753.08 ;
if CASH_ADVANCE > 4653.69 then CASH_ADVANCE = 4653.69 ;
if CASH_ADVANCE_TRX > 15 then CASH_ADVANCE_TRX = 15 ;

if PURCHASES_TRX > 57 then PURCHASES_TRX = 57 ;

if CREDIT_LIMIT > 12000 then CREDIT_LIMIT = 12000 ;

if PAYMENTS > 6083.43 then PAYMENTS = 6083.43 ;
if CASH_ADVANCE_FREQUENCY > 0.58333 then CASH_ADVANCE_FREQUENCY = 0.58333 ;
if MINIMUM_PAYMENTS > 2767.05 then MINIMUM_PAYMENTS =2767.05;
if Payment_by_min > 21.44311 then Payment_by_min =21.44311;
if Average_Purchase > 228.571 then Average_Purchase = 228.571;
if Average_Cash_Advance > 1365.19 then Average_Cash_Advance= 926.7585;
run;

VariableNN MissMeanStd DevMinimum1st Pctl5th Ptcl10th Pctl25th Pctl50th Pctl75th Pctl90th Pctl95th Pctl99th PctlMaximum
BALANCE895001564.472081.5300.064818.8095423.544808128.25396873.3852312054.374338.585911.519342.1619043.14
BALANCE_FREQUENCY895000.87727070.23690400.090910.2727270.4545450.888889111111
PURCHASES895001003.22136.63000039.58361.281110.172542.683999.929007.6749039.57
ONEOFF_PURCHASES89500592.43737091659.890000038577.831600.526756701.0840761.25
INSTALLMENTS_PURCHASES89500411.0676447904.33810000089468.651140.351753.08388722500
CASH_ADVANCE89500978.87111252097.160000001113.873069.794653.699596.3247137.21
PURCHASES_FREQUENCY895000.49035050.40137100000.0833330.50.9166671111
ONEOFF_PURCHASES_FREQUENCY895000.20245770.298336000000.0833330.30.75111
PURCHASES_INSTALLMENTS_FREQUENCY895000.36443730.397448000000.1666670.751111
CASH_ADVANCE_FREQUENCY895000.13514420.2001210000000.2222220.4166670.5833330.8333331.5
CASH_ADVANCE_TRX895003.24882686.8246470000004101529123
PURCHASES_TRX8950014.709832424.85765000017173757117358
CREDIT_LIMIT894914494.453638.8250500100012001600300065009500120001700030000
PAYMENTS895001733.142895.060089.921689179.508581383.273938856.9015461901.283925.296083.4313652.7550721.48
MINIMUM_PAYMENTS8637313864.20654232372.450.0219.49273.203221106.028116169.123707312.343947825.4854591781.12767.059170.4376406.21
PRC_FULL_PAYMENT895000.15371460.2924990000000.1428570.6833335111
TENURE8950011.51731841.3383316681012121212121212
Ratio_Balance_Limit894910.38892640.38972202.1E-050.00294040.00779110.04152670.30286960.71758190.91991510.96676061.057153615.9099511
payment_by_min86373139.3500701120.286900.055310.4544910.6093550.95754932.17049536.260430213.335368121.443113950.60092376840.53
Average_Purchase8950073.8706831160.5249000011.9812541.393928678.6744828147.639228.5714286763.715981.67
Average_Cash_advance89500208.9051168535.6555000000247.04197561.180268926.75853551909.6814836.45

 

Ksharp
Super User

So you only want replace any value above 95th percentile  with 95th percentile.

 


%let low=0.05 ;
%let high=0.95 ;

proc iml;
use have;
read all var _num_ into x[c=vname];
close have;
call qntl(q,x,{&low ,&high});

do i=1 to ncol(x);
 *x[loc(x[,i]<q[1,i]),i]=q[1,i];
 x[loc(x[,i]>q[2,i]),i]=q[2,i];
end;

create want from x[c=vname];
append from x;
close want;

quit;
Ksharp
Super User

What kind of outpu do you want ? you want use 95th percentile to replace any value greater than 95th percentile?

Or you want winsorize data ? Here is IML code:

 

 

data have;
 do i=1 to 100;
  a=ceil(ranuni(1)*100);
  b=ceil(ranuni(2)*100);
  output;
 end;
 drop i;
run;


%let low=0.05 ;
%let high=0.95 ;

proc iml;
use have;
read all var _num_ into x[c=vname];
close have;
call qntl(q,x,{&low ,&high});

do i=1 to ncol(x);
 x[loc(x[,i]<q[1,i]),i]=q[1,i];
 x[loc(x[,i]>q[2,i]),i]=q[2,i];
end;

create want from x[c=vname];
append from x;
close want;

quit;

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 763 views
  • 0 likes
  • 3 in conversation