SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
srikanthyadav44
Quartz | Level 8

I have to winsorize all the variables in dataset at 1st and 99th percemtile.

How to do it SAS.

Thanks in advance

8 REPLIES 8
PeterClemmensen
Tourmaline | Level 20

Do a Google search. Plenty of examples out there. 

 

If you want a usable code answer, provide usable sample data.

PaigeMiller
Diamond | Level 26

@srikanthyadav44 wrote:

I have to winsorize all the variables in dataset at 1st and 99th percemtile.

How to do it SAS.


With such a brief description, I guess more detail is needed.

 

Do you want to compute means and/or other statistics on these winsorized variables? Or do you want to modify the existing data set via winsorizing?

 

If you want to compute means, PROC UNIVARIATE will compute winsorized means.

 

You can compute the 1st and 99th percentile using PROC SUMMARY, and then merge that back into your data so as to then perform the winsorizing.

 

proc summary data=have;
    var variablename;
    output out=_stats_ p1=p1 p99=p99;
run;

data want;
    if _n_=1 then set _Stats_;
    set have;
    if variablename<p1 then variablename=p1;
    if variablename>p99 then variablename=p99;
run;
--
Paige Miller
Ksharp
Super User

Do you have IML ? @Rick_SAS  has written a blog about Winsorize before.

 

data have;
 do i=1 to 100;
  a=ceil(ranuni(1)*100);
  b=ceil(ranuni(2)*100);
  output;
 end;
 drop i;
run;


%let low=0.05 ;
%let high=0.95 ;

proc iml;
use have;
read all var _num_ into x[c=vname];
close have;
call qntl(q,x,{&low ,&high});

do i=1 to ncol(x);
 x[loc(x[,i]<q[1,i]),i]=q[1,i];
 x[loc(x[,i]>q[2,i]),i]=q[2,i];
end;

create want from x[c=vname];
append from x;
close want;

quit;

 

 

 

srikanthyadav44
Quartz | Level 8

Dear Mr.  

thanks for your reply

 

i could not understand the code, But i run it and it has generated an output with only 100 observations. 

 

i have the following doubts.

 

My data set has 3423 observations. 

1. can you please tell me how to modify the code to apply it on my dataset. 

2. moreover, what exactly, we will get in the output. I could not understand the output. 

3. if i want to winsorize the values of more than one variable, can i apply the same code. will it winsorize all variables simultaneously. 

please clarify my doubts. 

thanks in advance 

Ksharp
Super User

 

1. can you please tell me how to modify the code to apply it on my dataset. 

Just replace my HAVE dataset with your real dataset . and change the following for your special percentile.

%let low=0.01 ;
%let high=0.99 ;

 

 

 

 

2. moreover, what exactly, we will get in the output. I could not understand the output. 

Open WANT dataset (contains all the numeric variable). That is what you need ( replace <0.01 percentile with 0.01 percentile, >0.99 with 0.99 percentile )

 

3. if i want to winsorize the values of more than one variable, can i apply the same code. will it winsorize all variables simultaneously. 

please clarify my doubts. 

No. My code has already consider ALL the numeric variable , you don't need change anything in IML code.

 

srikanthyadav44
Quartz | Level 8

dear Mr. Ksharp

thanks for your prompt response. 

I applied your SAS code again. but, after running the program, i got only 100 observations with in both the real dataset and new dataset. 

Actuallly i had 3423 observations. Moreover, the values are not matching no way with the original values which are there in the real dataset. 

 

my real data set is like below, with 3423 observations 

Company_NameTotal_remuneration
20 Microns Ltd.3916000
20 Microns Ltd.3916000
20 Microns Ltd.4230000
20 Microns Ltd.4961387
20 Microns Ltd.4932459
3I Infotech Ltd.30696000
3I Infotech Ltd.13360000
3I Infotech Ltd.23480000
3I Infotech Ltd.17780000
3I Infotech Ltd.27000000
3I Infotech Ltd.19303045
63 Moons Technologies Ltd.38726562
63 Moons Technologies Ltd.11906000
63 Moons Technologies Ltd.3902000
63 Moons Technologies Ltd.17257000
A 2 Z Infra Engg. Ltd.3639600
A 2 Z Infra Engg. Ltd.3033350
A 2 Z Infra Engg. Ltd.10641600
A 2 Z Infra Engg. Ltd.1800000
A B G Shipyard Ltd.2755645
A B G Shipyard Ltd.13449000
A C C Ltd.19177000
A C C Ltd.47690000

 

the output, what i got is as follows, with only 100 observations. 

i am also attaching my real dataset and output in the same file 

ab
1993
4026
9393
5554
57.5
8253
867.5
93.530
2869
93.523
6942
5629
4885
6460
5938

 

kindly help me in understanding the output and applying it to my dataset

Ksharp
Super User

Ou, You just have only ONE numeric variable .

So you need BY statement to process .

Try Paige's code .Mine is not suited for you .

 

proc summary data=have;by Company_Name;
    var variablename;
    output out=_stats_ p1=p1 p99=p99;
run;

data want;
 merge have _Stats_; by Company_Name;
    if variablename<p1 then variablename=p1;
    if variablename>p99 then variablename=p99;
run;
Rick_SAS
SAS Super FREQ

Please read "Winsorization: The good, the bad, and the ugly,"

which includes links to SAS code that Winsorizes data, as well as to alternative techniques.

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 8452 views
  • 2 likes
  • 5 in conversation