I have to Winsorize all the variables in my dataset at the 1st and 99th percentiles.
How can I do it in SAS?
Thanks in advance.
Do a Google search. Plenty of examples out there.
If you want a usable code answer, provide usable sample data.
@srikanthyadav44 wrote:
I have to Winsorize all the variables in my dataset at the 1st and 99th percentiles.
How can I do it in SAS?
With such a brief description, I guess more detail is needed.
Do you want to compute means and/or other statistics on these winsorized variables? Or do you want to modify the existing data set via winsorizing?
If you only want means, PROC UNIVARIATE can compute Winsorized means directly with its WINSORIZED= option.
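For example, a minimal sketch that reports a Winsorized mean with 1% of the observations Winsorized in each tail. The sample data set DEMO_X, its variable x, and the output data set WMEANS are made up for illustration; this step only reports the statistic and leaves the data unchanged.

```sas
/* Made-up sample data: 200 normal values, three of them pushed far out */
data demo_x;
   call streaminit(1);
   do i = 1 to 200;
      x = rand('normal', 50, 10);
      if i <= 3 then x = x + 500;   /* inject a few large outliers */
      output;
   end;
   drop i;
run;

/* WINSORIZED=0.01 requests the Winsorized mean with 1% of the        */
/* observations Winsorized in each tail; ODS OUTPUT saves that table. */
ods output WinsorizedMeans=wmeans;
proc univariate data=demo_x winsorized=0.01;
   var x;
   ods select WinsorizedMeans;
run;
```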
If you want to modify the data instead, you can compute the 1st and 99th percentiles using PROC SUMMARY, merge them back into your data, and then Winsorize:
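/* Compute the 1st and 99th percentiles of one variable */
proc summary data=have;
   var variablename;
   output out=_stats_ p1=p1 p99=p99;
run;
/* Attach the percentiles to every row and clamp the values. SAS treats */
/* a missing value as smaller than any number, so the NOT MISSING check */
/* keeps missing values from being replaced with P1.                    */
data want;
   if _n_=1 then set _stats_;
   set have;
   if not missing(variablename) and variablename<p1 then variablename=p1;
   if variablename>p99 then variablename=p99;
   drop _type_ _freq_ p1 p99;
run;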
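The same two steps extend to several variables at once. A minimal sketch, assuming hypothetical numeric variables x1, x2 and x3 in a made-up data set DEMO (replace the VAR list and the three ARRAY lists with your own names): the AUTONAME option names the percentile variables x1_P1, x1_P99, and so on, and arrays apply the same clamp to every variable.

```sas
/* Made-up sample data with three numeric variables */
data demo;
   call streaminit(2);
   do i = 1 to 500;
      x1 = rand('normal');
      x2 = rand('exponential');
      x3 = rand('uniform');
      if i = 10 then x1 = .;   /* a missing value that must stay missing */
      output;
   end;
   drop i;
run;

/* Step 1: 1st and 99th percentiles; AUTONAME creates x1_P1, x1_P99, ... */
proc summary data=demo;
   var x1 x2 x3;
   output out=_stats_ (drop=_type_ _freq_) p1= p99= / autoname;
run;

/* Step 2: clamp each nonmissing value into [P1, P99] */
data want;
   if _n_ = 1 then set _stats_;
   set demo;
   array v  {*} x1 x2 x3;
   array lo {*} x1_P1 x2_P1 x3_P1;
   array hi {*} x1_P99 x2_P99 x3_P99;
   do i = 1 to dim(v);
      if not missing(v{i}) then v{i} = min(max(v{i}, lo{i}), hi{i});
   end;
   drop i x1_P1 x2_P1 x3_P1 x1_P99 x2_P99 x3_P99;
run;
```

The NOT MISSING guard matters here as well: MAX ignores missing arguments, so without it a missing value would come back as P1.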
Paige Miller
Do you have SAS/IML? @Rick_SAS has written a blog post about Winsorizing.
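/* Sample data for illustration only. Skip this step for your own data, */
/* or it will overwrite any existing data set named HAVE.               */
data have;
   do i=1 to 100;
      a=ceil(ranuni(1)*100);
      b=ceil(ranuni(2)*100);
      output;
   end;
   drop i;
run;
/* Lower and upper quantiles used as the Winsorizing limits */
%let low=0.05 ;
%let high=0.95 ;
proc iml;
   use have;
   read all var _num_ into x[c=vname];   /* all numeric variables */
   close have;
   call qntl(q,x,{&low ,&high});         /* row 1 = low, row 2 = high quantile of each column */
   do i=1 to ncol(x);
      idx=loc(x[,i]<q[1,i] & x[,i]^=.);  /* nonmissing values below the low quantile */
      if ncol(idx)>0 then x[idx,i]=q[1,i];
      idx=loc(x[,i]>q[2,i]);             /* values above the high quantile */
      if ncol(idx)>0 then x[idx,i]=q[2,i];
   end;
   create want from x[c=vname];
   append from x;
   close want;
quit;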
Dear Mr. Ksharp,
Thanks for your reply.
I could not understand the code, but I ran it and it generated output with only 100 observations. My data set has 3423 observations.
I have the following questions:
1. Can you please tell me how to modify the code to apply it to my dataset?
2. Moreover, what exactly will we get in the output? I could not understand the output.
3. If I want to Winsorize more than one variable, can I apply the same code? Will it Winsorize all variables simultaneously?
Please clarify my doubts.
Thanks in advance.
1. Can you please tell me how to modify the code to apply it to my dataset?
Just replace my HAVE dataset with your real dataset in the USE and CLOSE statements (and skip the DATA step that only creates sample data), and change the following to your percentiles:
%let low=0.01 ;
%let high=0.99 ;
2. Moreover, what exactly will we get in the output? I could not understand the output.
Open the WANT dataset (it contains all the numeric variables, and only those). That is what you need: values below the 1st percentile are replaced with the 1st percentile, and values above the 99th percentile with the 99th percentile.
3. If I want to Winsorize more than one variable, can I apply the same code? Will it Winsorize all variables simultaneously?
Please clarify my doubts.
No changes are needed. My code already handles ALL the numeric variables, so you don't need to change anything in the IML code.
Dear Mr. Ksharp,
Thanks for your prompt response.
I applied your SAS code again, but after running the program I got only 100 observations in both the real dataset and the new dataset.
Actually, I had 3423 observations. Moreover, the values do not match the original values in the real dataset at all.
My real data set looks like this (3423 observations in total):
Company_Name | Total_remuneration |
20 Microns Ltd. | 3916000 |
20 Microns Ltd. | 3916000 |
20 Microns Ltd. | 4230000 |
20 Microns Ltd. | 4961387 |
20 Microns Ltd. | 4932459 |
3I Infotech Ltd. | 30696000 |
3I Infotech Ltd. | 13360000 |
3I Infotech Ltd. | 23480000 |
3I Infotech Ltd. | 17780000 |
3I Infotech Ltd. | 27000000 |
3I Infotech Ltd. | 19303045 |
63 Moons Technologies Ltd. | 38726562 |
63 Moons Technologies Ltd. | 11906000 |
63 Moons Technologies Ltd. | 3902000 |
63 Moons Technologies Ltd. | 17257000 |
A 2 Z Infra Engg. Ltd. | 3639600 |
A 2 Z Infra Engg. Ltd. | 3033350 |
A 2 Z Infra Engg. Ltd. | 10641600 |
A 2 Z Infra Engg. Ltd. | 1800000 |
A B G Shipyard Ltd. | 2755645 |
A B G Shipyard Ltd. | 13449000 |
A C C Ltd. | 19177000 |
A C C Ltd. | 47690000 |
The output I got is as follows, with only 100 observations.
I am also attaching my real dataset and the output in the same file.
a | b |
19 | 93 |
40 | 26 |
93 | 93 |
55 | 54 |
5 | 7.5 |
82 | 53 |
86 | 7.5 |
93.5 | 30 |
28 | 69 |
93.5 | 23 |
69 | 42 |
56 | 29 |
48 | 85 |
64 | 60 |
59 | 38 |
Kindly help me understand the output and apply the code to my dataset.
Oh, you have only ONE numeric variable,
so you need a BY statement to process it.
Try Paige's code; mine is not suited for you.
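/* BY-group processing requires data sorted by Company_Name */
proc sort data=have;
   by Company_Name;
run;
/* 1st and 99th percentiles of Total_remuneration within each company */
proc summary data=have;
   by Company_Name;
   var Total_remuneration;
   output out=_stats_ p1=p1 p99=p99;
run;
/* Merge the company-level limits back; missing values are left as is */
data want;
   merge have _stats_;
   by Company_Name;
   if not missing(Total_remuneration) and Total_remuneration<p1 then Total_remuneration=p1;
   if Total_remuneration>p99 then Total_remuneration=p99;
   drop _type_ _freq_;
run;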
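One caveat about the BY-group version, assuming the goal is the percentiles of Total_remuneration over the whole data set: with BY Company_Name, P1 and P99 are computed separately within each company. Under the default percentile definition of PROC SUMMARY (QNTLDEF=5), a group with fewer than 100 observations always gets its own minimum as P1 and its own maximum as P99, so Winsorizing within such groups changes nothing. The sketch below checks this on the rows posted earlier in the thread (the data set names REMUN, GRP_STATS and GRP_CHECK are made up). If every company shows CHANGES_POSSIBLE=0, Paige's code without the BY statement is the version that Winsorizes against the pooled percentiles.

```sas
/* Rows copied from the sample posted earlier in this thread */
data remun;
   infile datalines dlm='|' truncover;
   length Company_Name $40;
   input Company_Name $ Total_remuneration;
datalines;
20 Microns Ltd.|3916000
20 Microns Ltd.|3916000
20 Microns Ltd.|4230000
20 Microns Ltd.|4961387
20 Microns Ltd.|4932459
3I Infotech Ltd.|30696000
3I Infotech Ltd.|13360000
3I Infotech Ltd.|23480000
3I Infotech Ltd.|17780000
3I Infotech Ltd.|27000000
3I Infotech Ltd.|19303045
63 Moons Technologies Ltd.|38726562
63 Moons Technologies Ltd.|11906000
63 Moons Technologies Ltd.|3902000
63 Moons Technologies Ltd.|17257000
A 2 Z Infra Engg. Ltd.|3639600
A 2 Z Infra Engg. Ltd.|3033350
A 2 Z Infra Engg. Ltd.|10641600
A 2 Z Infra Engg. Ltd.|1800000
A B G Shipyard Ltd.|2755645
A B G Shipyard Ltd.|13449000
A C C Ltd.|19177000
A C C Ltd.|47690000
;
run;

proc sort data=remun;
   by Company_Name;
run;

/* Group size, extremes and percentiles for each company */
proc summary data=remun;
   by Company_Name;
   var Total_remuneration;
   output out=grp_stats (drop=_type_ _freq_)
          n=n_obs min=grp_min max=grp_max p1=p1 p99=p99;
run;

/* 1 if Winsorizing at this company's P1/P99 could change any value */
data grp_check;
   set grp_stats;
   changes_possible = (p1 > grp_min or p99 < grp_max);
run;
```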
Please read "Winsorization: The good, the bad, and the ugly,"
which includes links to SAS code that Winsorizes data, as well as to alternative techniques.