Winsorizing variables with missing observations

Contributor
Posts: 36

Hi

 

I have a data set containing three variables that I want to winsorize at the 1st and 99th percentiles. That is, observations below the 1st percentile should be replaced with the value at the 1st percentile, and observations above the 99th percentile should be replaced with the value at the 99th percentile.

These three variables have missing values as well as repeated values. I am using the code below, but because of the missing and repeated observations it does not give the required outcome.

Please guide me in this regard.

 

%let L=1;    
%let H=%eval(100 - &L);   %* 99th percentile*;
proc univariate data=have noprint;
   var Size BM ME;
   output out=_winsor   pctlpts=&L  &H    
   pctlpre=__Size  __BM  __ME;
run;
data want (drop=__:);
  set have;
  if _n_=1 then set _winsor;
  array wlo  {*} __Size&L  __BM&L   __ME&L;
  array whi  {*} __Size&H __BM&H __ME&H;
  array wval {*} wSize wBM wME;
  array val   {*} Size BM ME;
  do _V=1 to dim(val);
     wval{_V}=min(max(val{_V},wlo{_V}),whi{_V});
  end;
run;
Super User
Posts: 10,691

Re: Winsorizing variables with missing observations

If you have SAS/IML:

 

data have;
 do i=1 to 100;
  a=ceil(ranuni(1)*100);
  b=ceil(ranuni(2)*100);
  if i in (10:14) then call missing(a);
  output;
 end;
 drop i;
run;


%let low=0.05 ;
%let high=0.95 ;

proc iml;
use have;
read all var _num_ into x[c=vname];
close have;
call qntl(q,x,{&low ,&high});

do i=1 to ncol(x);
 x[loc(x[,i]<q[1,i]),i]=q[1,i];
 x[loc(x[,i]>q[2,i]),i]=q[2,i];
end;

create want from x[c=vname];
append from x;
close want;

quit;
Contributor
Posts: 36

Re: Winsorizing variables with missing observations

@Ksharp thanks for the help, but this code replaces the missing observations with the value at P5 or P95. I want to keep missing observations as missing.

Super User
Posts: 10,691

Re: Winsorizing variables with missing observations

OK. No problem.
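
The missing values were replaced because SAS orders a missing value below every number, so a test such as x[,i]<q[1,i] is true for them. A quick DATA step check of that ordering (a minimal sketch; the variable names x, lo, lt and mx are only for illustration):

```sas
data _null_;
  x  = .;              /* a missing value */
  lo = 5;              /* stand-in for a lower percentile */
  lt = (x < lo);       /* 1: missing compares as less than any number */
  mx = max(x, lo);     /* 5: MAX ignores missing arguments */
  put lt= mx=;         /* writes lt=1 mx=5 to the log */
run;
```

So the lower-tail test has to exclude missing values explicitly.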

 

data have;
 do i=1 to 100;
  a=ceil(ranuni(1)*100);
  b=ceil(ranuni(2)*100);
  if i in (10:14) then do;call missing(a); b=100;end;
  output;
 end;
 drop i;
run;


%let low=0.05 ;
%let high=0.95 ;

proc iml;
use have;
read all var _num_ into x[c=vname];
close have;
call qntl(q,x,{&low ,&high});

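do i=1 to ncol(x);
 /* exclude missing values: a missing value compares as less than any number */
 lo=loc(x[,i]<q[1,i] & x[,i]^=.);
 if ncol(lo)>0 then x[lo,i]=q[1,i];   /* LOC returns an empty matrix when nothing matches */
 hi=loc(x[,i]>q[2,i]);
 if ncol(hi)>0 then x[hi,i]=q[2,i];
end;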

create want from x[c=vname];
append from x;
close want;

quit;
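
If SAS/IML is not available, the DATA step approach from the question can probably be kept with one guard. PROC UNIVARIATE already leaves missing values out when it computes the percentiles; the problem is that MAX ignores missing arguments, so max(., lower) returns the lower cutoff. The following is only a sketch under that reading, reusing the data set and variable names from the question (have, Size, BM, ME) and the 1% / 99% cutoffs asked for there:

```sas
%let L=1;
%let H=%eval(100 - &L);   %* 99th percentile;

proc univariate data=have noprint;
   var Size BM ME;
   output out=_winsor pctlpts=&L &H
          pctlpre=__Size __BM __ME;
run;

data want (drop=__: _V);
   set have;
   if _n_=1 then set _winsor;          /* cutoffs are retained on every row */
   array wlo  {*} __Size&L __BM&L __ME&L;
   array whi  {*} __Size&H __BM&H __ME&H;
   array wval {*} wSize wBM wME;
   array val  {*} Size BM ME;
   do _V=1 to dim(val);
      /* keep missing as missing; clamp only nonmissing values */
      if missing(val{_V}) then wval{_V}=.;
      else wval{_V}=min(max(val{_V},wlo{_V}),whi{_V});
   end;
run;
```

Repeated values need no special handling here: a tied value equal to a cutoff is simply left unchanged by the clamp.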