ben_sas
Calcite | Level 5

I’m trying to Winsorize my data in SAS Viya V.0.3.05 | Model Studio 8.5. If I had 100 values numbered 1, 2, ..., 100 and I winsorized at the 90% level (that is, clamping at the 5th and 95th percentiles), then I'd expect the resulting values to be something like: 5,5,5,5,5,6,7,8,9,10,11,...,93,94,95,95,95,95,95.
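
To make the target concrete, here is a minimal base SAS sketch of that toy case. It is not specific to Model Studio, and the data set and variable names (work.toy, work.limits, lo, hi, x_wins) are made up for illustration. QNTLDEF=3 is used so that the limits are observed data values; the default percentile definition in PROC MEANS can return a midpoint between two observations instead.

```sas
/* Toy data: the integers 1 through 100 (illustrative only) */
data work.toy;
   do x = 1 to 100;
      output;
   end;
run;

/* 5th and 95th percentiles of x; QNTLDEF=3 returns observed values */
proc means data=work.toy noprint qntldef=3;
   var x;
   output out=work.limits p5=lo p95=hi;
run;

/* Clamp non-missing values that fall outside [lo, hi] to the nearer limit */
data work.toy_wins;
   if _n_ = 1 then set work.limits(keep=lo hi);
   set work.toy;
   if not missing(x) then x_wins = min(max(x, lo), hi);
run;
```

Values between the two limits pass through unchanged, and missing values stay missing.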

 

I’ve looked through the 316-page reference guide and its node descriptions without luck: https://documentation.sas.com/api/collections/vdmmlcdc/8.5/docsets/vdmmlref/content/vdmmlref.pdf?loc...

 

(The 316-page guide does reference two other uses of winsorization, in mean calculation and in segment profiling, but neither of these is what I’m seeking to do.)

 

For what it's worth, here's my actual data.  Outliers clearly exist:

[Image: ben_sas_0-1669929519571.png]

 

3 REPLIES
chmedi
SAS Employee

You can use the Replacement node.  In the properties for that node, under Interval Inputs, set the Default limits method to "Extreme percentiles", and you can set the percentile value in the Extreme percentile box.  When running this node, it'll create replacement variables for all interval inputs that are winsorized at that percentile.  The original interval inputs will be rejected (dropped) in follow-on nodes.

shamel
Calcite | Level 5

@chmedi Your suggestion will work, but it will winsorize all interval inputs. What if we wanted to winsorize a single interval input without affecting the others? 

I have the same issue and I was able to use two SAS Code nodes as a workaround: the first one to calculate the extreme percentile limits, and the second one to use these values (entered manually) to winsorize the input of interest (see code below). While this works, it consumes more resources (data transfer from CAS to the client in SPRE) and uses manual input of the values for winsorization. 

SAS Technical Support recommended that I use the percentile action to compute the extreme percentile limits, extract these values into a macro variable, and then use this as a limit in a SAS code node. Do you know what code to use to make this work (i.e. to winsorize a single interval input)? 

 

SAS Code Node 1:

/* Calculate specific percentiles */
proc stdize data=&dm_data PctlMtd=ORD_STAT outstat=percentiles
            pctlpts=5,95;
   var x1;
run; /* output: P5=14000, P95=54100 */

 

SAS Code Node 2:

/* SAS code */
%dmcas_metachange(NAME=REP_x1, ROLE=INPUT, LEVEL=INTERVAL);
%dmcas_metachange(NAME=x1, ROLE=REJECT);

 

/* SAS DS1 score code */
length REP_x1 8;
label REP_x1='Replacement: Input Variable x1';
format REP_x1 DOLLAR8.0;
REP_x1 = x1;
/* Clamp non-missing x1 to the P5 and P95 limits found in SAS Code Node 1 */
if not(missing(x1)) then do;
   if x1 < 14000 then REP_x1 = 14000;
   else if x1 > 54100 then REP_x1 = 54100;
end;
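
For the approach Technical Support described (percentile action, then macro variables), a rough sketch might look like the following. It is untested in Model Studio and rests on several assumptions: a CAS session is already active, "mycaslib" and "mytable" are placeholders for wherever the data containing x1 lives, the macro variable names x1_lo and x1_hi are invented here, and the action's result table is taken to be named Percentile with columns Pctl and Value.

```sas
/* ASSUMPTIONS: an active CAS session; "mycaslib" and "mytable" are   */
/* placeholders to replace with the caslib and table that contain x1. */
proc cas;
   /* Ask the percentile action for the 5th and 95th percentiles of x1 */
   percentile.percentile result=r /
      table={caslib="mycaslib", name="mytable"},
      inputs={"x1"},
      values={5, 95};

   /* Copy each limit into a macro variable. The result table name     */
   /* (Percentile) and its columns (Pctl, Value) are assumed here.     */
   do row over r.Percentile;
      if (row.Pctl == 5) then symputx("x1_lo", row.Value);
      else if (row.Pctl == 95) then symputx("x1_hi", row.Value);
   end;
quit;

/* Write the limits to the log to confirm they were captured */
%put NOTE: x1 limits are &x1_lo and &x1_hi;
```

If this works, the hard-coded 14000 and 54100 in the winsorizing code could be replaced with &x1_lo and &x1_hi. Whether macro references resolve in the place where the node's score code runs is not something this thread confirms, so that part would need checking.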

chmedi
SAS Employee
Hi Shamel. You can add a Manage Variables node after the Replacement node, reject (New role=Rejected) the replacement "winsorized" variables that you don't want, and set the corresponding original input variables back to input (New role=Input). To make this easier, you can sort the Manage Variables screen by name or role, and then Shift-select multiple variables to change them all at once.

