mahler_ji
Obsidian | Level 7

Hello All,

I am a relatively experienced user of SAS, but I received the following code from another SAS Support member for winsorizing data (which works amazingly, by the way), and I would love it if someone could explain part of his code to me, so that I know what is happening and maybe I could use techniques like this in my later code.  The code is as follows:

%let L=10;                 %* 10th percentile *;
%let H=%eval(100 - &L);    %* 90th percentile *;

proc univariate data=have noprint;
   by qtr;
   var size;
   output out=_winsor   pctlpts=&L  &H     pctlpre=__size ;
run;

data want (drop=_:);
  set have;
  if _n_=1 then set _winsor;
  array wlo  {*} __size&L ;
  array whi  {*} __size&H ;
  array wval {*} wsize ;
  array val  {*} size ;
  do _V=1 to dim(val);
     wval{_V}=min(max(val{_V},wlo{_V}),whi{_V});
  end;
run;

I know what the first two parts are doing, setting my limits for the winsorization and using Univariate to get my percentile values for my by variable.  But the third part I don't really understand.

What are the two SET statements doing, and what does _n_ = 1 mean?  I don't have that variable anywhere.  I have never seen _: before, which shows up in the DROP= option.  And what is _V?  I don't see that variable anywhere else.

Any help would be awesome, and thanks for taking the time to educate a newbie.

John

11 REPLIES
RW9
Diamond | Level 26

Hi,

_N_ is an automatic variable which identifies the row number (see SAS(R) 9.2 Language Reference: Concepts, Second Edition).

_V in your code is a variable set up to iterate over a loop.  The loop in this case runs from 1 to the size of the val array, so the code between the DO and END runs once per array element, with _V increasing by 1 each time.  Instead of a loop, you could write:

wval{1}=min(max(val{1},wlo{1}),whi{1});

wval{2}=min(max(val{2},wlo{2}),whi{2});

etc.

LinusH
Tourmaline | Level 20

_N_ is an automatic variable that keeps track of the current observation number.

It is not stored in the output data set.

Data never sleeps
data_null__
Jade | Level 19

Linus Hjorth wrote:

_N_ is an automatic variable that keeps track of the current observation number.

It is not stored in the output data set.

Not necessarily.  The value of _N_ is incremented at the start of each iteration of the data step loop, sometimes referred to as the observation loop.  It doesn't necessarily correspond to the observation number of the observations being read or written.
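To make the distinction concrete, here is a small sketch (the data set HAVE and the variables X and I are made up for the illustration) in which each iteration of the data step reads two observations, so _N_ counts iterations rather than observations:

```sas
/* Hypothetical six-row data set, used only for this illustration. */
data have;
  do x = 1 to 6;
    output;
  end;
run;

/* Each pass through the data step reads TWO observations,  */
/* so _N_ advances once for every two rows that are read.   */
data _null_;
  do i = 1 to 2;
    set have;
    put _n_= x=;   /* writes the iteration counter and the row value to the log */
  end;
run;
```

The log should show _N_=1 for x=1 and x=2, _N_=2 for x=3 and x=4, and _N_=3 for x=5 and x=6.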

art297
Opal | Level 21

My guess is that it may not be doing what it was intended to do.

The line: if _n_=1 then set _winsor;

adds the first quarter's 10 and 90 percent values to the new file (want), and makes those values available for all of the records in have.

However, it ignores the 10 and 90 percent values for all of the other quarters, as those records are never read from the file _winsor.

Astounding
PROC Star

The two SET statements, in combination with _N_, are a very standard way to combine two SAS data sets.  The usual conditions would be that _WINSOR contains just one observation which you would like added to every observation in HAVE.  There's no variable to merge by, so this is the workaround.  Read _WINSOR just once, and its variables are automatically retained.  (Any variables that are read from a SAS data set are automatically retained.)
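As a minimal sketch of that pattern, assuming the SASHELP.CLASS sample data set is available (the names _STATS, _MEAN_HEIGHT, CLASS_DEV and HEIGHT_DEV are invented for the example), a one-observation summary can be attached to every row like this:

```sas
/* Build a ONE-observation data set holding the overall mean height. */
proc means data=sashelp.class noprint;
  var height;
  output out=_stats (drop=_type_ _freq_) mean=_mean_height;
run;

data class_dev (drop=_:);
  set sashelp.class;            /* one row per iteration                    */
  if _n_ = 1 then set _stats;   /* read the single summary row only once;   */
                                /* its value is retained for all later rows */
  height_dev = height - _mean_height;
run;
```

Because _STATS is read only on the first iteration, this works as intended only when the summary data set has exactly one observation.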

Regarding some of the other pieces, _: means all variable names that begin with an underscore.  So all of those get dropped from the output data set.
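A tiny sketch of the name-prefix list, using made-up variable names (_TMP1, _TMP2, TOTAL):

```sas
data demo (drop=_:);     /* _: matches every variable whose name starts with an underscore */
  _tmp1 = 1;
  _tmp2 = 2;
  total = _tmp1 + _tmp2;
run;

proc print data=demo;    /* only TOTAL should remain in the output data set */
run;
```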

_V is just a name for a variable.  It doesn't exist ahead of time, and it gets dropped because of the DROP clause.  When you work with arrays, you usually need a variable to serve as an index to the array ... which variable within the array is currently being processed.  _V serves as that index to the array here.

Good luck.

art297
Opal | Level 21

Astounding: there is a potential merge by variable, namely qtr, which was used as a by variable when proc univariate was run.

Astounding
PROC Star

Art,

True, true.  So why is the program written as it is?

One possibility:  By design, only the first observation in _WINSOR is supposed to be used.

Another possibility:  The program was originally written without a BY statement in PROC UNIVARIATE.  Once the BY statement was added, the program should have switched to MERGE later, but nobody took the trouble to figure that out.

I think we are too far away from the circumstances to tell which (if either) is correct.
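If the second possibility is the right one, a MERGE-based version might look like the sketch below.  It is only an illustration, not the original author's code.  It assumes HAVE and _WINSOR exist as in the original post, that &L and &H are defined as shown there, and that HAVE is sorted by QTR (which the BY statement in PROC UNIVARIATE already requires):

```sas
%let L=10;
%let H=%eval(100 - &L);

data want (drop=_:);
  merge have _winsor;   /* each quarter's percentiles are attached to that quarter's rows */
  by qtr;
  wsize = min(max(size, __size&L), __size&H);
run;
```

With a single variable the arrays are not needed, so the clamp is written directly.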

CharlotteCain
Quartz | Level 8

Hi @

Can you please explain the array statements in the original poster's code for me.

  array wlo  {*} __size&L ;  /* if the macro variable &L resolves to some value, how many elements would this array contain? */
  array whi  {*} __size&H ;  /* if the macro variable &H resolves to some value, how many elements would this array contain? */
  array wval {*} wsize ;     /* looks like just one element here? why an array? */
  array val  {*} size ;      /* looks like just one element here? why an array? */

Sorry for the bother,

Charlotte

Astounding
PROC Star

Charlotte,

In every case there would be one element in each array.

Why an array?  Anybody's guess.  Perhaps this is a simple example, and the program sometimes uses a more complex set of values.  Sometimes I try to guess at the intent based on my experiences, but it's not always possible.
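One guess at what a more complex version could look like: the same arrays with two variables each.  This is a sketch only.  PRICE and WPRICE are hypothetical names, HAVE is assumed to contain both SIZE and PRICE, and the percentiles here are computed over the whole data set (no BY statement), so _WINSOR has a single observation:

```sas
%let L=10;
%let H=%eval(100 - &L);

proc univariate data=have noprint;
  var size price;
  /* one prefix per VAR variable, in the same order */
  output out=_winsor pctlpts=&L &H pctlpre=__size __price;
run;

data want (drop=_:);
  set have;
  if _n_ = 1 then set _winsor;
  array wlo  {*} __size&L __price&L;   /* lower limits         */
  array whi  {*} __size&H __price&H;   /* upper limits         */
  array wval {*} wsize wprice;         /* winsorized results   */
  array val  {*} size price;           /* original values      */
  do _V = 1 to dim(val);               /* dim(val) is now 2    */
    wval{_V} = min(max(val{_V}, wlo{_V}), whi{_V});
  end;
run;
```

Adding another variable then only means extending the four ARRAY statements and the VAR and PCTLPRE= lists; the loop itself stays unchanged.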

Ksharp
Super User

Did you check PROC MEANS, which can also produce the percentiles needed for winsorizing?
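A rough sketch of that idea, assuming the same HAVE data set with QTR and SIZE as in the original post (the output names are chosen here to match the 10th/90th percentile case):

```sas
/* CLASS with NWAY gives one output row per quarter  */
/* and does not require HAVE to be sorted by QTR.    */
proc means data=have noprint nway;
  class qtr;
  var size;
  output out=_winsor (drop=_type_ _freq_)
         p10=__size10 p90=__size90;
run;
```

This step only produces the limits; applying them to SIZE would still need a data step.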
