Can Someone Please Explain this Code to me?

Frequent Contributor
Posts: 101

Hello All,

I am a relatively experienced user of SAS, but I received the following code from another SAS Support member for winsorizing data (which works amazingly, by the way), and I would love it if someone could explain part of his code to me, so that I know what is happening and maybe I could use techniques like this in my later code.  The code is as follows:

%let L=10;    %* 10th percentile *;

%let H=%eval(100 - &L);   %* 90th percentile*;

proc univariate data=have noprint;

   by qtr;

   var size;

   output out=_winsor   pctlpts=&L  &H     pctlpre=__size ;

run;

data want (drop=_:);

  set have;

  if _n_=1 then set _winsor;

  array wlo  {*} __size&L  ;

  array whi  {*} __size&H ;

  array wval {*} wsize ;

  array val   {*} size ;

  do _V=1 to dim(val);

     wval{_V}=min(max(val{_V},wlo{_V}),whi{_V});

  end;

run;

I know what the first two parts are doing, setting my limits for the winsorization and using Univariate to get my percentile values for my by variable.  But the third part I don't really understand.

What are the two SET statements doing, and what does _n_ = 1 mean?  I don't have that variable anywhere.  I have never seen _: before, which shows up in the drop statement.  And what is _V?  I don't see that variable anywhere else.

Any help would be awesome, and thanks for taking the time to educate a newbie.

John

Super User
Posts: 7,942

Re: Can Someone Please Explain this Code to me?

Posted in reply to mahler_ji

Hi,

_N_ is an automatic variable which identifies the row number; see SAS(R) 9.2 Language Reference: Concepts, Second Edition.

_V in your code is a variable set up to iterate over a loop.  The loop in this case runs from one to the size of the val array, so the code between the do and end executes dim(val) times, with _V incremented by 1 on each pass.  Without a loop, you would have to write:

wval{1}=min(max(val{1},wlo{1}),whi{1});

wval{2}=min(max(val{2},wlo{2}),whi{2});

etc.

Super User
Posts: 5,424

Re: Can Someone Please Explain this Code to me?

Posted in reply to mahler_ji

_N_ is an automatic variable that keeps track of the current observation number.

It is not stored in the output data set.

Data never sleeps
Respected Advisor
Posts: 3,799

Re: Can Someone Please Explain this Code to me?

Linus Hjorth wrote:

_N_ is an automatic variable that keeps track of the current observation number.

It is not stored in the output data set.

Not necessarily.  The value of _N_ is incremented at the start of each iteration of the data step loop, sometimes referred to as the observation loop; it doesn't necessarily correspond to the observation number of the observations being read or written.
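A small sketch may make the distinction concrete. It assumes the sample data set sashelp.class that ships with SAS; the loop variable i is only illustrative.

```sas
/* Sketch: each iteration of the data step reads three observations, */
/* so _N_ (the iteration counter) falls behind the observation count. */
data _null_;
  do i = 1 to 3;
    set sashelp.class;
  end;
  put _n_= name=;   /* at _N_=2 the sixth observation has just been read */
run;
```

In the log, _N_ counts passes through the data step, not rows read from sashelp.class.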

PROC Star
Posts: 7,467

Re: Can Someone Please Explain this Code to me?

Posted in reply to mahler_ji

My guess is that it may not be doing what it was intended to do.

The line: if _n_=1 then set _winsor;

adds the first quarter's 10th and 90th percentile values to the new file (want), and makes those values available for all of the records in have.

However, it ignores the 10th and 90th percentile values for all of the other quarters, as those records are never read from the file _winsor.

Super User
Posts: 5,497

Re: Can Someone Please Explain this Code to me?

Posted in reply to mahler_ji

The two SET statements, in combination with _N_, are a very standard way to combine two SAS data sets.  The usual conditions would be that _WINSOR contains just one observation which you would like added to every observation in HAVE.  There's no variable to merge by, so this is the workaround.  Read _WINSOR just once, and its variables are automatically retained.  (Any variables that are read from a SAS data set are automatically retained.)

Regarding some of the other pieces, _: means all variable names that begin with an underscore.  So all of those get dropped from the output data set.

_V is just a name for a variable.  It doesn't exist ahead of time, and it gets dropped because of the DROP clause.  When you work with arrays, you usually need a variable to serve as an index to the array ... which variable within the array is currently being processed.  _V serves as that index to the array here.
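As an illustration of the one-observation case described above, here is a minimal sketch. It is not the original author's code: the data set names _stats and class_centered and the variable _mean_height are hypothetical, and sashelp.class is assumed to be available.

```sas
/* Sketch: _STATS is a hypothetical one-row data set holding the overall mean */
proc means data=sashelp.class noprint;
  var height;
  output out=_stats (keep=_mean_height) mean=_mean_height;
run;

data class_centered (drop=_:);      /* _: drops every variable starting with _ */
  set sashelp.class;
  if _n_ = 1 then set _stats;       /* read once; the value is retained afterwards */
  height_centered = height - _mean_height;
run;
```

Because _stats has exactly one observation, reading it only on the first iteration is enough: _mean_height stays available on every later row.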

Good luck.

PROC Star
Posts: 7,467

Re: Can Someone Please Explain this Code to me?

Posted in reply to Astounding

Astounding: there is a potential merge by variable, namely qtr, which was used as a by variable when proc univariate was run.

Super User
Posts: 5,497

Re: Can Someone Please Explain this Code to me?

Art,

True, true.  So why is the program written as it is?

One possibility:  By design, only the first observation in _WINSOR is supposed to be used.

Another possibility:  The program was originally written without a BY statement in PROC UNIVARIATE.  Once the BY statement was added, the program should have switched to MERGE later, but nobody took the trouble to figure that out.

I think we are too far away from the circumstances to tell which (if either) is correct.
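If per-quarter limits are what was intended, the MERGE version might look like the sketch below. It is an assumption about intent, not a correction of the original author's program; it reuses the names from the original post (have, _winsor, qtr, size, &L, &H) and assumes HAVE is sorted by qtr, which the BY statement in PROC UNIVARIATE already requires. The IN= variable inhave is introduced here for illustration.

```sas
/* Sketch: assumes &L and &H are defined as in the original post */
/* and that _WINSOR holds one observation per value of QTR.     */
data want (drop=__size:);
  merge have (in=inhave) _winsor;
  by qtr;
  if inhave;                                     /* keep only rows that exist in HAVE */
  wsize = min(max(size, __size&L), __size&H);    /* clamp to that quarter's limits */
run;
```

With a single analysis variable the arrays are not needed, so the clamp is written as a plain assignment.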

Frequent Contributor
Posts: 137

Re: Can Someone Please Explain this Code to me?

Posted in reply to Astounding

Hi,

Can you please explain the array statements in the original poster's code for me?

  array wlo  {*} __size&L ;  /* if the macro variable &L resolves to some value, how many elements would this array contain? */

  array whi  {*} __size&H ;  /* if the macro variable &H resolves to some value, how many elements would this array contain? */

  array wval {*} wsize ;     /* looks like just one element here? why an array? */

  array val  {*} size ;      /* looks like just one element here? why an array? */

Sorry for the bother,

Charlotte

Super User
Posts: 5,497

Re: Can Someone Please Explain this Code to me?

Posted in reply to CharlotteCain

Charlotte,

In every case there would be one element in each array.

Why an array?  Anybody's guess.  Perhaps this is a simple example, and the program sometimes uses a more complex set of values.  Sometimes I try to guess at the intent based on my experiences, but it's not always possible.
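To show where the arrays would start to pay off, here is a hedged sketch with two analysis variables. It is a guess at the more complex case, not the original program: it uses sashelp.class (height and weight) without a BY variable, and the output names class_w, wheight and wweight are hypothetical.

```sas
%let L=10;
%let H=%eval(100 - &L);

/* One prefix per VAR variable, giving __height10, __height90, __weight10, __weight90 */
proc univariate data=sashelp.class noprint;
  var height weight;
  output out=_winsor pctlpts=&L &H pctlpre=__height __weight;
run;

data class_w (drop=_:);
  set sashelp.class;
  if _n_ = 1 then set _winsor;     /* one row of limits, no BY groups here */
  array wlo  {*} __height&L __weight&L;
  array whi  {*} __height&H __weight&H;
  array wval {*} wheight wweight;
  array val  {*} height weight;
  do _v = 1 to dim(val);           /* same clamp applied to each variable in turn */
    wval{_v} = min(max(val{_v}, wlo{_v}), whi{_v});
  end;
run;
```

Each array now has two elements, so the loop body runs twice per observation and adding a third variable only means extending the four lists.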

Super User
Posts: 10,018

Re: Can Someone Please Explain this Code to me?

Posted in reply to mahler_ji

Did you check PROC MEANS, which can also produce the percentiles used for winsorizing (for example P10 and P90)?
