Variable Created using SUM statement

Occasional Contributor
Posts: 13
When a variable is created with a sum statement, what value is it assigned, and when? Is it set to 0 when the DATA step begins execution, or is it set to 0 at compile time?

Example:

data work.NEW;
     set work.OLD;
     count+1;
run;

The variable Count is created using a sum statement. Which statement regarding this variable is true?

A. It is assigned a value 0 when the data step begins execution.

B. It is assigned a value of missing when the data step begins execution.

C. It is assigned a value 0 at compile time.

D. It is assigned a value of missing at compile time.

What is the correct answer to this question with explanation?

I think the answer is (A), "It is assigned a value 0 when the data step begins execution", but I would like to double-check.


All Replies
Trusted Advisor
Posts: 1,128

Re: Variable Created using SUM statement

I too believe the correct answer is (A): the variable takes an initial value of 0 at execution time and then 1 is added. The code below should help us see from the log what happens before and after the sum statement.

Before the sum statement the variable has a value of 0, and after the sum statement it has a value of 1.

data class;
  set sashelp.class;
  put _all_;
  count+1;
  put _all_;
run;

Thanks,

Jag

PROC Star
Posts: 7,363

Re: Variable Created using SUM statement

Not being a programmer, per se, some of the distinctions between whether something happens at the compilation vs the execution phase are often irrelevant to me .. as long as I know the order that statements will be processed.

That said, I have read that the retain statement is acted on during the compilation phase.  If that is correct, then I would choose option C.

The following code illustrates, I think, that the implied retain in a sum statement is what causes the value to be initially set at 0:

data class;
  format count1 count2 count3 best12.;
  retain count2;
  retain count3 0;
  set sashelp.class;
  put _all_;
  count1+1;
  count2=count2+1;
  count3=count3+1;
  put _all_;
run;

Contributor
Posts: 43

Re: Variable Created using SUM statement

It should be (A). The variable count in your code gets the default initial value of 0 just before the DATA step reads the first observation.

A RETAIN statement, on the other hand, is a compile-time-only statement; it can be used to initialize a sum variable with a value other than 0.

In addition, if you don't supply an initial value in the RETAIN statement, the variable behaves like any other sum variable, meaning it is initialized to 0 as in the first case.

In any case, the initial value, whether 0 or something else, is assigned at the start of execution. Thanks!

Note: If no initial value is given explicitly, the default initial value for a RETAIN variable is missing, not 0. I wrote above that it was 0 because I read that in the SAS Base prep guide, which is wrong.
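The correction in the note can be checked with a small one-iteration step. This is only a sketch: the variable names r_default, r_zero and s are made up for illustration, and the PUT statement is placed before the sum statement so that the log shows the starting values.

```sas
data _null_;
  retain r_default;     /* RETAIN with no initial value            */
  retain r_zero 0;      /* RETAIN with an explicit initial value   */
  put r_default= r_zero= s=;   /* written before the sum statement runs */
  s + 1;                /* sum statement: implies RETAIN s 0       */
run;
```

The log should show r_default as missing (.) and both r_zero and s as 0. Note that this shows only which initial values are used, not whether they are set at compile time or at the start of execution.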

PROC Star
Posts: 7,363

Re: Variable Created using SUM statement

Contributor
Posts: 43

Re: Variable Created using SUM statement

Hi Arthur;

The following is just a relevant portion from the pdf which you gave.

There is a distinct compile action and execution for each DATA and PROC step in a SAS program. Each step is compiled, then executed, independently and sequentially. Understanding the defaults of each activity in DATA step processing is critical to achieving accurate results. During the compilation of a DATA step, the following actions (among others) occur:

  • syntax scan
  • SAS source code translation to machine language
  • definition of input and output files
  • creation of tools: input buffer (if reading any non-SAS data), Logical Program Data Vector (LPDV), and data set descriptor information
  • determining variable attributes for output SAS data set
  • capturing variables to be initialized to missing.

I think compilation is only about producing the internal machine-language instructions for what is to be done, and how, during execution. In SAS, the LPDV is also created, like any other buffer, as the place where temporary results of processing are stored before they go to the output data set. There must be some instruction saying that the accumulator variable needs to be set to 0; I mean that the actual assignment takes place at the very beginning of execution. Otherwise the system would be wasting memory resources, I guess.

That the RETAIN statement is compile-time only means that it just tells SAS explicitly that the variable should not be reinitialized to missing on every iteration; the statement itself is not encountered again during execution. It is like the #include preprocessor directive in C.

This is my understanding. Please don't mind if I am incorrect, and do help me by sharing more of your valuable knowledge. Thank you!

Jojan.

Solution
‎03-23-2014 06:37 PM
PROC Star
Posts: 7,363

Re: Variable Created using SUM statement

Jojan,

You missed my point.  According to Neil's paper, the first put _all_ statement, before an input statement, reflects what has occurred during the compilation phase.

Run the following code and then look at the log:

data class;
  format count1 count2 count3 best12.;
  retain count2;
  retain count3 0;
  infile cards dlm='09'x;
  put _all_;
  input (name sex) ($) age height weight;
  count1+1;
  count2=count2+1;
  count3=count3+1;
  put _all_;
  cards;
Joyce F 11 51.3 50.5
Thomas M 11 57.5 85
James M 12 57.3 83
Jane F 12 59.8 84.5
John M 12 59 99.5
Louise F 12 56.3 77
Robert M 12 64.8 128
Alice F 13 56.5 84
Barbara F 13 65.3 98
Jeffrey M 13 62.5 84
Alfred M 14 69 112.5
Carol F 14 62.8 102.5
Henry M 14 63.5 102.5
Judy F 14 64.3 90
Janet F 15 62.5 112.5
Mary F 15 66.5 112
Ronald M 15 67 133
William M 15 66.5 112
Philip M 16 72 150
;

You'll notice that the first values of count1 and count3 shown in the log are 0.  Thus, if  Neil's paper is correct, that occurred during the compilation phase.

 

Contributor
Posts: 43

Re: Variable Created using SUM statement

The first values of count1 and count3 are definitely 0, that's true. But the assignment of zero happens only at the beginning of execution.

If you go through the LPDV diagram in Neil's paper, you can see that during compilation only the structure of the LPDV is created, with the type and other attributes of each variable.

The initial values, and even the missing values, are assigned only at the beginning of execution. A RETAIN statement or an accumulator variable from a sum statement tells the LPDV what should be done with these variables at the beginning and throughout the DATA step iterations.

Thank you,

Jojan.

PROC Star
Posts: 7,363

Re: Variable Created using SUM statement

Jojan,

We'll simply have to agree to disagree. However, on an exam, I think my answer is right and yours would be marked wrong.

Maybe someone from SAS who is familiar with what is actually going on in the two phases will volunteer to be a referee.

Art

Super User
Posts: 17,819

Re: Variable Created using SUM statement

I think the answer is D, but again, I'm not great at telling compile time from execution time.

RETAIN is a compile-time statement.

However, if you omit initial-value, the initial value is missing, according to the docs:

SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition

Also, if you already have a variable named count in your data set, the result may be different as well.

data nothing;
  retain count;
  put _all_;
run;

data nothing;
  retain count;
  put _all_;
  count+1;
  put _all_;
run;

Contributor
Posts: 43

Re: Variable Created using SUM statement

That's fine. Thank you, Arthur.

PROC Star
Posts: 7,363

Re: Variable Created using SUM statement

While I still think we should wait for someone who really KNOWS, I have to respond to Farezza's post and, in the process, provide some more support for my initial answer.

Yes, a RETAIN statement without an initial value will set the value to missing in the PDV at the compilation phase.

However, a sum statement will set the value to 0 in the PDV at the compilation phase.

I thought Neil's paper did a nice job explaining it, but here are two more definitive overviews that, methinks, agree with my position:

SAS(R) 9.2 Language Reference: Concepts, Second Edition

and

The SAS Supervisor - sasCommunity

Regardless, I think that you will find both to be excellent reads.

Super User
Posts: 17,819

Re: Variable Created using SUM statement

Is it just sum or any other implicit retain statement?

PROC Star
Posts: 7,363

Re: Variable Created using SUM statement

Depends upon what you are asking. SET, MERGE, and temporary arrays all implicitly retain variables, but a sum statement causes an implicit retain with an initial value of 0. There may be others that I'm not aware of that have the same effect.
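To make the "implicit retain with an initial value of 0" concrete, here is a minimal sketch that puts a sum statement next to its spelled-out equivalent, a RETAIN with an initial value of 0 plus the SUM function. The names count_sum and count_explicit are made up for illustration.

```sas
data _null_;
  set sashelp.class;
  count_sum + 1;                            /* sum statement                   */
  retain count_explicit 0;                  /* explicit RETAIN, initial value 0 */
  count_explicit = sum(count_explicit, 1);  /* SUM function ignores missing     */
  put count_sum= count_explicit=;
run;
```

The two counters should show the same value on every iteration. The SUM function is used instead of a plain + so that a missing argument does not make the result missing, which matches how the sum statement treats missing values.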

Occasional Contributor
Posts: 13

Re: Variable Created using SUM statement

Thanks Arthur for a detailed explanation about this question.

☑ This topic is SOLVED.
