Sanyam
Calcite | Level 5

When a variable is created by a sum statement, what value is it assigned, and when? Is it set to 0 when the DATA step begins execution, or is it set to 0 at compile time?

Example:

data work.NEW;
     set work.OLD;
     count+1;
run;

The variable Count is created using a sum statement. Which statement regarding this variable is true?

A. It is assigned a value 0 when the data step begins execution.

B. It is assigned a value of missing when the data step begins execution.

C. It is assigned a value 0 at compile time.

D. It is assigned a value of missing at compile time.

What is the correct answer to this question, and why?

I think it should be answer (A), "It is assigned a value 0 when the data step begins execution", but I would like to double-check.

16 REPLIES
Jagadishkatam
Amethyst | Level 16

I too believe the correct answer is (A): the variable takes an initial value of 0 at execution time, and the sum statement then adds 1. The code below should help us see, from the log, what happens before and after the sum statement.

Before the sum statement the variable has a value of 0, and after the sum statement it has a value of 1.

data class;
  set sashelp.class;
  put _all_;
  count+1;
  put _all_;
run;

Thanks,

Jag

art297
Opal | Level 21

I'm not a programmer per se, so the distinction between whether something happens in the compilation phase or the execution phase is often irrelevant to me, as long as I know the order in which statements will be processed.

That said, I have read that the retain statement is acted on during the compilation phase.  If that is correct, then I would choose option C.

The following code illustrates, I think, that the implied retain in a sum statement is what causes the value to be initially set at 0:

data class;
  format count1 count2 count3 best12.;
  retain count2;
  retain count3 0;
  set sashelp.class;
  put _all_;
  count1+1;
  count2=count2+1;
  count3=count3+1;
  put _all_;
run;

JVarghese
Obsidian | Level 7

It should be (A). The variable count in your code gets its default initial value of 0 just before the DATA step reads the first observation.

A RETAIN statement, by contrast, is a compile-time-only statement and is used to initialize a sum variable with a value other than 0.

In addition, if you don't supply a value for the variable in the RETAIN statement, it acts as if it were just another sum variable, meaning it is initialized to 0 as in the first case.

In any case, the initial value, whether it is 0 or something else, is assigned at the start of execution. Thanks!

Note: The default initial value for a RETAIN variable, if none is given explicitly, is missing, not 0. I said 0 above because I read that in the SAS Base prep guide, which is wrong.
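The three initial values under discussion can be compared with a minimal sketch like the one below. The variable names r_default, r_zero, and s_count are made up for illustration. The step reads no input, so it iterates once. It shows only which values the variables start with, not the phase in which SAS sets them.

```sas
/* Minimal sketch; r_default, r_zero and s_count are hypothetical names. */
data _null_;
  retain r_default;    /* RETAIN with no initial value */
  retain r_zero 0;     /* RETAIN with an explicit initial value of 0 */
  put 'Before: ' r_default= r_zero= s_count=;
  s_count + 1;         /* sum statement: implied RETAIN, initial value 0 */
  put 'After:  ' r_default= r_zero= s_count=;
run;
```

In the log, the first PUT should show r_default as missing (.) and both r_zero and s_count as 0. The second PUT should show s_count as 1, with the other two unchanged.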

JVarghese
Obsidian | Level 7

Hi Arthur;

The following is just a relevant portion from the pdf which you gave.

There is a distinct compile action and execution for each DATA and PROC step in a SAS program. Each step is compiled, then executed, independently and sequentially. Understanding the defaults of each activity in DATA step processing is critical to achieving accurate results. During the compilation of a DATA step, the following actions (among others) occur:

  • syntax scan
  • SAS source code translation to machine language
  • definition of input and output files
  • creation of tools: input buffer (if reading any non-SAS data), Logical Program Data Vector (LPDV), and data set descriptor information
  • determining variable attributes for output SAS data set
  • capturing variables to be initialized to missing.

I think compilation only produces the internal machine-language instructions for what is to be done, and how, during execution. SAS also creates the LPDV, which is like another buffer where the temporary results of processing are held before they are written to the output data set. There must be an instruction saying that the accumulator variable is to be set to 0, but I mean that the actual assignment takes place at the very beginning of execution. Otherwise the system would be wasting memory resources, I guess.

Saying that RETAIN is compile-time only means that it just tells SAS explicitly that the variable need not be reinitialized to missing on every iteration; the statement is not encountered again during execution. It is like the #include preprocessor directive in C.

This is my understanding. Please don't mind if I am incorrect, and please help me by sharing more of your valuable knowledge. Thank you!

Jojan.

art297
Opal | Level 21

Jojan,

You missed my point.  According to Neil's paper, the first put _all_ statement, before an input statement, reflects what has occurred during the compilation phase.

Run the following code and then look at the log:

data class;
  format count1 count2 count3 best12.;
  retain count2;
  retain count3 0;
  infile cards dlm='09'x;
  put _all_;
  input (name sex) ($) age height weight;
  count1+1;
  count2=count2+1;
  count3=count3+1;
  put _all_;
  cards;
Joyce F 11 51.3 50.5
Thomas M 11 57.5 85
James M 12 57.3 83
Jane F 12 59.8 84.5
John M 12 59 99.5
Louise F 12 56.3 77
Robert M 12 64.8 128
Alice F 13 56.5 84
Barbara F 13 65.3 98
Jeffrey M 13 62.5 84
Alfred M 14 69 112.5
Carol F 14 62.8 102.5
Henry M 14 63.5 102.5
Judy F 14 64.3 90
Janet F 15 62.5 112.5
Mary F 15 66.5 112
Ronald M 15 67 133
William M 15 66.5 112
Philip M 16 72 150
;

You'll notice that the first values of count1 and count3 shown in the log are 0. Thus, if Neil's paper is correct, that occurred during the compilation phase.

JVarghese
Obsidian | Level 7

First values of count1 and count3 are definitely 0, that's true. But the assignment to zero happens only at the beginning of execution.

If you go through the LPDV diagram in Neil's paper, you can see that during compilation only the structure of the LPDV is created, with the type and other attributes of each variable.

The initial values, and even the missing values, are assigned only at the beginning of execution. A RETAIN statement, or an accumulator variable from a sum statement, tells the LPDV what should be done with these variables at the beginning of the DATA step and throughout its iterations.

Thank you,

Jojan.

art297
Opal | Level 21

Jojan,

We'll simply have to agree to disagree. However, on an exam, I think my answer is right and yours would be marked wrong.

Maybe someone from SAS who is familiar with what is actually going on in the two phases will volunteer to be a referee.

Art

Reeza
Super User

I think the answer's D, but again, I'm not great at telling compile time from execution time.

RETAIN is a compile-time statement.

However, if you omit initial-value, the initial value is missing (docs):

SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition

Also, if your data set already has a variable named count, the result may be different as well.

data nothing;
  retain count;
  put _all_;
run;

data nothing;
  retain count;
  put _all_;
  count+1;
  put _all_;
run;

JVarghese
Obsidian | Level 7

That's fine. Thank you, Arthur.

art297
Opal | Level 21

While I still think we should wait for someone who really KNOWS, I have to respond to Farezza's post and, in the process, provide some more support for why I responded as I initially did.

Yes, a retain statement, without an initial value, will set the value to missing in the pdv at the compilation phase.

However, a sum statement will set the value to 0 in the pdv at the compilation phase.

I thought Neil's paper did a nice job of explaining it, but here are two more definitive overviews that, methinks, agree with my position:

SAS(R) 9.2 Language Reference: Concepts, Second Edition

and

The SAS Supervisor - sasCommunity

Regardless, I think that you will find both to be excellent reads.

Reeza
Super User

Is it just sum or any other implicit retain statement?

art297
Opal | Level 21

It depends on what you are asking. SET, MERGE, and temporary arrays all implicitly retain variables, but a sum statement causes an implicit retain with an initial value of 0. There may be others that I'm not aware of that have the same effect.
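As a rough illustration of that difference, here is a minimal sketch that counts rows three ways over the first three observations of sashelp.class. The names n_sum, n_ret, and n_plain are hypothetical. It assumes only the documented behavior of the sum statement, RETAIN, and the SUM function.

```sas
/* Minimal sketch; n_sum, n_ret and n_plain are hypothetical names. */
data _null_;
  set sashelp.class(obs=3);
  n_sum + 1;               /* sum statement: implied RETAIN, starts at 0 */
  retain n_ret;            /* RETAIN without an initial value: starts missing */
  n_ret = sum(n_ret, 1);   /* SUM function ignores the initial missing value */
  n_plain = n_plain + 1;   /* not retained: reset to missing every iteration */
  put _n_= n_sum= n_ret= n_plain=;
run;
```

The log should show n_sum and n_ret counting 1, 2, 3. n_plain stays missing on every iteration, because missing plus 1 is missing and the variable is never retained.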

Sanyam
Calcite | Level 5

Thanks Arthur for a detailed explanation about this question.
