☑ This topic is solved.
Ronein
Meteorite | Level 14

Hello

I want to generate a sequence number within each group.

The group is defined by 4 variables.

The code below works correctly.

My question: why is there no need for a RETAIN statement here? How does SAS know to keep the value of seq from one observation to the next?

 

data want;
set have;
by z r v q;
if first.q
then seq = 1;
else seq + 1;
run;

 

ACCEPTED SOLUTION
FreelanceReinh
Jade | Level 19

Once the sum statement is used for a (numeric) variable anywhere in a DATA step, that variable is automatically retained and initialized to zero (unless other statements override this, e.g., a RETAIN statement specifying a different initial value). These changes to the default behavior (i.e., no retaining, initialization with a missing value) occur at compile time. They are effective even if the sum statement is never executed, as in the artificial example below:

data test;
set sashelp.class;
seq=seq+100;
if 0 then seq+42;
run;

Result (abbreviated PROC PRINT output for dataset TEST):

Obs    Name       Sex    Age    Height    Weight     seq

  1    Alfred      M      14     69.0      112.5     100
  2    Alice       F      13     56.5       84.0     200
  3    Barbara     F      13     65.3       98.0     300
  .
  .  
  . 
 18    Thomas      M      11     57.5       85.0    1800
 19    William     M      15     66.5      112.0    1900

 

Without the sum statement seq+42; -- although never executed due to the IF condition -- variable seq would be missing in all observations because adding 100 to the initial missing value would produce another missing value again and again (and a note about this in the log).
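For contrast, here is a minimal sketch of the same step with the sum statement removed. The dataset name test_nosum is only an illustrative choice, and the PROC PRINT step is added just to inspect a few rows.

```sas
/* Same step as above, but without any sum statement for seq. */
data test_nosum;
  set sashelp.class;
  seq = seq + 100;  /* plain assignment: seq is reset to missing in every iteration */
run;

/* Look at the first few observations. */
proc print data=test_nosum(obs=3);
  var Name seq;
run;
```

In this variant seq should be missing in every observation, and the log should contain the note about missing values being generated by an operation on missing values.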

 

In your last example (where the parentheses around the UNTIL condition are missing)

data want;
do seq=1 by 1 until(last.ID);
  set have;
  by ID;
  output;
end;
run;

a RETAIN statement is not needed because each ID BY-group is processed within a single iteration of the DATA step. Variable seq is not retained, but set to 1 (and later incremented if a BY group has more than one observation) by the iterative DO statement in each DATA step iteration.


9 REPLIES
FreelanceReinh
Jade | Level 19

Hello @Ronein,

 

This is a side effect of the sum statement. In the documentation see subsection "variable" and section "Comparisons".

Ronein
Meteorite | Level 14

But in my code I didn't use SUM (I used plus):

seq + 1
PaigeMiller
Diamond | Level 26

If you had clicked on the link provided by @FreelanceReinh you would see that the Sum statement is exactly what you are talking about.

--
Paige Miller
Ronein
Meteorite | Level 14

So the rule is that the SUM statement doesn't have a built-in retain, whereas plus does have a built-in retain?

PaigeMiller
Diamond | Level 26

@Ronein wrote:

So the rule is that the SUM statement doesn't have a built-in retain, whereas plus does have a built-in retain?


You are making stuff up. No one said that. There is no "plus" statement. What you are using is the SUM statement. Please click on the link from @FreelanceReinh and see for yourself.

--
Paige Miller
FreelanceReinh
Jade | Level 19

You did use the sum statement. Compare your statement

seq + 1;

to the syntax or the examples in the documentation:

Syntax:

variable + expression;

Example:

total + x;

 

The sum statement is one of the rare exceptions where the name of the statement does not occur in the code. Note how it differs from other uses of the plus sign, where the plus sign is part of an expression inside another statement, e.g., an assignment statement

seq = seq + 1;

a DO statement

do i=1 to seq + 1;

a WHERE statement

where x <= seq + 1;

etc.
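To make the distinction concrete, here is a sketch of two DATA steps that should produce the same seq values. It relies on the documented equivalence between the sum statement and a RETAIN statement with initial value 0 combined with the SUM function. The output dataset names are placeholders, and a dataset have sorted by ID is assumed.

```sas
/* Sum statement: seq is implicitly retained and initialized to 0. */
data want_sumstmt;
  set have;
  by ID;
  if first.ID then seq = 0;  /* restart the counter for each BY group */
  seq + 1;
run;

/* The same logic spelled out with RETAIN and the SUM function. */
data want_explicit;
  set have;
  by ID;
  retain seq 0;              /* what the sum statement does implicitly */
  if first.ID then seq = 0;
  seq = sum(seq, 1);         /* assignment statement calling the SUM function */
run;

/* Check that both approaches give identical results. */
proc compare base=want_sumstmt compare=want_explicit;
run;
```

If the RETAIN statement were dropped from the second step, seq would be reset to missing at the start of each iteration, and sum(seq, 1) would return 1 for every observation.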

Ronein
Meteorite | Level 14

Hello

 

Please look at the difference between these two programs:

Data have;
input ID X Y Z;
cards;
1 10 20 30
1 11 12 13
1 14 15 18
2 20 20 20 
2 20 20 40
;
Run;

data want;
set have;
by ID;
Seq+1; /****No need retain***/
if first.ID then Seq=1;
run;

data want;
set have;
retain seq;
by ID;
seq=sum(Seq,1); /****Must have retain***/
if first.ID then Seq=1;
run;

Here are the ways I found to calculate seq by group:

data want;
set have;
by ID;
Seq+1; /****No need retain***/
if first.ID then Seq=1;
run;

data want;
set have;
retain seq;
by ID;
seq=sum(Seq,1); /****Must have retain***/
if first.ID then Seq=1;
run;


data want;
set have;
by ID;
if first.ID then seq = 1;
else seq + 1;
run;
/****No need retain***/


data want;
set have;
by ID;
retain seq;
if first.ID then seq=0;
seq=sum(seq,1); 
run;
/****Must have retain***/


data want;
do seq=1 by 1 until last.ID;
set have;
by ID;
output;
end;
run;
/****No need retain***/


  
PaigeMiller
Diamond | Level 26

You are now confusing the SUM function with the SUM statement. These are not the same. The SUM function does require a RETAIN, again as clearly stated at the link from @FreelanceReinh which you should really look at. The SUM statement does not require a RETAIN, as clearly stated at the link.

--
Paige Miller