I found this code somewhere on the SAS Communities forum. It works perfectly, but I am struggling to understand how it works without a RETAIN statement.
I thought seq_id would be reset to missing (null) in every iteration, so seq_id+1 would not add up. Could somebody explain how this works?
data test;
input pid $ date $;
cards;
1 1/1/2011
1 1/1/2011
1 3/4/2011
2 5/1/2010
2 6/3/2010
;
run;
proc sort data=test;
by pid date;
run;
data want;
set test ;
by date;
if first.date then seq_id=1;
else seq_id+1;
run;
Hi @nickspencer
In your DATA step, seq_id + 1 is a sum statement, which is one of the few SAS statements that do not begin with a keyword.
data want;
set test ;
by date;
if first.date then seq_id=1;
else seq_id+1; /* <- sum statement */
run;
It adds the result of the expression on the right side of the '+' (here: 1) to the numeric accumulator variable on the left side of the '+' (here: seq_id).
The syntax is: accumulator-variable + expression;
The accumulator variable is automatically set to 0 before the first observation is read. Its value is then automatically retained for the next iteration.
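To see the difference the implicit retain makes, here is a small sketch that builds the same counter twice: once with a sum statement and once with an ordinary assignment statement. The data set names demo and compare and the variables counter_sum and counter_assign are hypothetical, made up for this illustration.

```sas
/* Hypothetical demo data: five observations, no BY groups */
data demo;
  do i = 1 to 5;
    output;
  end;
run;

data compare;
  set demo;
  /* Sum statement: initialized to 0 and implicitly retained */
  counter_sum + 1;
  /* Assignment statement: counter_assign is reset to missing at the
     top of every iteration, and missing + 1 is missing */
  counter_assign = counter_assign + 1;
run;

proc print data=compare;
run;
```

counter_sum should run 1, 2, 3, 4, 5, while counter_assign stays missing in every row, and the log should contain a note that missing values were generated.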
So in your code:
SAS groups the observations by date (BY statement).
At the first observation of each group (identified by the automatic variable FIRST.DATE, which equals 1 there), seq_id is set to 1.
For every following observation with the same date, the condition IF FIRST.DATE is false, so SAS executes the ELSE branch, which adds 1 to seq_id's previous value: 2 at the second observation of the group, 3 at the third, and so on. This works because the PDV does not reset seq_id to missing at the start of each iteration; its previous value is retained.
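A quick way to watch this happen is to write the relevant values to the log on every iteration. The sketch below reruns the code from the question unchanged and only adds a PUT statement, which is my addition for illustration.

```sas
data test;
  input pid $ date $;
  cards;
1 1/1/2011
1 1/1/2011
1 3/4/2011
2 5/1/2010
2 6/3/2010
;
run;

proc sort data=test;
  by pid date;
run;

data want;
  set test;
  by date;
  if first.date then seq_id = 1;
  else seq_id + 1;
  /* Log the iteration number, the BY value, the FIRST. flag and the counter */
  put _n_= date= first.date= seq_id=;
run;
```

With this data, first.date should be 1 on rows 1, 3, 4 and 5, so seq_id should come out as 1, 2, 1, 1, 1: only the duplicated 1/1/2011 date gets a 2.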
Hope this helps.
Best,
It uses the sum statement, which includes an implicit RETAIN. So there is a retain; it's just not explicit.
This is the sum statement:
seq_id+1
The sum statement is equivalent to using the SUM function and the RETAIN statement, as shown here:
retain variable 0;
variable=sum(variable,expression);
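Applied to the code from the question, that equivalence would look roughly like this. It is a sketch rather than a documented rewrite: the data set name want_explicit is hypothetical, and the rest is the question's code with the sum statement replaced by an explicit RETAIN and the SUM function.

```sas
data test;
  input pid $ date $;
  cards;
1 1/1/2011
1 1/1/2011
1 3/4/2011
2 5/1/2010
2 6/3/2010
;
run;

proc sort data=test;
  by pid date;
run;

data want_explicit;
  set test;
  by date;
  /* Explicit version of what the sum statement does implicitly:
     keep seq_id across iterations, starting from 0 */
  retain seq_id 0;
  if first.date then seq_id = 1;
  /* SUM ignores missing arguments, as the sum statement does */
  else seq_id = sum(seq_id, 1);
run;
```

It should produce the same seq_id values as the original step. Without the RETAIN line, seq_id would be reset to missing on every iteration, so the ELSE branch would always yield 1 (the SUM of missing and 1) instead of a running count.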