BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Tommer
Obsidian | Level 7

Hello, I am trying to generate same values for by group ie.

abc     1 
abc    1
abc    1
def    2
def    2
xyz   3
xyz   3

 

I used the following code:

data dt;
input dat $;
datalines;

 

abc

abc

xyz

abc

def

xyz

def
;
run;

proc sort data=dt;
by dat;
run;


data dt2;
set dt;
by dat;
retain ord;
if first.dat then ord+1;
run;

 

It does work well, but I was wondering since I used just first.dat, I was thinking the ord+1 will only apply to first value per the by group. However, it showed up for all values. Is this because of retain statement? I understand retain statement holds the value but how is it applying the ord+1 for values which are not first in the by group? Please help.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

ord+1 gets only executed when the condition is true. That's why ord has the same value within your by group.

And yes, ord is not missing for the 2nd to nth value within a by group because it is retained. In your code you could even remove the Retain statement and the value would still be retained because SAS syntax ord+1 also implicitly leads to a retain. ...but: If the value wouldn't be retained then your counter wouldn't work at all as in the first case you would add +1 to a missing.

Also to be considered: Depending on the syntax adding +1 to a missing value returns either missing or 1.

 

Here the result from multiple syntax variations.

data dt2;
  set dt;
  by dat;
  retain ord1 ord2 ord3;
  retain ord2_B 1;
  if first.dat then 
    do;
      ord1+1;
      ord2=ord2+1;
      ord2_B=ord2_B+1;
      ord3=sum(ord3,1);
      ord4+1;
      ord5=ord5+1;
      ord6=sum(ord6,1);
    end;
run;

proc print data=dt2;
run;

Patrick_0-1593131310987.png

 

 

  

View solution in original post

5 REPLIES 5
Patrick
Opal | Level 21

ord+1 gets only executed when the condition is true. That's why ord has the same value within your by group.

And yes, ord is not missing for the 2nd to nth value within a by group because it is retained. In your code you could even remove the Retain statement and the value would still be retained because SAS syntax ord+1 also implicitly leads to a retain. ...but: If the value wouldn't be retained then your counter wouldn't work at all as in the first case you would add +1 to a missing.

Also to be considered: Depending on the syntax adding +1 to a missing value returns either missing or 1.

 

Here the result from multiple syntax variations.

data dt2;
  set dt;
  by dat;
  retain ord1 ord2 ord3;
  retain ord2_B 1;
  if first.dat then 
    do;
      ord1+1;
      ord2=ord2+1;
      ord2_B=ord2_B+1;
      ord3=sum(ord3,1);
      ord4+1;
      ord5=ord5+1;
      ord6=sum(ord6,1);
    end;
run;

proc print data=dt2;
run;

Patrick_0-1593131310987.png

 

 

  

Tommer
Obsidian | Level 7
Thanks! I applied the same logic of retain to do this where I want the same num values for each by group to be populated under ord. But now I don't see the num being carried over. What am I missing?

data dt;
infile datalines missover;
input dat $ num;
datalines;
abc 1
abc
xyz 2
abc
def 3
xyz
def
;
run;

proc sort data=dt;
by dat;
run;


data dt2;
set dt;
by dat;
retain num;
if first.dat then ord=num;
run;
Astounding
PROC Star

You are missing the fact that you added NUM to  your incoming data set.  That means each time the SET statement reads in an observation, it replaces the previous value for NUM.  If the new value is missing, it replaces a nonmissing value.  You should not have NUM in your incoming data to apply this logic (which is otherwise perfectly good logic).

mkeintz
PROC Star

As @Astounding noted, the fact that NUM in in the input dataset means that any RETAIN instruction you issued is superseded by the fact that NUM is read in for each iteration of the DATA set. 

 

And in fact, you don't even need a retain statement for variables that are read via SET or MERGE - they are automatically retained, but it's not usually apparent.

 

You can use this attribute of SAS and produce the result you are looking for by using a conditional SET statement for the NUM variable, as in:

 

data dt;
  infile datalines missover;
  input dat $ num;
datalines;
abc 1
abc
xyz 2
abc
def 3
xyz
def
run;

proc sort data=dt;
  by dat;
run;


data dt2;
  set dt (drop=num);
  by dat;
  if first.dat then set dt (keep=num) point=_n_;
run;

This depends on two properties.

  1.   The proc sort retains the original record order within by groups.  And since you issue a NUM value for the first record of each DAT value in the original unsorted data set, the sorted dataset will have the desired NUM at the beginning of each by group.

  2. The unconditional SET statement doesn't read in NUM, but the subsequent conditional SET (if first.dat then set ...) does read in NUM.  This value will be automatically retained until the next time the condition is satisfied.

 

It's not particularly recommended for you problem, but this conditional SET technique is very useful for "bringing history forward".  You can read in, say, chronologically sorted monthly data set, a quarterly data set, and a yearly data set, such that the most recent yearly and quarterly values are brought forward to each monthly record.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tommer
Obsidian | Level 7
@Patrick these are all great examples with really good scenarios to understand the concepts better!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1140 views
  • 3 likes
  • 4 in conversation