- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello, I am trying to generate same values for by group ie.
abc 1
abc 1
abc 1
def 2
def 2
xyz 3
xyz 3
I used the following code:
data dt;
input dat $;
datalines;
abc
abc
xyz
abc
def
xyz
def
;
run;
proc sort data=dt;
by dat;
run;
data dt2;
set dt;
by dat;
retain ord;
if first.dat then ord+1;
run;
It does work well, but I was wondering since I used just first.dat, I was thinking the ord+1 will only apply to first value per the by group. However, it showed up for all values. Is this because of retain statement? I understand retain statement holds the value but how is it applying the ord+1 for values which are not first in the by group? Please help.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
ord+1 gets only executed when the condition is true. That's why ord has the same value within your by group.
And yes, ord is not missing for the 2nd to nth value within a by group because it is retained. In your code you could even remove the Retain statement and the value would still be retained because SAS syntax ord+1 also implicitly leads to a retain. ...but: If the value wouldn't be retained then your counter wouldn't work at all as in the first case you would add +1 to a missing.
Also to be considered: Depending on the syntax adding +1 to a missing value returns either missing or 1.
Here the result from multiple syntax variations.
data dt2;
set dt;
by dat;
retain ord1 ord2 ord3;
retain ord2_B 1;
if first.dat then
do;
ord1+1;
ord2=ord2+1;
ord2_B=ord2_B+1;
ord3=sum(ord3,1);
ord4+1;
ord5=ord5+1;
ord6=sum(ord6,1);
end;
run;
proc print data=dt2;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
ord+1 gets only executed when the condition is true. That's why ord has the same value within your by group.
And yes, ord is not missing for the 2nd to nth value within a by group because it is retained. In your code you could even remove the Retain statement and the value would still be retained because SAS syntax ord+1 also implicitly leads to a retain. ...but: If the value wouldn't be retained then your counter wouldn't work at all as in the first case you would add +1 to a missing.
Also to be considered: Depending on the syntax adding +1 to a missing value returns either missing or 1.
Here the result from multiple syntax variations.
data dt2;
set dt;
by dat;
retain ord1 ord2 ord3;
retain ord2_B 1;
if first.dat then
do;
ord1+1;
ord2=ord2+1;
ord2_B=ord2_B+1;
ord3=sum(ord3,1);
ord4+1;
ord5=ord5+1;
ord6=sum(ord6,1);
end;
run;
proc print data=dt2;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data dt;
infile datalines missover;
input dat $ num;
datalines;
abc 1
abc
xyz 2
abc
def 3
xyz
def
;
run;
proc sort data=dt;
by dat;
run;
data dt2;
set dt;
by dat;
retain num;
if first.dat then ord=num;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You are missing the fact that you added NUM to your incoming data set. That means each time the SET statement reads in an observation, it replaces the previous value for NUM. If the new value is missing, it replaces a nonmissing value. You should not have NUM in your incoming data to apply this logic (which is otherwise perfectly good logic).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
As @Astounding noted, the fact that NUM in in the input dataset means that any RETAIN instruction you issued is superseded by the fact that NUM is read in for each iteration of the DATA set.
And in fact, you don't even need a retain statement for variables that are read via SET or MERGE - they are automatically retained, but it's not usually apparent.
You can use this attribute of SAS and produce the result you are looking for by using a conditional SET statement for the NUM variable, as in:
data dt;
infile datalines missover;
input dat $ num;
datalines;
abc 1
abc
xyz 2
abc
def 3
xyz
def
run;
proc sort data=dt;
by dat;
run;
data dt2;
set dt (drop=num);
by dat;
if first.dat then set dt (keep=num) point=_n_;
run;
This depends on two properties.
- The proc sort retains the original record order within by groups. And since you issue a NUM value for the first record of each DAT value in the original unsorted data set, the sorted dataset will have the desired NUM at the beginning of each by group.
- The unconditional SET statement doesn't read in NUM, but the subsequent conditional SET (if first.dat then set ...) does read in NUM. This value will be automatically retained until the next time the condition is satisfied.
It's not particularly recommended for you problem, but this conditional SET technique is very useful for "bringing history forward". You can read in, say, chronologically sorted monthly data set, a quarterly data set, and a yearly data set, such that the most recent yearly and quarterly values are brought forward to each monthly record.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content