BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Fluorite | Level 6

## Count observations in BY group

INPUT
ID               Item code     Profile
000000196 212543179   1
000000270 216630523   1
000000346 217007122   1
000000395 000021678   1
000000752 000294850   1
000001277 000000344   1
000001277 000000546   1

I want to have extra two columns, one is Count, which calculates how many times the same ID number appears in the list. Want to be use Profile divided by Count.

OUTPUT
ID                Item code        Profile Count Want
000000196 212543179          1 1 1
000000270 216630523          1 1 1
000000346 217007122          1 1 1
000000395 000021678          1 1 1
000000752 000294850          1 1 1
000001277 000000344          1 2 0.5
000001277 000000546          1 2 0.5

1 ACCEPTED SOLUTION

Accepted Solutions
PROC Star

## Re: Count observations in BY group

If your data are sorted by ID, then:

``````data have;
infile cards truncover;
input   ID  :\$10.  Item_code  :\$10.  Profile ;
cards;
000000196 212543179   1
000000270 216630523   1
000000346 217007122   1
000000395 000021678   1
000000752 000294850   1
000001277 000000344   1
000001277 000000546   1
;

data want;
set have;
by id;
if last.id;                 /*Go back if not last.id     */
count=_n_-sum(lag(_n_),0);  /*At last.id, calculate COUNT*/
do _n_=1 to count;          /*Reread COUNT observations  */
set have;
want=profile/count;
output;
end;
run;``````

The data step only drops down to the calculation of count when the end of a by-group is encountered.  This means that the lag(_n_) inside that calculation is only updated at the end of each successive by-group (the first lag(_n_) would be missing, so I add a zero).  Remember the lag function is not a lookback like in excel.  It's an update of a queued value, which is retained in memory across observations, until replaced by another invocation of the same lag function - i.e. at the end of the next by group.

Then once the count value is established, a loop iterated COUNT times issues a second set statement, re-reading the same observations, and outputs them, with the new count and want variables.  Note that each SET HAVE statement generates a separate stream of input from HAVE.  In this case the two SET's are loosely synchronized.  The first SET reads one observation at a time, while the second reads in bunches, of size count.

Note even though you are reading the dataset twice, you are NOT doubling the amount of data transfer from the disk file.  Almost certainly the second SET statement is merely reading from a cache in memory generated by the first SET.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
6 REPLIES 6
Tourmaline | Level 20

## Re: Count observations in BY group

Not easily readable

Fluorite | Level 6

## Re: Count observations in BY group

Sorry I just reformat it, some how it got crumbled up

Tourmaline | Level 20

## Re: Count observations in BY group

``````
data have;
infile cards truncover;
input   ID     : \$20.           Item_code  : \$20.  Profile ;
cards;
000000196 212543179   1
000000270 216630523   1
000000346 217007122   1
000000395 000021678   1
000000752 000294850   1
000001277 000000344   1
000001277 000000546   1
;

proc sql;
create table want as
select *,count(id) as count, profile/calculated count as want
from have
group by id;
quit;
``````
PROC Star

## Re: Count observations in BY group

If your data are sorted by ID, then:

``````data have;
infile cards truncover;
input   ID  :\$10.  Item_code  :\$10.  Profile ;
cards;
000000196 212543179   1
000000270 216630523   1
000000346 217007122   1
000000395 000021678   1
000000752 000294850   1
000001277 000000344   1
000001277 000000546   1
;

data want;
set have;
by id;
if last.id;                 /*Go back if not last.id     */
count=_n_-sum(lag(_n_),0);  /*At last.id, calculate COUNT*/
do _n_=1 to count;          /*Reread COUNT observations  */
set have;
want=profile/count;
output;
end;
run;``````

The data step only drops down to the calculation of count when the end of a by-group is encountered.  This means that the lag(_n_) inside that calculation is only updated at the end of each successive by-group (the first lag(_n_) would be missing, so I add a zero).  Remember the lag function is not a lookback like in excel.  It's an update of a queued value, which is retained in memory across observations, until replaced by another invocation of the same lag function - i.e. at the end of the next by group.

Then once the count value is established, a loop iterated COUNT times issues a second set statement, re-reading the same observations, and outputs them, with the new count and want variables.  Note that each SET HAVE statement generates a separate stream of input from HAVE.  In this case the two SET's are loosely synchronized.  The first SET reads one observation at a time, while the second reads in bunches, of size count.

Note even though you are reading the dataset twice, you are NOT doubling the amount of data transfer from the disk file.  Almost certainly the second SET statement is merely reading from a cache in memory generated by the first SET.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tourmaline | Level 20

## Re: Count observations in BY group

love this -" Almost certainly the second SET statement is merely reading from a cache in memory generated by the first SET."

Never knew mechanics but a lot better now

Super User

## Re: Count observations in BY group

Perhaps

```data have;
informat ID  Itemcode \$10.;
INPUT ID  Itemcode     Profile   ;
datalines;
000000196 212543179   1
000000270 216630523   1
000000346 217007122   1
000000395 000021678   1
000000752 000294850   1
000001277 000000344   1
000001277 000000546   1
;
run;

proc sql;
create table want as
select a.id, a.itemcode, a.profile,b.count,(a.profile/b.count) as want
from have as a
left join
(select id, count(*) as count
from have
group by id) as b
on a.id=b.id;
quit;
```

Note use of data step to provide example data and pasted into a code box opened with the {I} icon. Also variable name ITEMCODE as SAS does not accept spaces by default in variable names. Also notice that the variable ID and Itemcode must be character to preserve leading 0s

Since you required having a aggregate value Count merged with all of the base record ID the LEFT JOIN does that with the select bit in () creating the summary.

Discussion stats
• 6 replies
• 29714 views
• 3 likes
• 4 in conversation