Your SAS programs, embedded in web apps and elsewhere

How to get count of occurrences of a record

Reply
N/A
Posts: 0

How to get count of occurrences of a record

In a stored process I'm writing , in a data step, I want to get a count of how many times a record occurs. I might have more than one record for the same entity, so I'd like to know how many times that entity occurs.

I have code similar to the following (This is pseudocode...I'm aware it's not exactly syntactically correct):

Data Step01;
By field1 field2 field3 field4 field5;
count = 1;
If first.field5 then counter = 0;
counter+count;
run;

However, the field titled counter never gets reset to zero....it just keeps incrementing. I want it to look like this:
1
2
3
1
2
3
4
5
1
2
3
4

And that way, in a later data step, I can get just the last record, that'll have the total line count for that record.

If someone can help out with my confusion, I'd be most appreciative.
SAS Super FREQ
Posts: 8,743

Re: How to get count of occurrences of a record

Hi:
I would think that with your BY statement you would see some incrementing, but not much. When you show that single column like this:
[pre]
1
2
3
1
2
3
4
5
[/pre]
You are not taking into account the OTHER 4 BY variables.

For example, field1 is the primary, within field1, you sort by field2, then within field1 and field2, by field3, etc. so field5 is at the very end of the sort. Depending on your number of obs, you could have a new field5 on every row. Consider the following data and program:
[pre]
data info;
length petowner $10 field1 field2 field3 field4 field5 $12;
infile datalines;
input field1 $ field2 $ field3 $ field4 $ field5 $ petowner $;
return;
datalines;
cat black female indoor apartment alan
cat black female indoor apartment bob
cat black female indoor apartment carl
cat black female indoor house dave
cat black female indoor house edna
cat black female outdoor apartment fiona
cat black female outdoor house george
cat black male indoor apartment harry
cat black male indoor apartment iona
cat black male indoor apartment jack
cat black male outdoor apartment kathy
cat black male outdoor apartment louise
cat black male outdoor house mary
cat white female indoor apartment nora
cat white female indoor house otis
cat white male indoor apartment peter
cat white male indoor apartment quentin
cat white male indoor house rob
dog brown female indoor apartment steve
dog brown female outdoor house talia
dog brown female outdoor house una
dog spots female indoor house victor
dog spots female outdoor house william
dog spots male indoor house xavier
dog white female outdoor house yarrow
dog white male indoor apartment zach
;
run;

proc sort data=info out=info;
by field1 field2 field3 field4 field5;
run;

data counters;
set info;
by field1 field2 field3 field4 field5;
retain f1cnt f2cnt f3cnt f4cnt f5cnt totcnt;
if _n_ = 1 then totcnt = 0;
if first.field1 then f1cnt=0;
if first.field2 then f2cnt=0;
if first.field3 then f3cnt=0;
if first.field4 then f4cnt=0;
if first.field5 then f5cnt=0;
totcnt + 1;
f1cnt + 1;
f2cnt + 1;
f3cnt + 1;
f4cnt + 1;
f5cnt + 1;
run;

proc print data=counters;
var field1 f1cnt field2 f2cnt field3 f3cnt field4 f4cnt field5 f5cnt totcnt petowner ;
run;
[/pre]

The output from the above program is shown below:
[pre]


Obs field1 f1cnt field2 f2cnt field3 f3cnt field4 f4cnt field5 f5cnt totcnt petowner

1 cat 1 black 1 female 1 indoor 1 apartment 1 1 alan
2 cat 2 black 2 female 2 indoor 2 apartment 2 2 bob
3 cat 3 black 3 female 3 indoor 3 apartment 3 3 carl
4 cat 4 black 4 female 4 indoor 4 house 1 4 dave
5 cat 5 black 5 female 5 indoor 5 house 2 5 edna
6 cat 6 black 6 female 6 outdoor 1 apartment 1 6 fiona
7 cat 7 black 7 female 7 outdoor 2 house 1 7 george
8 cat 8 black 8 male 1 indoor 1 apartment 1 8 harry
9 cat 9 black 9 male 2 indoor 2 apartment 2 9 iona
10 cat 10 black 10 male 3 indoor 3 apartment 3 10 jack
11 cat 11 black 11 male 4 outdoor 1 apartment 1 11 kathy
12 cat 12 black 12 male 5 outdoor 2 apartment 2 12 louise
13 cat 13 black 13 male 6 outdoor 3 house 1 13 mary
14 cat 14 white 1 female 1 indoor 1 apartment 1 14 nora
15 cat 15 white 2 female 2 indoor 2 house 1 15 otis
16 cat 16 white 3 male 1 indoor 1 apartment 1 16 peter
17 cat 17 white 4 male 2 indoor 2 apartment 2 17 quentin
18 cat 18 white 5 male 3 indoor 3 house 1 18 rob
19 dog 1 brown 1 female 1 indoor 1 apartment 1 19 steve
20 dog 2 brown 2 female 2 outdoor 1 house 1 20 talia
21 dog 3 brown 3 female 3 outdoor 2 house 2 21 una
22 dog 4 spots 1 female 1 indoor 1 house 1 22 victor
23 dog 5 spots 2 female 2 outdoor 1 house 1 23 william
24 dog 6 spots 3 male 1 indoor 1 house 1 24 xavier
25 dog 7 white 1 female 1 outdoor 1 house 1 25 yarrow
26 dog 8 white 2 male 1 indoor 1 apartment 1 26 zach
[/pre]

Note how totcnt correctly shows 26 and the f1cnt variable shows 18 cats and 8 dogs. Then f2cnt shows that you have 13 black cats and 5 white cats; 3 brown dogs, 3 dogs with spots and 2 white dogs -- so f2cnt variable is changing, but WITHIN field1. So by the time you get to field5 (whether the petowner lives in a house or apartment), then the count changes based on sort order for the 4 variables (field1, field2, field3, field4) that preceded field5 in the BY statement.

If you wanted the count of petowners in houses vs petowners in apartments, then you'd have to change the BY statement in your SORT and in your PROGRAM. You say you want to get a total line count for a "record" assuming there are duplicates for a unique set of BY variables.

For example, if you had true duplicate rows then you might get completely different counts, depending on your BY variables. Run this from a code node in EG to see the difference between what's above and with different data (with duplicate "records" except for field2):
[pre]

data infodups;
length field1 $12 field2 $4 field3 field4 field5 $12 type wherelive $10;
infile datalines;
input field1 $ field2 $ field3 $ field4 $ field5 $ type $ wherelive;
return;
datalines;
alan 1982 cat black female indoor apartment
alan 1984 cat black female indoor apartment
alan 1996 cat black female indoor apartment
alan 1998 cat black female indoor apartment
alan 2006 cat black female indoor apartment
bob 1985 cat black female indoor apartment
bob 1987 cat black female indoor apartment
bob 1989 cat black female indoor apartment
bob 2000 cat black female indoor apartment
bob 2004 cat black female indoor apartment
bob 2007 cat black female indoor apartment
carl 1992 cat black female indoor apartment
carl 1994 cat black female indoor apartment
carl 1996 cat black female indoor apartment
carl 1999 cat black female indoor apartment
carl 2001 cat black female indoor apartment
carl 2007 cat black female indoor apartment
;
run;

proc sort data=infodups out=infodups;
by field1 field2 field3 field4 field5;
run;

data diffcntr;
set infodups;
by field1 field2 field3 field4 field5;
retain f1cnt f2cnt f3cnt f4cnt f5cnt totcnt;
if _n_ = 1 then totcnt = 0;
if first.field1 then f1cnt=0;
if first.field2 then f2cnt=0;
if first.field3 then f3cnt=0;
if first.field4 then f4cnt=0;
if first.field5 then f5cnt=0;
totcnt + 1;
f1cnt + 1;
f2cnt + 1;
f3cnt + 1;
f4cnt + 1;
f5cnt + 1;
run;

proc print data=diffcntr;
title 'With Real Duplicate Records';
var field1 f1cnt field2 f2cnt field3 f3cnt field4 f4cnt field5 f5cnt totcnt type wherelive ;
run;

[/pre]

You might consider contacting Tech Support, because there might be another way to accomplish what you want to do with your stored process...depending on what it is that you want to do.

cynthia
N/A
Posts: 0

Re: How to get count of occurrences of a record

Hi, Cynthia,

What a lengthy response! Yes, I was aware of the possible permutation of all the BY variables.... however, all I wanted was a count of the duplicated records, which at this moment in my program's evolution is the simplest count to get. I'll read and re-read your reply and study it also.

Thanks for replying!
Contributor
Posts: 49

Re: How to get count of occurrences of a record

You could use a simple PROC FREQ. That does the counts for you. Why re-invent the wheel?

Or you could use PROC SQL.

Something along the lines of:
PROC SQL;
SELECT DISTINCT subject, COUNT(field5) AS count
FROM data
GROUP BY subject;
QUIT;

This will give you the total number of records for FIELD5 for each subject.
Ask a Question
Discussion stats
  • 3 replies
  • 5044 views
  • 1 like
  • 3 in conversation