Re: Find the count using proc tabulate

CathyVI · Posted 07-15-2025 08:49 AM

Hello,

Am having trouble counting the number of events using proc tabulate. I wanted to keep the ID variable so I can use it to match to another dataset.

Here is my code

data Have;
input id $1-7 race $8-9 age_group $10-20 event_dt sex $ visit_type $ year;
datalines;
5334933	W	02.1 <20	21-Apr-20	M	Outpatient	2020
5790999	B	02.5 35-39	4-Jan-21	M	Outpatient	2021
5848162	H	02.6 >=40	30-Aug-22	M	Outpatient	2022
5956547	W	02.3 25-29	9-Jul-20	F	Outpatient	2020
5917539	B	02.3 25-29	30-Jul-20	M	Outpatient	2020
4319890	W	02.3 25-29	9-Jun-20	M	Outpatient	2020
1002701	B	02.3 25-29	30-Sep-20	F	Outpatient	2020
5825314	Z	02.6 >=40	13-Oct-20	M	Outpatient	2020
7047568	O	02.3 25-29	23-Sep-20	M	Outpatient	2020
7205606	B	02.2 20-24	3-Feb-22	F	Outpatient	2022
;
run;

Here is my code

proc tabulate data=have missing out=want;  
class id year sex race age_group visit_type;
var event_dt; 
table (id sex age_group), (year ALL)* N ; 
format   sex $sex. visit_type $visit_type.;
run;

The processing time is taking too long so I believe something is wrong with my code. When I did not include the id variable I was able to get the count I wanted but I could not match the data to another dataset because I could not include the linkage variable id. Is there a way modify this code to include id or is there another way to count and get the same counts as this proc tabulate. Thank you

SASJedi · Posted 07-15-2025 11:56 AM

When I ran your DATA step as-is, I got the following notes repeatably in the log:

NOTE: Invalid data for event_dt in line 83 21-46.
NOTE: Invalid data for year in line 85 1-14.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                          
85  CH5848162.H.02.6 >=40.30-Aug-22.M.Outpatient.2022                                 
    ZO33333330403323233330332476233040477767666703333222222222222222222222222222222222
    NU584816298902E60ED40930D157D229D9F5401495E492022000000000000000000000000000000000

and the HAVE data set was incomplete and incorrect:

Obs	id	race	age_group	event_dt	sex	visit_type	year
1	5334933	W	02.1 <20 2	.	5790999	35-39 4-	.
2	5956547	W	02.3 25-29	.	5917539	25-29 30	.
3	1002701	B	02.3 25-29	.	5825314	>=40 13-	.

I modified the DATA step as follows to get it to read in all of the DATALINES values properly:

data Have;
length id $7 race $1 age_group $10 event_dt 8 sex $1 visit_type $10 year 8 _event_dt $9;
infile datalines dsd dlm='09'x;
input id race age_group  _event_dt sex $ visit_type $ year:4.;
event_dt=input(compress(_event_dt,'-'),date9.) ;
format event_dt date9.;
datalines;
5334933	W	02.1 <20	21-Apr-20	M	Outpatient	2020
5790999	B	02.5 35-39	4-Jan-21	M	Outpatient	2021
5848162	H	02.6 >=40	30-Aug-22	M	Outpatient	2022
5956547	W	02.3 25-29	9-Jul-20	F	Outpatient	2020
5917539	B	02.3 25-29	30-Jul-20	M	Outpatient	2020
4319890	W	02.3 25-29	9-Jun-20	M	Outpatient	2020
1002701	B	02.3 25-29	30-Sep-20	F	Outpatient	2020
5825314	Z	02.6 >=40	13-Oct-20	M	Outpatient	2020
7047568	O	02.3 25-29	23-Sep-20	M	Outpatient	2020
7205606	B	02.2 20-24	3-Feb-22	F	Outpatient	2022
;
run;

The WANT dataset produced looked like this:

Obs	id	year	sex	age_group	_TYPE_	_PAGE_	_TABLE_	N
1	1002701	2020			1100	1	1	1
2	4319890	2020			1100	1	1	1
3	5334933	2020			1100	1	1	1
4	5790999	2021			1100	1	1	1
5	5825314	2020			1100	1	1	1
6	5848162	2022			1100	1	1	1
7	5917539	2020			1100	1	1	1
8	5956547	2020			1100	1	1	1
9	7047568	2020			1100	1	1	1
10	7205606	2022			1100	1	1	1
11	1002701	.			1000	1	1	1
12	4319890	.			1000	1	1	1
13	5334933	.			1000	1	1	1
14	5790999	.			1000	1	1	1
15	5825314	.			1000	1	1	1
16	5848162	.			1000	1	1	1
17	5917539	.			1000	1	1	1
18	5956547	.			1000	1	1	1
19	7047568	.			1000	1	1	1
20	7205606	.			1000	1	1	1
21		2020	Female		0110	1	1	2
22		2022	Female		0110	1	1	1
23		2020	Male		0110	1	1	5
24		2021	Male		0110	1	1	1
25		2022	Male		0110	1	1	1
26		.	Female		0010	1	1	3
27		.	Male		0010	1	1	7
28		2020		02.1 <20	0101	1	1	1
29		2022		02.2 20-24	0101	1	1	1
30		2020		02.3 25-29	0101	1	1	5
31		2021		02.5 35-39	0101	1	1	1
32		2020		02.6 >=40	0101	1	1	1
33		2022		02.6 >=40	0101	1	1	1
34		.		02.1 <20	0001	1	1	1
35		.		02.2 20-24	0001	1	1	1
36		.		02.3 25-29	0001	1	1	5
37		.		02.5 35-39	0001	1	1	1
38		.		02.6 >=40	0001	1	1	2

I'm not seeing how this output data would be useful in a join with another data set. Can you provide an example of the other dataset you want to join with this data, and a sample of the desired end result?

Check out my Jedi SAS Tricks for SAS Users

Kathryn_SAS · Posted 07-15-2025 11:58 AM

Can you clarify what count you want and how you want this added back to the original data? For example, do you want the total count of Males for a particular year added to the id that matches that sex and year? If you can show what you are expecting to get based on the data you sent, that would be helpful.

Tom · Posted 07-15-2025 02:10 PM

Why are you using a REPORTING procedure? PROC TABULATE is for designed printed reports (what everyone used to mean by the word TABLE). It is not really designed for making DATASETS. And the dataset it does produce are really only useful for making more reports, not for analysis.

If you want the results in a dataset use an ANALYSIS procedure like PROC SUMMARY instead.

If there are too many levels of ID to use it as a CLASS variable try sorting the dataset by ID first and use ID as a BY variable instead.

Find the count using proc tabulate