BookmarkSubscribeRSS Feed
CathyVI
Lapis Lazuli | Level 10

 Hello,

Am having trouble counting the number of events using proc tabulate. I wanted to keep the ID variable so I can use it to match to another dataset.

Here is my code 

 

data Have;
input id $1-7 race $8-9 age_group $10-20 event_dt sex $ visit_type $ year;
datalines;
5334933	W	02.1 <20	21-Apr-20	M	Outpatient	2020
5790999	B	02.5 35-39	4-Jan-21	M	Outpatient	2021
5848162	H	02.6 >=40	30-Aug-22	M	Outpatient	2022
5956547	W	02.3 25-29	9-Jul-20	F	Outpatient	2020
5917539	B	02.3 25-29	30-Jul-20	M	Outpatient	2020
4319890	W	02.3 25-29	9-Jun-20	M	Outpatient	2020
1002701	B	02.3 25-29	30-Sep-20	F	Outpatient	2020
5825314	Z	02.6 >=40	13-Oct-20	M	Outpatient	2020
7047568	O	02.3 25-29	23-Sep-20	M	Outpatient	2020
7205606	B	02.2 20-24	3-Feb-22	F	Outpatient	2022
;
run;

Here is my code

proc tabulate data=have missing out=want;  
class id year sex race age_group visit_type;
var event_dt; 
table (id sex age_group), (year ALL)* N ; 
format   sex $sex. visit_type $visit_type.;
run;

The processing time is taking too long so I believe something is wrong with my code. When I did not include the id variable I was able to get the count I wanted but I could not match the data to another dataset because I could not include the linkage variable id. Is there a way modify this code to include id or is there another way to count and get the same counts as  this proc tabulate. Thank you

3 REPLIES 3
SASJedi
Ammonite | Level 13

When I ran your DATA step as-is, I got the following notes repeatably in the log:

NOTE: Invalid data for event_dt in line 83 21-46.
NOTE: Invalid data for year in line 85 1-14.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0                          
85  CH5848162.H.02.6 >=40.30-Aug-22.M.Outpatient.2022                                 
    ZO33333330403323233330332476233040477767666703333222222222222222222222222222222222
    NU584816298902E60ED40930D157D229D9F5401495E492022000000000000000000000000000000000

and the HAVE data set was incomplete and incorrect:

Obs id race age_group event_dt sex visit_type year
1 5334933 W 02.1 <20 2 . 5790999 35-39 4- .
2 5956547 W 02.3 25-29 . 5917539 25-29 30 .
3 1002701 B 02.3 25-29 . 5825314 >=40 13- .

 

I modified the DATA step as follows to get it to read in all of the DATALINES values properly:

data Have;
length id $7 race $1 age_group $10 event_dt 8 sex $1 visit_type $10 year 8 _event_dt $9;
infile datalines dsd dlm='09'x;
input id race age_group  _event_dt sex $ visit_type $ year:4.;
event_dt=input(compress(_event_dt,'-'),date9.) ;
format event_dt date9.;
datalines;
5334933	W	02.1 <20	21-Apr-20	M	Outpatient	2020
5790999	B	02.5 35-39	4-Jan-21	M	Outpatient	2021
5848162	H	02.6 >=40	30-Aug-22	M	Outpatient	2022
5956547	W	02.3 25-29	9-Jul-20	F	Outpatient	2020
5917539	B	02.3 25-29	30-Jul-20	M	Outpatient	2020
4319890	W	02.3 25-29	9-Jun-20	M	Outpatient	2020
1002701	B	02.3 25-29	30-Sep-20	F	Outpatient	2020
5825314	Z	02.6 >=40	13-Oct-20	M	Outpatient	2020
7047568	O	02.3 25-29	23-Sep-20	M	Outpatient	2020
7205606	B	02.2 20-24	3-Feb-22	F	Outpatient	2022
;
run;

The WANT dataset produced looked like this:

Obs id year sex age_group _TYPE_ _PAGE_ _TABLE_ N
1 1002701 2020     1100 1 1 1
2 4319890 2020     1100 1 1 1
3 5334933 2020     1100 1 1 1
4 5790999 2021     1100 1 1 1
5 5825314 2020     1100 1 1 1
6 5848162 2022     1100 1 1 1
7 5917539 2020     1100 1 1 1
8 5956547 2020     1100 1 1 1
9 7047568 2020     1100 1 1 1
10 7205606 2022     1100 1 1 1
11 1002701 .     1000 1 1 1
12 4319890 .     1000 1 1 1
13 5334933 .     1000 1 1 1
14 5790999 .     1000 1 1 1
15 5825314 .     1000 1 1 1
16 5848162 .     1000 1 1 1
17 5917539 .     1000 1 1 1
18 5956547 .     1000 1 1 1
19 7047568 .     1000 1 1 1
20 7205606 .     1000 1 1 1
21   2020 Female   0110 1 1 2
22   2022 Female   0110 1 1 1
23   2020 Male   0110 1 1 5
24   2021 Male   0110 1 1 1
25   2022 Male   0110 1 1 1
26   . Female   0010 1 1 3
27   . Male   0010 1 1 7
28   2020   02.1 <20 0101 1 1 1
29   2022   02.2 20-24 0101 1 1 1
30   2020   02.3 25-29 0101 1 1 5
31   2021   02.5 35-39 0101 1 1 1
32   2020   02.6 >=40 0101 1 1 1
33   2022   02.6 >=40 0101 1 1 1
34   .   02.1 <20 0001 1 1 1
35   .   02.2 20-24 0001 1 1 1
36   .   02.3 25-29 0001 1 1 5
37   .   02.5 35-39 0001 1 1 1
38   .   02.6 >=40 0001 1 1 2

 

I'm not seeing how this output data would be useful in a join with another data set. Can you provide an example of the other dataset you want to join with this data, and a sample of the desired end result?

 

 

Check out my Jedi SAS Tricks for SAS Users
Kathryn_SAS
SAS Employee

Can you clarify what count you want and how you want this added back to the original data? For example, do you want the total count of Males for a particular year added to the id that matches that sex and year? If you can show what you are expecting to get based on the data you sent, that would be helpful.

Tom
Super User Tom
Super User

Why are you using a REPORTING procedure?  PROC TABULATE is for designed printed reports (what everyone used to mean by the word TABLE).  It is not really designed for making DATASETS. And the dataset it does produce are really only useful for making more reports, not for analysis.

 

If you want the results in a dataset use an ANALYSIS procedure like PROC SUMMARY instead.

 

If there are too many levels of ID to use it as a CLASS variable try sorting the dataset by ID first and use ID as a BY variable instead.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 307 views
  • 1 like
  • 4 in conversation