Hello,
Am having trouble counting the number of events using proc tabulate. I wanted to keep the ID variable so I can use it to match to another dataset.
Here is my code
data Have;
input id $1-7 race $8-9 age_group $10-20 event_dt sex $ visit_type $ year;
datalines;
5334933 W 02.1 <20 21-Apr-20 M Outpatient 2020
5790999 B 02.5 35-39 4-Jan-21 M Outpatient 2021
5848162 H 02.6 >=40 30-Aug-22 M Outpatient 2022
5956547 W 02.3 25-29 9-Jul-20 F Outpatient 2020
5917539 B 02.3 25-29 30-Jul-20 M Outpatient 2020
4319890 W 02.3 25-29 9-Jun-20 M Outpatient 2020
1002701 B 02.3 25-29 30-Sep-20 F Outpatient 2020
5825314 Z 02.6 >=40 13-Oct-20 M Outpatient 2020
7047568 O 02.3 25-29 23-Sep-20 M Outpatient 2020
7205606 B 02.2 20-24 3-Feb-22 F Outpatient 2022
;
run;
Here is my code
proc tabulate data=have missing out=want;
class id year sex race age_group visit_type;
var event_dt;
table (id sex age_group), (year ALL)* N ;
format sex $sex. visit_type $visit_type.;
run;
The processing time is taking too long so I believe something is wrong with my code. When I did not include the id variable I was able to get the count I wanted but I could not match the data to another dataset because I could not include the linkage variable id. Is there a way modify this code to include id or is there another way to count and get the same counts as this proc tabulate. Thank you
When I ran your DATA step as-is, I got the following notes repeatably in the log:
NOTE: Invalid data for event_dt in line 83 21-46. NOTE: Invalid data for year in line 85 1-14. RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0 85 CH5848162.H.02.6 >=40.30-Aug-22.M.Outpatient.2022 ZO33333330403323233330332476233040477767666703333222222222222222222222222222222222 NU584816298902E60ED40930D157D229D9F5401495E492022000000000000000000000000000000000
and the HAVE data set was incomplete and incorrect:
Obs | id | race | age_group | event_dt | sex | visit_type | year |
---|---|---|---|---|---|---|---|
1 | 5334933 | W | 02.1 <20 2 | . | 5790999 | 35-39 4- | . |
2 | 5956547 | W | 02.3 25-29 | . | 5917539 | 25-29 30 | . |
3 | 1002701 | B | 02.3 25-29 | . | 5825314 | >=40 13- | . |
I modified the DATA step as follows to get it to read in all of the DATALINES values properly:
data Have;
length id $7 race $1 age_group $10 event_dt 8 sex $1 visit_type $10 year 8 _event_dt $9;
infile datalines dsd dlm='09'x;
input id race age_group _event_dt sex $ visit_type $ year:4.;
event_dt=input(compress(_event_dt,'-'),date9.) ;
format event_dt date9.;
datalines;
5334933 W 02.1 <20 21-Apr-20 M Outpatient 2020
5790999 B 02.5 35-39 4-Jan-21 M Outpatient 2021
5848162 H 02.6 >=40 30-Aug-22 M Outpatient 2022
5956547 W 02.3 25-29 9-Jul-20 F Outpatient 2020
5917539 B 02.3 25-29 30-Jul-20 M Outpatient 2020
4319890 W 02.3 25-29 9-Jun-20 M Outpatient 2020
1002701 B 02.3 25-29 30-Sep-20 F Outpatient 2020
5825314 Z 02.6 >=40 13-Oct-20 M Outpatient 2020
7047568 O 02.3 25-29 23-Sep-20 M Outpatient 2020
7205606 B 02.2 20-24 3-Feb-22 F Outpatient 2022
;
run;
The WANT dataset produced looked like this:
Obs | id | year | sex | age_group | _TYPE_ | _PAGE_ | _TABLE_ | N |
---|---|---|---|---|---|---|---|---|
1 | 1002701 | 2020 | 1100 | 1 | 1 | 1 | ||
2 | 4319890 | 2020 | 1100 | 1 | 1 | 1 | ||
3 | 5334933 | 2020 | 1100 | 1 | 1 | 1 | ||
4 | 5790999 | 2021 | 1100 | 1 | 1 | 1 | ||
5 | 5825314 | 2020 | 1100 | 1 | 1 | 1 | ||
6 | 5848162 | 2022 | 1100 | 1 | 1 | 1 | ||
7 | 5917539 | 2020 | 1100 | 1 | 1 | 1 | ||
8 | 5956547 | 2020 | 1100 | 1 | 1 | 1 | ||
9 | 7047568 | 2020 | 1100 | 1 | 1 | 1 | ||
10 | 7205606 | 2022 | 1100 | 1 | 1 | 1 | ||
11 | 1002701 | . | 1000 | 1 | 1 | 1 | ||
12 | 4319890 | . | 1000 | 1 | 1 | 1 | ||
13 | 5334933 | . | 1000 | 1 | 1 | 1 | ||
14 | 5790999 | . | 1000 | 1 | 1 | 1 | ||
15 | 5825314 | . | 1000 | 1 | 1 | 1 | ||
16 | 5848162 | . | 1000 | 1 | 1 | 1 | ||
17 | 5917539 | . | 1000 | 1 | 1 | 1 | ||
18 | 5956547 | . | 1000 | 1 | 1 | 1 | ||
19 | 7047568 | . | 1000 | 1 | 1 | 1 | ||
20 | 7205606 | . | 1000 | 1 | 1 | 1 | ||
21 | 2020 | Female | 0110 | 1 | 1 | 2 | ||
22 | 2022 | Female | 0110 | 1 | 1 | 1 | ||
23 | 2020 | Male | 0110 | 1 | 1 | 5 | ||
24 | 2021 | Male | 0110 | 1 | 1 | 1 | ||
25 | 2022 | Male | 0110 | 1 | 1 | 1 | ||
26 | . | Female | 0010 | 1 | 1 | 3 | ||
27 | . | Male | 0010 | 1 | 1 | 7 | ||
28 | 2020 | 02.1 <20 | 0101 | 1 | 1 | 1 | ||
29 | 2022 | 02.2 20-24 | 0101 | 1 | 1 | 1 | ||
30 | 2020 | 02.3 25-29 | 0101 | 1 | 1 | 5 | ||
31 | 2021 | 02.5 35-39 | 0101 | 1 | 1 | 1 | ||
32 | 2020 | 02.6 >=40 | 0101 | 1 | 1 | 1 | ||
33 | 2022 | 02.6 >=40 | 0101 | 1 | 1 | 1 | ||
34 | . | 02.1 <20 | 0001 | 1 | 1 | 1 | ||
35 | . | 02.2 20-24 | 0001 | 1 | 1 | 1 | ||
36 | . | 02.3 25-29 | 0001 | 1 | 1 | 5 | ||
37 | . | 02.5 35-39 | 0001 | 1 | 1 | 1 | ||
38 | . | 02.6 >=40 | 0001 | 1 | 1 | 2 |
I'm not seeing how this output data would be useful in a join with another data set. Can you provide an example of the other dataset you want to join with this data, and a sample of the desired end result?
Can you clarify what count you want and how you want this added back to the original data? For example, do you want the total count of Males for a particular year added to the id that matches that sex and year? If you can show what you are expecting to get based on the data you sent, that would be helpful.
Why are you using a REPORTING procedure? PROC TABULATE is for designed printed reports (what everyone used to mean by the word TABLE). It is not really designed for making DATASETS. And the dataset it does produce are really only useful for making more reports, not for analysis.
If you want the results in a dataset use an ANALYSIS procedure like PROC SUMMARY instead.
If there are too many levels of ID to use it as a CLASS variable try sorting the dataset by ID first and use ID as a BY variable instead.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.