Hi there! I am trying to calculate incidence rates across a 20+ year period. A portion of my dataset looks like this, where 1 is a new diagnosis of the disease in that year. I need to sum all of the incident cases of the disease in each year, and then divide each sum by the n of the total population in each year (derived from a different dataset and then merged with this one), & *100000. I’m struggling with how to do these analyses that is not so tedious where I have to do the calculations singularly year by year. For example, can I do a do loop where the total n is different in each of the years?
ID 1995 1996 ... 2016
1 1 . .
2 . 1 .
:
:
n . . 1
Every data management operation will become simpler in SAS if you change your data structure to
ID Year Diag
1 1995 1
1 1996 0
...
where Diag is 0 or 1.
Your data structure is not clear enough to me:
- what does ID represent?
- what does . (a dot) represent - missing value or any other value differ from 1 ? can a value be 2 or 3 ... ?
Please post a real sample of your data.
Please try the below code to derive the sum per year and if you could provide further details on the calculation with example data then we could improvise the code.
data have;
input ID _1995 _1996;
groupx=1;
cards;
1 1 .
2 . 1
3 1 .
;
proc sql;
create table diction as select name from dictionary.columns where libname='WORK' and memname='HAVE' and name not in ('ID' 'GROUPX');
select cat('sum(',strip(name),') as ',strip(name)) into: vars separated by ',' from dictionary.columns where libname='WORK' and memname='HAVE' and upcase(name) not in ('ID' 'GROUPX');
create table want as select &vars from have group by groupx;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.