Solved: Trying to calculate incidence and having an issue with the structure o...

sas21 · Posted 02-08-2024 11:44 AM

I have two datasets. One is a roster (one record per person) which I then used to pull their medical records. I applied a case definition to their medical records and then kept their incident medical encounter. So, the medical data has one record per person as well. Everyone in the medical data is in the roster but not everyone in the roster is in the medical data.

Examples of the data imports for the datasets below:

data medical_data;

input team $ patientSSN $ Validation cy;

datalines;

1234567891 1 2018

12345678912 1 2015

1234567893 1 2017

1234567894 1 2019

1234567895 1 2018

1234567896 1 2015

1234567897 1 2016

1234567898 1 2019

1234567899 1 2017

;

run;

data roster;

input team $ SSN $country;

datalines;

1234567891 qa

12345678912 af

1234567893 au

1234567894 qa

1234567895 af

1234567896 af

1234567897 bh

1234567898 qa

1234567899 qa

1234567818 qa

1234567819 bh

;

run;

An example of the code that I created to try to structure the data how I want:

proc sql;

create table roster_and_medical as

select *

from roster as a

left join medical_data as b

on a.ssn=b.patientssn ;

quit;

data want;

set roster_and_medical ;

do i=2015 to 2020;

numerator=0;

denominator=0;

if (cy eq i) then

do;

numerator=1;

denominator=1;

end;

else

do;

if (i lt cy) or (cy= .) then

do;

denominator=1;

end;

else;

end;

output;

end;

if cy=. then

do;

cy=i;

end;

run;

It is not structuring the data how I want it to look but it's close. I want to count the incident encounter in the numerator and then count them in the denominator for years 2015-2020 up until their incident encounter or count them in the denominator for the entire timeframe if they don't have an incident encounter. I need the cases to drop out of the denominator once they are no longer considered "at risk" because they have already developed the condition. If someone could help with what I currently have going or if they have another idea that would work better or be more efficient, I would greatly appreciate it. Thank you in advance.

An example how I want the data to look:

patientssn	ssn	validation	cy	i	country	numerator	denominator
1234567891	1234567891	.	.	2015	qa	0	1
1234567891	1234567891	.	.	2016	qa	0	1
1234567891	1234567891	.	.	2017	qa	0	1
1234567891	1234567891	1	2018	2018	qa	1	1
1234567891	1234567891	.	.	2019	qa	0	0
1234567891	1234567891	.	.	2020	qa	0	0
	1234567818	.	.	2015	qa	0	1
	1234567818	.	.	2016	qa	0	1
	1234567818	.	.	2017	qa	0	1
	1234567818	.	.	2018	qa	0	1
	1234567818	.	.	2019	qa	0	1
	1234567818	.	.	2020	qa	0	1

s_lassen · Posted 02-09-2024 01:14 AM

Your initial data steps did not work as intended, so I rewrote them - they are here, in case somebody else want to help with this:

data medical_data;
  length patientSSN $20;
    input patientSSN Validation cy;
    datalines;
1234567891 1 2018
12345678912 1 2015
1234567893 1 2017
1234567894 1  2019
1234567895 1 2018
1234567896 1 2015
1234567897 1 2016
1234567898 1 2019
1234567899 1 2017
;
run;
 
data roster;
  length SSN $20 country $2;
    input  SSN country 
    ;
    datalines;
1234567891 qa
12345678912 af
1234567893 au
1234567894 qa
1234567895 af
1234567896 af
1234567897 bh
1234567898 qa
1234567899 qa
1234567818 qa
1234567819 bh
;
run;

I rewrote your last data step to this:


data want;
  set roster_and_medical ; 
  numerator=0;
  denominator=missing(cy) or cy ge 2015;
  do i=2015 to 2020;
    if (cy eq i) then
      numerator=1;
    else if numerator=1 then do;
      numerator=0;
      denominator=0;
      end;
    output;
    end;
run;

which seems to give the result you want, as simple as I could make it.

View solution in original post

s_lassen · Posted 02-09-2024 01:14 AM

Your initial data steps did not work as intended, so I rewrote them - they are here, in case somebody else want to help with this:

data medical_data;
  length patientSSN $20;
    input patientSSN Validation cy;
    datalines;
1234567891 1 2018
12345678912 1 2015
1234567893 1 2017
1234567894 1  2019
1234567895 1 2018
1234567896 1 2015
1234567897 1 2016
1234567898 1 2019
1234567899 1 2017
;
run;
 
data roster;
  length SSN $20 country $2;
    input  SSN country 
    ;
    datalines;
1234567891 qa
12345678912 af
1234567893 au
1234567894 qa
1234567895 af
1234567896 af
1234567897 bh
1234567898 qa
1234567899 qa
1234567818 qa
1234567819 bh
;
run;

I rewrote your last data step to this:


data want;
  set roster_and_medical ; 
  numerator=0;
  denominator=missing(cy) or cy ge 2015;
  do i=2015 to 2020;
    if (cy eq i) then
      numerator=1;
    else if numerator=1 then do;
      numerator=0;
      denominator=0;
      end;
    output;
    end;
run;

which seems to give the result you want, as simple as I could make it.

Trying to calculate incidence and having an issue with the structure of the data

Re: Trying to calculate incidence and having an issue with the structure of the data

Re: Trying to calculate incidence and having an issue with the structure of the data

Trying to calculate incidence and having an issue with the structure of the data

Re: Trying to calculate incidence and having an issue with the structure of the data

Re: Trying to calculate incidence and having an issue with the structure of the data

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!