I have two datasets. One is a roster (one record per person) which I then used to pull their medical records. I applied a case definition to their medical records and then kept their incident medical encounter. So, the medical data has one record per person as well. Everyone in the medical data is in the roster but not everyone in the roster is in the medical data.
Examples of the data imports for the datasets below:
data medical_data;
input team $ patientSSN $ Validation cy;
datalines;
1234567891 1 2018
12345678912 1 2015
1234567893 1 2017
1234567894 1 2019
1234567895 1 2018
1234567896 1 2015
1234567897 1 2016
1234567898 1 2019
1234567899 1 2017
;
run;
data roster;
input team $ SSN $country;
datalines;
1234567891 qa
12345678912 af
1234567893 au
1234567894 qa
1234567895 af
1234567896 af
1234567897 bh
1234567898 qa
1234567899 qa
1234567818 qa
1234567819 bh
;
run;
An example of the code that I created to try to structure the data how I want:
It is not structuring the data how I want it to look but it's close. I want to count the incident encounter in the numerator and then count them in the denominator for years 2015-2020 up until their incident encounter or count them in the denominator for the entire timeframe if they don't have an incident encounter. I need the cases to drop out of the denominator once they are no longer considered "at risk" because they have already developed the condition. If someone could help with what I currently have going or if they have another idea that would work better or be more efficient, I would greatly appreciate it. Thank you in advance.
patientssn | ssn | validation | cy | i | country | numerator | denominator |
1234567891 | 1234567891 | . | . | 2015 | qa | 0 | 1 |
1234567891 | 1234567891 | . | . | 2016 | qa | 0 | 1 |
1234567891 | 1234567891 | . | . | 2017 | qa | 0 | 1 |
1234567891 | 1234567891 | 1 | 2018 | 2018 | qa | 1 | 1 |
1234567891 | 1234567891 | . | . | 2019 | qa | 0 | 0 |
1234567891 | 1234567891 | . | . | 2020 | qa | 0 | 0 |
1234567818 | . | . | 2015 | qa | 0 | 1 | |
1234567818 | . | . | 2016 | qa | 0 | 1 | |
1234567818 | . | . | 2017 | qa | 0 | 1 | |
1234567818 | . | . | 2018 | qa | 0 | 1 | |
1234567818 | . | . | 2019 | qa | 0 | 1 | |
1234567818 | . | . | 2020 | qa | 0 | 1 |
Your initial data steps did not work as intended, so I rewrote them - they are here, in case somebody else want to help with this:
data medical_data;
length patientSSN $20;
input patientSSN Validation cy;
datalines;
1234567891 1 2018
12345678912 1 2015
1234567893 1 2017
1234567894 1 2019
1234567895 1 2018
1234567896 1 2015
1234567897 1 2016
1234567898 1 2019
1234567899 1 2017
;
run;
data roster;
length SSN $20 country $2;
input SSN country
;
datalines;
1234567891 qa
12345678912 af
1234567893 au
1234567894 qa
1234567895 af
1234567896 af
1234567897 bh
1234567898 qa
1234567899 qa
1234567818 qa
1234567819 bh
;
run;
I rewrote your last data step to this:
data want;
set roster_and_medical ;
numerator=0;
denominator=missing(cy) or cy ge 2015;
do i=2015 to 2020;
if (cy eq i) then
numerator=1;
else if numerator=1 then do;
numerator=0;
denominator=0;
end;
output;
end;
run;
which seems to give the result you want, as simple as I could make it.
Your initial data steps did not work as intended, so I rewrote them - they are here, in case somebody else want to help with this:
data medical_data;
length patientSSN $20;
input patientSSN Validation cy;
datalines;
1234567891 1 2018
12345678912 1 2015
1234567893 1 2017
1234567894 1 2019
1234567895 1 2018
1234567896 1 2015
1234567897 1 2016
1234567898 1 2019
1234567899 1 2017
;
run;
data roster;
length SSN $20 country $2;
input SSN country
;
datalines;
1234567891 qa
12345678912 af
1234567893 au
1234567894 qa
1234567895 af
1234567896 af
1234567897 bh
1234567898 qa
1234567899 qa
1234567818 qa
1234567819 bh
;
run;
I rewrote your last data step to this:
data want;
set roster_and_medical ;
numerator=0;
denominator=missing(cy) or cy ge 2015;
do i=2015 to 2020;
if (cy eq i) then
numerator=1;
else if numerator=1 then do;
numerator=0;
denominator=0;
end;
output;
end;
run;
which seems to give the result you want, as simple as I could make it.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.