PatientID | Contact_date | Program_code | activation_date | Deactivation_date |
ID1 | 12/1/22 | 1 | 12/1/22 | 12/6/22 |
ID1 | 12/2/22 | 1 | 12/1/22 | 12/6/22 |
ID1 | 12/3/22 | 1 | 12/1/22 | 12/6/22 |
ID1 | 12/6/22 | 1 | 12/1/22 | 12/6/22 |
ID1 | 12/1/22 | 2 | 12/1/22 | 12/6/22 |
ID1 | 12/2/22 | 2 | 12/1/22 | 12/6/22 |
ID1 | 12/3/22 | 2 | 12/1/22 | 12/6/22 |
ID1 | 12/6/22 | 2 | 12/1/22 | 12/6/22 |
ID2 | 12/2/22 | 1 | 1/2/22 | 2/3/22 |
ID2 | 1/2/22 | 1 | 1/2/22 | 2/3/22 |
ID3 | 3/3/21 | 3 | 1/3/21 | 18/6/21 |
ID3 | 16/6/21 | 3 | 1/3/21 | 18/6/21 |
I am interested in counting the average number of contacts per episode (also the number of episodes) based on the patientid and program_code by activation year. An episode is a period between the activation date and the deactivation date.
Date format as DD/MM/YYYY
Can there be a contact date that is not in an episode (not between activation_date and deactivation_date)? For ID2, how can the deactivation_date be before the activation_date?
Please provide data as working SAS data step code (instructions) and not as Excel files or screen captures.
Thank you for looking into this. I have corrected the episode date for patient ID2. Apologies - I don't have SAS on my personal computer and so can't provide the data as a working SAS data step code now.
You didn't answer my first question: "Can there be a contact date that is not in an episode (not between activation_date and deactivation_date)?"
I will wait until you can provide the data in the requested format.
@Sofia2022 wrote:
No it can’t. The contact date need to be between the activation date and deactivation date.
Then isn't ID2 in violation of that rule?
As a side note ... we all get confused by dates of the form 02/03/2022, is that March 2 or February 3? Better is to use unambiguous formatting of dates such as 03FEB2022.
As @PaigeMiller remarked, you should provide example data as a data step, e.g.:
data have;
length PatientID $3 Contact_date Program_code activation_date Deactivation_date 8;
informat Contact_date activation_date Deactivation_date ddmmyy.;
format Contact_date activation_date Deactivation_date ddmmyy.;
input PatientID Contact_date Program_code activation_date Deactivation_date;
cards;
ID1 12/1/22 1 12/1/22 12/6/22
ID1 12/2/22 1 12/1/22 12/6/22
ID1 12/3/22 1 12/1/22 12/6/22
ID1 12/6/22 1 12/1/22 12/6/22
ID1 12/1/22 2 12/1/22 12/6/22
ID1 12/2/22 2 12/1/22 12/6/22
ID1 12/3/22 2 12/1/22 12/6/22
ID1 12/6/22 2 12/1/22 12/6/22
ID2 12/2/22 1 1/2/22 2/3/22
ID2 1/2/22 1 1/2/22 2/3/22
ID3 3/3/21 3 1/3/21 18/6/21
ID3 16/6/21 3 1/3/21 18/6/21
;run;
I would start by getting the counts for each patient and program_code. SQL can be used here:
proc sql;
create table counts as select
PatientID,Program_code,count(*) as count
from have
where Contact_date between activation_date and Deactivation_date
group by PatientID,Program_code;
quit;
Then you can use PROC SUMMARY to get the average number of contacts:
proc summary data=counts;
class PatientID;
var count;
output out=averages mean=avg_count;
run;
This will (as we did not use the NWAY option) create a table like this:
The first row (with missing PatientID and _TYPE_=0) is the grand total, the average for all patients. The _FREQ_ variable shows the number of input rows used to calculate the average. The next rows are for each patient. If you need the averages for the Program_code and not the PatientID, use that variable in the CLASS statement instead, and if you just need the grand total, drop the CLASS statement.
Of course, PROC SUMMARY could also have been used to create the first table as well. And PROC SQL can also calculate averages, but not on different levels at once (the _TYPE_).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.