Re: Count by multiple variables

Sofia2022 · Posted 11-08-2022 08:45 AM

PatientID	Contact_date	Program_code	activation_date	Deactivation_date
ID1	12/1/22	1	12/1/22	12/6/22
ID1	12/2/22	1	12/1/22	12/6/22
ID1	12/3/22	1	12/1/22	12/6/22
ID1	12/6/22	1	12/1/22	12/6/22
ID1	12/1/22	2	12/1/22	12/6/22
ID1	12/2/22	2	12/1/22	12/6/22
ID1	12/3/22	2	12/1/22	12/6/22
ID1	12/6/22	2	12/1/22	12/6/22
ID2	12/2/22	1	1/2/22	2/3/22
ID2	1/2/22	1	1/2/22	2/3/22
ID3	3/3/21	3	1/3/21	18/6/21
ID3	16/6/21	3	1/3/21	18/6/21

I am interested in counting the average number of contacts per episode (also the number of episodes) based on the patientid and program_code by activation year. An episode is a period between the activation date and the deactivation date. 

Date format as DD/MM/YYYY

PaigeMiller · Posted 11-08-2022 08:58 AM

Can there be a contact date that is not in an episode (not between activation_date and deactivation_date)? For ID2, how can the deactivation_date be before the activation_date?

Please provide data as working SAS data step code (instructions) and not as Excel files or screen captures.

--
Paige Miller

Sofia2022 · Posted 11-08-2022 09:06 AM

Thank you for looking into this. I have corrected the episode date for patient ID2. Apologies - I don't have SAS on my personal computer and so can't provide the data as a working SAS data step code now.

PaigeMiller · Posted 11-08-2022 09:15 AM

You didn't answer my first question: "Can there be a contact date that is not in an episode (not between activation_date and deactivation_date)?"

I will wait until you can provide the data in the requested format.

--
Paige Miller

Sofia2022 · Posted 11-08-2022 09:19 AM

No it can’t. The contact date need to be between the activation date and deactivation date.

PaigeMiller · Posted 11-08-2022 09:24 AM

@Sofia2022 wrote:
No it can’t. The contact date need to be between the activation date and deactivation date.

Then isn't ID2 in violation of that rule?

--
Paige Miller

Sofia2022 · Posted 11-08-2022 09:39 AM

Not now, as I have corrected the deactivation date. The contact dates for ID2 is now equal or between the activation and deactivation date.

PaigeMiller · Posted 11-08-2022 09:46 AM

As a side note ... we all get confused by dates of the form 02/03/2022, is that March 2 or February 3? Better is to use unambiguous formatting of dates such as 03FEB2022.

--
Paige Miller

Sofia2022 · Posted 11-08-2022 04:59 PM

March 2. Date format as DD/MM/YYYY

s_lassen · Posted 01-11-2023 02:55 AM

As @PaigeMiller remarked, you should provide example data as a data step, e.g.:

data have;
  length PatientID $3 Contact_date Program_code activation_date Deactivation_date 8;
  informat Contact_date activation_date Deactivation_date ddmmyy.;
  format Contact_date activation_date Deactivation_date ddmmyy.;
  input PatientID Contact_date Program_code activation_date Deactivation_date;
cards;
ID1 12/1/22 1 12/1/22 12/6/22
ID1 12/2/22 1 12/1/22 12/6/22
ID1 12/3/22 1 12/1/22 12/6/22
ID1 12/6/22 1 12/1/22 12/6/22
ID1 12/1/22 2 12/1/22 12/6/22
ID1 12/2/22 2 12/1/22 12/6/22
ID1 12/3/22 2 12/1/22 12/6/22
ID1 12/6/22 2 12/1/22 12/6/22
ID2 12/2/22 1 1/2/22 2/3/22
ID2 1/2/22 1 1/2/22 2/3/22
ID3 3/3/21 3 1/3/21 18/6/21
ID3 16/6/21 3 1/3/21 18/6/21
;run;

I would start by getting the counts for each patient and program_code. SQL can be used here:

proc sql;
  create table counts as select
    PatientID,Program_code,count(*) as count
  from have
  where Contact_date between activation_date and Deactivation_date
  group by PatientID,Program_code;
quit;

Then you can use PROC SUMMARY to get the average number of contacts:

proc summary data=counts;
  class PatientID;
  var count;
  output out=averages mean=avg_count;
run;

This will (as we did not use the NWAY option) create a table like this:

The first row (with missing PatientID and _TYPE_=0) is the grand total, the average for all patients. The _FREQ_ variable shows the number of input rows used to calculate the average. The next rows are for each patient. If you need the averages for the Program_code and not the PatientID, use that variable in the CLASS statement instead, and if you just need the grand total, drop the CLASS statement.

Of course, PROC SUMMARY could also have been used to create the first table as well. And PROC SQL can also calculate averages, but not on different levels at once (the _TYPE_).

Registration is open

SAS Training: Just a Click Away