Hi everyone,
I have some individual-level prescription redemption data and I would like to calculate the point prevalence of drug use, defined as the percentage of the population being treated at monthly intervals one year before and after to the initiation of an intervention (intervention starts at different time points for each individual, but every individual has 1 year-follow-up in both directions).
My current data looks something like this:
Table 1
id
drug_atc
date_of_redemption
grace_period
days_supply
last_day_of_supply
intervention_date
cov_by_last
intervention
1
AA10A
23/12/2020
30
30
21/02/2021
01/02/2021
-
0
1
AA10A
31/01/2021
30
30
01/04/2021
01/02/2021
1
0
1
AA10A
04/02/2021
30
30
05/04/2021
01/02/2021
1
1
1
AA10A
07/03/2021
30
30
06/05/2021
01/02/2021
1
1
1
AA10A
15/05/2021
30
30
14/07/2021
01/02/2021
0
1
2
AA10A
02/04/2021
30
60
01/07/2021
01/06/2021
-
0
2
AA10A
05/08/2021
30
60
03/11/2021
01/06/2021
0
1
2
AA10A
06/09/2021
30
30
05/11/2021
01/06/2021
1
1
Last_day_of_supply is the days_supply + grace_period.
Cov_by_last denotes if the prescription was filled before the last_day_of_supply of the previous prescription (yes=1, no=0).
Intervention denotes if the prescription was filled before (0) or after (1) the intervention.
I also have a shell table with each id, the date of intervention and date of the monthly intervals since intervention:
Table 2
ID
drug_atc
month_since_int
intervention_date
intervention
date_since_inv
1
AA10A
-12
01/02/2021
0
01/02/2020
1
AA10A
-11
01/02/2021
0
01/03/2020
1
AA10A
-10
01/02/2021
0
01/04/2020
1
AA10A
-9
01/02/2021
0
01/05/2020
1
AA10A
-8
01/02/2021
0
01/06/2020
1
AA10A
-7
01/02/2021
0
01/07/2020
1
AA10A
-6
01/02/2021
0
01/08/2020
1
AA10A
-5
01/02/2021
0
01/09/2020
1
AA10A
-4
01/02/2021
0
01/10/2020
1
AA10A
-3
01/02/2021
0
01/11/2020
1
AA10A
-2
01/02/2021
0
01/12/2020
1
AA10A
-1
01/02/2021
0
01/01/2021
1
AA10A
0
01/02/2021
1
01/02/2021
1
AA10A
1
01/02/2021
1
01/03/2021
1
AA10A
2
01/02/2021
1
01/04/2021
1
AA10A
3
01/02/2021
1
01/05/2021
1
AA10A
4
01/02/2021
1
01/06/2021
1
AA10A
5
01/02/2021
1
01/07/2021
1
AA10A
6
01/02/2021
1
01/08/2021
1
AA10A
7
01/02/2021
1
01/09/2021
1
AA10A
8
01/02/2021
1
01/10/2021
1
AA10A
9
01/02/2021
1
01/11/2021
1
AA10A
10
01/02/2021
1
01/12/2021
1
AA10A
11
01/02/2021
1
01/01/2022
1
AA10A
12
01/02/2021
1
01/02/2022
2
AA10A
-12
01/06/2021
0
01/06/2020
2
AA10A
-11
01/06/2021
0
01/07/2020
2
AA10A
-10
01/06/2021
0
01/08/2020
I would like to create a table like this,
table 3
ID
drug_atc
month_since_int
in_treatment
intervention
1
AA10A
-12
1
0
1
AA10A
-11
1
0
1
AA10A
-10
1
0
1
AA10A
-9
1
0
1
AA10A
-8
1
0
1
AA10A
-7
1
0
1
AA10A
-6
1
0
1
AA10A
-5
1
0
1
AA10A
-4
1
0
1
AA10A
-3
1
0
1
AA10A
-2
1
0
1
AA10A
-1
1
0
1
AA10A
0
1
1
1
AA10A
1
1
1
1
AA10A
2
1
1
1
AA10A
3
0
1
1
AA10A
4
1
1
1
AA10A
5
0
1
1
AA10A
6
1
1
1
AA10A
7
0
1
1
AA10A
8
1
1
1
AA10A
9
0
1
1
AA10A
10
1
1
1
AA10A
11
0
1
1
AA10A
12
0
1
2
AA10A
-12
1
0
2
AA10A
-11
1
0
2
AA10A
-10
0
0
where in_treatment denotes whether the individual was covered by a days_supply of the drug on the day of the start of the month_since_int.
Eventually, the final table should look something like this, which contains the percentage of individuals in treatment at each month in relation to the intervention.
month_since_int
drug_atc
in_treat_percent
-12
AA10A
70.0%
-11
AA10A
60.0%
-10
AA10A
80.0%
-9
AA10A
70.0%
-8
AA10A
60.0%
-7
AA10A
60.0%
-6
AA10A
70.0%
-5
AA10A
70.0%
-4
AA10A
60.0%
-3
AA10A
80.0%
-2
AA10A
60.0%
-1
AA10A
70.0%
0
AA10A
60.0%
1
AA10A
60.0%
2
AA10A
50.0%
3
AA10A
40.0%
4
AA10A
50.0%
5
AA10A
30.0%
6
AA10A
30.0%
7
AA10A
20.0%
8
AA10A
20.0%
9
AA10A
30.0%
10
AA10A
20.0%
The key step I am struggling with is creating table 3, but I happy to receive input to any of the steps, it does not have to be the total solution.
Thanks
... View more