Hello,
I have a dataset with claims of medication, each line / observation is one claim. I have the patient_ID, the date of the claim, and the medication. Now I created a variable, which defines the event (claim of a specific medication x, newuse=1). Here is a sample dataset:
patient_ID | date_claim | medication | newuse |
1 | 17.06.2016 | Medication | 0 |
1 | 30.08.2019 | Medication | 0 |
1 | 30.08.2019 | Medication | 0 |
1 | 23.09.2019 | Medication | 0 |
1 | 23.09.2019 | Medication | 0 |
1 | 23.09.2019 | Medication | 0 |
1 | 23.09.2019 | Medication | 0 |
1 | 24.10.2019 | Medication X | 1 |
1 | 22.11.2019 | Medication | 0 |
1 | 03.03.2020 | Medication | 0 |
1 | 15.07.2020 | Medication | 0 |
1 | 02.03.2021 | Medication | 0 |
1 | 15.11.2021 | Medication | 0 |
2 | 27.01.2017 | Medication | 0 |
2 | 24.08.2018 | Medication | 0 |
2 | 14.09.2018 | Medication | 0 |
2 | 24.09.2018 | Medication | 0 |
2 | 19.10.2018 | Medication | 0 |
2 | 16.11.2018 | Medication | 0 |
2 | 01.02.2019 | Medication | 0 |
2 | 01.02.2019 | Medication X | 1 |
2 | 08.04.2019 | Medication | 0 |
2 | 06.05.2019 | Medication | 0 |
2 | 04.11.2019 | Medication | 0 |
2 | 04.11.2019 | Medication | 0 |
2 | 27.03.2020 | Medication | 0 |
2 | 21.06.2021 | Medication | 0 |
2 | 14.12.2021 | Medication | 0 |
3 | 25.06.2018 | Medication | 0 |
3 | 27.11.2018 | Medication | 0 |
3 | 16.04.2019 | Medication | 0 |
3 | 15.04.2020 | Medication | 0 |
3 | 15.04.2020 | Medication | 0 |
3 | 15.04.2020 | Medication | 0 |
3 | 24.04.2020 | Medication | 0 |
3 | 12.10.2020 | Medication | 0 |
3 | 12.10.2020 | Medication | 0 |
3 | 20.11.2020 | Medication | 0 |
3 | 20.11.2020 | Medication X | 1 |
Now I would like to create a new variable, which counts the number of days before and after this claim, i.e. counting number of days backwards and forward from this event (newuse=1) for each patient_ID. How can I do this / which command do I have to apply?
I would be very happy for your help, thank you in advance.
Possible.
I would create a temporary column with Medication Date value per PATIENT_ID and subtract DATE_CLAIM from Medication date. I accomplish it in 2 sql steps, but I believe there are ways to calculate it in less steps and more efficient ways.
Days after claims get a negative (-) number, if you want only positive numbers, then could handle it using a simple arithmetic operations..
Eg: A little modification to my previous code;
proc sql; create table want as select*, min(date_claim) as start format=date11., max(date_claim) as end format=date11., case newuse when 1 then date_claim-min(date_claim) else . end as Days_Before_Claim, case newuse when 1 then max(date_claim)-date_claim else . end as Days_After_Claim, case newuse when 1 then date_claim else . end as Medication_date from have group by patient_id order by patient_id, date_claim, newuse; quit; proc sql; create table want1 as select*, max(medication_date)-date_claim as days_count from want group by patient_id order by patient_id, date_claim; quit;
@Tamino wrote:
I have a dataset with claims of medication, each line / observation is one claim. I have the patient_ID, the date of the claim, and the medication. Now I created a variable, which defines the event (claim of a specific medication x, newuse=1). Here is a sample dataset:
Now I would like to create a new variable, which counts the number of days before and after this claim, i.e. counting number of days backwards and forward from this event (newuse=1) for each patient_ID. How can I do this / which command do I have to apply?
Counting number of days backwards until what? Counting number of days forward until what? How do we know when to stop counting forwards or backwards?
Counting days between date_claim where newuse=1 and each other claim (date_claim) for each patient (patient_id). So from date_claim of newuse=1 backward for each claim until first claim of a patient_id and forward for each claim until last claim of a patient_id
So there can only be one newuse=1 per patient_ID?
I can see to strategies:
- Join the table with itself on patiend_ID (the "left" tabel in the you is sub-setted with newuse=1). Then calculate the difference between the claim date and the newuse=1 date.
- Sort the table so that newuse=1 comes first for each Patiend_ID. The in a data step retain in a new variable the newuse=1 data, and then calculate the date difference for each observation.
There are also patient_id with more than one newuse=1, that's the problem when sorting the table by newuse so that it is on top.
I didn't get quite well your first suggested strategy.
You will make things a lot easier for us and yourself if you present your input data as a data step, e.g.:
data have;
infile cards delimiter='|';
length patient_ID date_claim 8 medication $20 newuse 8;
informat date_claim ddmmyy10.;
format date_claim ddmmyy10.;
input patient_ID date_claim medication newuse;
cards;
1|17.06.2016|Medication|0
1|30.08.2019|Medication|0
1|30.08.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|24.10.2019|Medication X|1
1|22.11.2019|Medication|0
1|03.03.2020|Medication|0
1|15.07.2020|Medication|0
1|02.03.2021|Medication|0
1|15.11.2021|Medication|0
2|27.01.2017|Medication|0
2|24.08.2018|Medication|0
2|14.09.2018|Medication|0
2|24.09.2018|Medication|0
2|19.10.2018|Medication|0
2|16.11.2018|Medication|0
2|01.02.2019|Medication|0
2|01.02.2019|Medication X|1
2|08.04.2019|Medication|0
2|06.05.2019|Medication|0
2|04.11.2019|Medication|0
2|04.11.2019|Medication|0
2|27.03.2020|Medication|0
2|21.06.2021|Medication|0
2|14.12.2021|Medication|0
3|25.06.2018|Medication|0
3|27.11.2018|Medication|0
3|16.04.2019|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|24.04.2020|Medication|0
3|12.10.2020|Medication|0
3|12.10.2020|Medication|0
3|20.11.2020|Medication|0
3|20.11.2020|Medication X|1
;run;
And then present the data you want in the same way.
What I understood is to calculate the number of days before and after the claim_date per patient_id. If so, the below code might give you the solution.
data have;
input @1 patient_ID @5 date_claim:ddmmyy10. @17 medication $17-28 @29 newuse;
format date_claim date11.;
cards;
1 17.06.2016 Medication 0
1 30.08.2019 Medication 0
1 30.08.2019 Medication 0
1 23.09.2019 Medication 0
1 23.09.2019 Medication 0
1 23.09.2019 Medication 0
1 23.09.2019 Medication 0
1 24.10.2019 Medication X1
1 22.11.2019 Medication 0
1 03.03.2020 Medication 0
1 15.07.2020 Medication 0
1 02.03.2021 Medication 0
1 15.11.2021 Medication 0
2 27.01.2017 Medication 0
2 24.08.2018 Medication 0
2 14.09.2018 Medication 0
2 24.09.2018 Medication 0
2 19.10.2018 Medication 0
2 16.11.2018 Medication 0
2 01.02.2019 Medication 0
2 01.02.2019 Medication X1
2 08.04.2019 Medication 0
2 06.05.2019 Medication 0
2 04.11.2019 Medication 0
2 04.11.2019 Medication 0
2 27.03.2020 Medication 0
2 21.06.2021 Medication 0
2 14.12.2021 Medication 0
3 25.06.2018 Medication 0
3 27.11.2018 Medication 0
3 16.04.2019 Medication 0
3 15.04.2020 Medication 0
3 15.04.2020 Medication 0
3 15.04.2020 Medication 0
3 24.04.2020 Medication 0
3 12.10.2020 Medication 0
3 12.10.2020 Medication 0
3 20.11.2020 Medication 0
3 20.11.2020 Medication X1
;
proc print; run;
proc sql;
create table want as
select*, min(date_claim) as start format=date11., max(date_claim) as end format=date11.,
case newuse when 1 then date_claim-min(date_claim) else . end as Days_Before_Claim,
case newuse when 1 then max(date_claim)-date_claim else . end as Days_After_Claim
from have
group by patient_id
order by patient_id, date_claim, newuse;
quit;
Thank you.
It counts the days from first claim to newuse=1 and days from newuse=1 to the last claim of a patient_id. I would like to know the number of days betweet each single claim and newuse=1. Is this also possible?
Possible.
I would create a temporary column with Medication Date value per PATIENT_ID and subtract DATE_CLAIM from Medication date. I accomplish it in 2 sql steps, but I believe there are ways to calculate it in less steps and more efficient ways.
Days after claims get a negative (-) number, if you want only positive numbers, then could handle it using a simple arithmetic operations..
Eg: A little modification to my previous code;
proc sql; create table want as select*, min(date_claim) as start format=date11., max(date_claim) as end format=date11., case newuse when 1 then date_claim-min(date_claim) else . end as Days_Before_Claim, case newuse when 1 then max(date_claim)-date_claim else . end as Days_After_Claim, case newuse when 1 then date_claim else . end as Medication_date from have group by patient_id order by patient_id, date_claim, newuse; quit; proc sql; create table want1 as select*, max(medication_date)-date_claim as days_count from want group by patient_id order by patient_id, date_claim; quit;
Thank you, this works out! Even though only for those patient_id that only have one newuse=1, if more than one, it counts form the last one on.
Assuming the data are sorted by patient_id/date_claim, you can read each ID in two passes. In the first pass establish and retain the start_date and end_date variables (i.e. the earliest and latest date_claim values). In the second pass, read only then newuse=1 cases and calculate the number of days prior and after each newuse=1 case:
data have;
infile cards delimiter='|';
length patient_ID date_claim 8 medication $20 newuse 8;
informat date_claim ddmmyy10.;
format date_claim ddmmyy10.;
input patient_ID date_claim medication newuse;
cards;
1|17.06.2016|Medication|0
1|30.08.2019|Medication|0
1|30.08.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|24.10.2019|Medication X|1
1|22.11.2019|Medication|0
1|03.03.2020|Medication|0
1|15.07.2020|Medication|0
1|02.03.2021|Medication|0
1|15.11.2021|Medication|0
2|27.01.2017|Medication|0
2|24.08.2018|Medication|0
2|14.09.2018|Medication|0
2|24.09.2018|Medication|0
2|19.10.2018|Medication|0
2|16.11.2018|Medication|0
2|01.02.2019|Medication|0
2|01.02.2019|Medication X|1
2|08.04.2019|Medication|0
2|06.05.2019|Medication|0
2|04.11.2019|Medication|0
2|04.11.2019|Medication|0
2|27.03.2020|Medication|0
2|21.06.2021|Medication|0
2|14.12.2021|Medication|0
3|25.06.2018|Medication|0
3|27.11.2018|Medication|0
3|16.04.2019|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|24.04.2020|Medication|0
3|12.10.2020|Medication|0
3|12.10.2020|Medication|0
3|20.11.2020|Medication|0
3|20.11.2020|Medication X|1
run;
data want;
set have (in=first_pass)
have (in=second_pass where=(newuse=1)) ;
by patient_id;
retain start_date end_date;
if first.patient_id then start_date=date_claim;
if first_pass then end_date=date_claim;
format start_date end_date ddmmyy10.;
if second_pass;
days_before= date_claim-start_date;
days_after = end_date-date_claim;
run;
This works no matter how many NEWUSE cases a patient might have.
/*
Can you post that new variable you want to create?
If you don't have a big table ,could try the following code,
otherwise,need to resort to Hash Table.
*/
data have;
infile cards delimiter='|';
length patient_ID date_claim 8 medication $20 newuse 8;
informat date_claim ddmmyy10.;
format date_claim ddmmyy10.;
input patient_ID date_claim medication newuse;
cards;
1|17.06.2016|Medication|0
1|30.08.2019|Medication|0
1|30.08.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|24.10.2019|Medication X|1
1|22.11.2019|Medication|0
1|03.03.2020|Medication|0
1|15.07.2020|Medication|0
1|02.03.2021|Medication|0
1|15.11.2021|Medication|0
2|27.01.2017|Medication|0
2|24.08.2018|Medication|0
2|14.09.2018|Medication|0
2|24.09.2018|Medication|0
2|19.10.2018|Medication|0
2|16.11.2018|Medication|0
2|01.02.2019|Medication|0
2|01.02.2019|Medication X|1
2|08.04.2019|Medication|0
2|06.05.2019|Medication|0
2|04.11.2019|Medication|0
2|04.11.2019|Medication|0
2|27.03.2020|Medication|0
2|21.06.2021|Medication|0
2|14.12.2021|Medication|0
3|25.06.2018|Medication|0
3|27.11.2018|Medication|0
3|16.04.2019|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|24.04.2020|Medication|0
3|12.10.2020|Medication|0
3|12.10.2020|Medication|0
3|20.11.2020|Medication|0
3|20.11.2020|Medication X|1
;
proc sql;
create table want as
select *,case when newuse=1 then count(distinct date_claim)-1 else 0 end as want
from have
group by patient_ID
order by patient_ID,date_claim;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.