Solved: Re: Counting days before and after an event

Tamino · Posted 10-26-2023 09:37 AM

Hello,

I have a dataset with claims of medication, each line / observation is one claim. I have the patient_ID, the date of the claim, and the medication. Now I created a variable, which defines the event (claim of a specific medication x, newuse=1). Here is a sample dataset:

patient_ID	date_claim	medication	newuse
1	17.06.2016	Medication	0
1	30.08.2019	Medication	0
1	30.08.2019	Medication	0
1	23.09.2019	Medication	0
1	23.09.2019	Medication	0
1	23.09.2019	Medication	0
1	23.09.2019	Medication	0
1	24.10.2019	Medication X	1
1	22.11.2019	Medication	0
1	03.03.2020	Medication	0
1	15.07.2020	Medication	0
1	02.03.2021	Medication	0
1	15.11.2021	Medication	0
2	27.01.2017	Medication	0
2	24.08.2018	Medication	0
2	14.09.2018	Medication	0
2	24.09.2018	Medication	0
2	19.10.2018	Medication	0
2	16.11.2018	Medication	0
2	01.02.2019	Medication	0
2	01.02.2019	Medication X	1
2	08.04.2019	Medication	0
2	06.05.2019	Medication	0
2	04.11.2019	Medication	0
2	04.11.2019	Medication	0
2	27.03.2020	Medication	0
2	21.06.2021	Medication	0
2	14.12.2021	Medication	0
3	25.06.2018	Medication	0
3	27.11.2018	Medication	0
3	16.04.2019	Medication	0
3	15.04.2020	Medication	0
3	15.04.2020	Medication	0
3	15.04.2020	Medication	0
3	24.04.2020	Medication	0
3	12.10.2020	Medication	0
3	12.10.2020	Medication	0
3	20.11.2020	Medication	0
3	20.11.2020	Medication X	1

Now I would like to create a new variable, which counts the number of days before and after this claim, i.e. counting number of days backwards and forward from this event (newuse=1) for each patient_ID. How can I do this / which command do I have to apply?

I would be very happy for your help, thank you in advance.

A_Kh · Posted 10-31-2023 01:39 PM

Possible.
I would create a temporary column with Medication Date value per PATIENT_ID and subtract DATE_CLAIM from Medication date. I accomplish it in 2 sql steps, but I believe there are ways to calculate it in less steps and more efficient ways.
Days after claims get a negative (-) number, if you want only positive numbers, then could handle it using a simple arithmetic operations..

Eg: A little modification to my previous code;

proc sql;
	create table want as
		select*, min(date_claim) as start format=date11., max(date_claim) as end format=date11.,
			case newuse when 1 then date_claim-min(date_claim) else . end as Days_Before_Claim,
			case newuse when 1 then max(date_claim)-date_claim else . end as Days_After_Claim,
			case newuse when 1 then date_claim else . end as Medication_date
			from have
		group by patient_id
	order by patient_id, date_claim, newuse;
quit; 

proc sql;
	create table want1 as
		select*, max(medication_date)-date_claim as days_count
			from want
		group by patient_id
	order by patient_id, date_claim;
quit;

View solution in original post

PaigeMiller · Posted 10-26-2023 10:03 AM

@Tamino wrote:

I have a dataset with claims of medication, each line / observation is one claim. I have the patient_ID, the date of the claim, and the medication. Now I created a variable, which defines the event (claim of a specific medication x, newuse=1). Here is a sample dataset:

Now I would like to create a new variable, which counts the number of days before and after this claim, i.e. counting number of days backwards and forward from this event (newuse=1) for each patient_ID. How can I do this / which command do I have to apply?

Counting number of days backwards until what? Counting number of days forward until what? How do we know when to stop counting forwards or backwards?

--
Paige Miller

Tamino · Posted 10-31-2023 09:40 AM

Counting days between date_claim where newuse=1 and each other claim (date_claim) for each patient (patient_id). So from date_claim of newuse=1 backward for each claim until first claim of a patient_id and forward for each claim until last claim of a patient_id

LinusH · Posted 10-26-2023 10:08 AM

So there can only be one newuse=1 per patient_ID?

I can see to strategies:

- Join the table with itself on patiend_ID (the "left" tabel in the you is sub-setted with newuse=1). Then calculate the difference between the claim date and the newuse=1 date.

- Sort the table so that newuse=1 comes first for each Patiend_ID. The in a data step retain in a new variable the newuse=1 data, and then calculate the date difference for each observation.

Data never sleeps

Tamino · Posted 10-31-2023 09:16 AM

There are also patient_id with more than one newuse=1, that's the problem when sorting the table by newuse so that it is on top.

I didn't get quite well your first suggested strategy.

s_lassen · Posted 10-26-2023 10:25 AM

You will make things a lot easier for us and yourself if you present your input data as a data step, e.g.:

data have;
  infile cards delimiter='|';
  length patient_ID date_claim 8 medication $20 newuse 8;
  informat date_claim ddmmyy10.;
  format date_claim ddmmyy10.;
  input patient_ID date_claim medication newuse;
cards;
1|17.06.2016|Medication|0
1|30.08.2019|Medication|0
1|30.08.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|24.10.2019|Medication X|1
1|22.11.2019|Medication|0
1|03.03.2020|Medication|0
1|15.07.2020|Medication|0
1|02.03.2021|Medication|0
1|15.11.2021|Medication|0
2|27.01.2017|Medication|0
2|24.08.2018|Medication|0
2|14.09.2018|Medication|0
2|24.09.2018|Medication|0
2|19.10.2018|Medication|0
2|16.11.2018|Medication|0
2|01.02.2019|Medication|0
2|01.02.2019|Medication X|1
2|08.04.2019|Medication|0
2|06.05.2019|Medication|0
2|04.11.2019|Medication|0
2|04.11.2019|Medication|0
2|27.03.2020|Medication|0
2|21.06.2021|Medication|0
2|14.12.2021|Medication|0
3|25.06.2018|Medication|0
3|27.11.2018|Medication|0
3|16.04.2019|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|24.04.2020|Medication|0
3|12.10.2020|Medication|0
3|12.10.2020|Medication|0
3|20.11.2020|Medication|0
3|20.11.2020|Medication X|1
;run;

And then present the data you want in the same way.

A_Kh · Posted 10-26-2023 10:42 AM

What I understood is to calculate the number of days before and after the claim_date per patient_id. If so, the below code might give you the solution.

data have;
input @1 patient_ID @5 date_claim:ddmmyy10. @17 medication $17-28	@29 newuse;
format date_claim date11.;
cards; 
1	17.06.2016	Medication	0
1	30.08.2019	Medication	0
1	30.08.2019	Medication	0
1	23.09.2019	Medication	0
1	23.09.2019	Medication	0
1	23.09.2019	Medication	0
1	23.09.2019	Medication	0
1	24.10.2019	Medication X1
1	22.11.2019	Medication	0
1	03.03.2020	Medication	0
1	15.07.2020	Medication	0
1	02.03.2021	Medication	0
1	15.11.2021	Medication	0
2	27.01.2017	Medication	0
2	24.08.2018	Medication	0
2	14.09.2018	Medication	0
2	24.09.2018	Medication	0
2	19.10.2018	Medication	0
2	16.11.2018	Medication	0
2	01.02.2019	Medication	0
2	01.02.2019	Medication X1
2	08.04.2019	Medication	0
2	06.05.2019	Medication	0
2	04.11.2019	Medication	0
2	04.11.2019	Medication	0
2	27.03.2020	Medication	0
2	21.06.2021	Medication	0
2	14.12.2021	Medication	0
3	25.06.2018	Medication	0
3	27.11.2018	Medication	0
3	16.04.2019	Medication	0
3	15.04.2020	Medication	0
3	15.04.2020	Medication	0
3	15.04.2020	Medication	0
3	24.04.2020	Medication	0
3	12.10.2020	Medication	0
3	12.10.2020	Medication	0
3	20.11.2020	Medication	0
3	20.11.2020	Medication X1
;
proc print; run; 

proc sql;
	create table want as
		select*, min(date_claim) as start format=date11., max(date_claim) as end format=date11.,
			case newuse when 1 then date_claim-min(date_claim) else . end as Days_Before_Claim,
			case newuse when 1 then max(date_claim)-date_claim else . end as Days_After_Claim
			from have
		group by patient_id
	order by patient_id, date_claim, newuse;
quit;

Tamino · Posted 10-31-2023 09:12 AM

Thank you.

It counts the days from first claim to newuse=1 and days from newuse=1 to the last claim of a patient_id. I would like to know the number of days betweet each single claim and newuse=1. Is this also possible?

A_Kh · Posted 10-31-2023 01:39 PM

Possible.
I would create a temporary column with Medication Date value per PATIENT_ID and subtract DATE_CLAIM from Medication date. I accomplish it in 2 sql steps, but I believe there are ways to calculate it in less steps and more efficient ways.
Days after claims get a negative (-) number, if you want only positive numbers, then could handle it using a simple arithmetic operations..

Eg: A little modification to my previous code;

proc sql;
	create table want as
		select*, min(date_claim) as start format=date11., max(date_claim) as end format=date11.,
			case newuse when 1 then date_claim-min(date_claim) else . end as Days_Before_Claim,
			case newuse when 1 then max(date_claim)-date_claim else . end as Days_After_Claim,
			case newuse when 1 then date_claim else . end as Medication_date
			from have
		group by patient_id
	order by patient_id, date_claim, newuse;
quit; 

proc sql;
	create table want1 as
		select*, max(medication_date)-date_claim as days_count
			from want
		group by patient_id
	order by patient_id, date_claim;
quit;

Tamino · Posted 11-01-2023 10:02 AM

Thank you, this works out! Even though only for those patient_id that only have one newuse=1, if more than one, it counts form the last one on.

mkeintz · Posted 10-26-2023 11:36 PM

Assuming the data are sorted by patient_id/date_claim, you can read each ID in two passes. In the first pass establish and retain the start_date and end_date variables (i.e. the earliest and latest date_claim values). In the second pass, read only then newuse=1 cases and calculate the number of days prior and after each newuse=1 case:

data have;
  infile cards delimiter='|';
  length patient_ID date_claim 8 medication $20 newuse 8;
  informat date_claim ddmmyy10.;
  format date_claim ddmmyy10.;
  input patient_ID date_claim medication newuse;
cards;
1|17.06.2016|Medication|0
1|30.08.2019|Medication|0
1|30.08.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|24.10.2019|Medication X|1
1|22.11.2019|Medication|0
1|03.03.2020|Medication|0
1|15.07.2020|Medication|0
1|02.03.2021|Medication|0
1|15.11.2021|Medication|0
2|27.01.2017|Medication|0
2|24.08.2018|Medication|0
2|14.09.2018|Medication|0
2|24.09.2018|Medication|0
2|19.10.2018|Medication|0
2|16.11.2018|Medication|0
2|01.02.2019|Medication|0
2|01.02.2019|Medication X|1
2|08.04.2019|Medication|0
2|06.05.2019|Medication|0
2|04.11.2019|Medication|0
2|04.11.2019|Medication|0
2|27.03.2020|Medication|0
2|21.06.2021|Medication|0
2|14.12.2021|Medication|0
3|25.06.2018|Medication|0
3|27.11.2018|Medication|0
3|16.04.2019|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|24.04.2020|Medication|0
3|12.10.2020|Medication|0
3|12.10.2020|Medication|0
3|20.11.2020|Medication|0
3|20.11.2020|Medication X|1
run;

data want;
  set have (in=first_pass)
      have (in=second_pass where=(newuse=1)) ;
  by patient_id;
  retain start_date end_date;
  if first.patient_id then start_date=date_claim;
  if first_pass then end_date=date_claim;
  format start_date end_date ddmmyy10.;

  if second_pass;
  days_before= date_claim-start_date;
  days_after = end_date-date_claim;
run;

This works no matter how many NEWUSE cases a patient might have.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 10-27-2023 07:34 AM

/*
Can you post that new variable you want to create?
If you don't have a big table ,could try the following code,
otherwise,need to resort to Hash Table.
*/
data have;
  infile cards delimiter='|';
  length patient_ID date_claim 8 medication $20 newuse 8;
  informat date_claim ddmmyy10.;
  format date_claim ddmmyy10.;
  input patient_ID date_claim medication newuse;
cards;
1|17.06.2016|Medication|0
1|30.08.2019|Medication|0
1|30.08.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|24.10.2019|Medication X|1
1|22.11.2019|Medication|0
1|03.03.2020|Medication|0
1|15.07.2020|Medication|0
1|02.03.2021|Medication|0
1|15.11.2021|Medication|0
2|27.01.2017|Medication|0
2|24.08.2018|Medication|0
2|14.09.2018|Medication|0
2|24.09.2018|Medication|0
2|19.10.2018|Medication|0
2|16.11.2018|Medication|0
2|01.02.2019|Medication|0
2|01.02.2019|Medication X|1
2|08.04.2019|Medication|0
2|06.05.2019|Medication|0
2|04.11.2019|Medication|0
2|04.11.2019|Medication|0
2|27.03.2020|Medication|0
2|21.06.2021|Medication|0
2|14.12.2021|Medication|0
3|25.06.2018|Medication|0
3|27.11.2018|Medication|0
3|16.04.2019|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|24.04.2020|Medication|0
3|12.10.2020|Medication|0
3|12.10.2020|Medication|0
3|20.11.2020|Medication|0
3|20.11.2020|Medication X|1
;
proc sql;
create table want as
select *,case when newuse=1 then  count(distinct date_claim)-1 else 0 end as want
 from have 
  group by patient_ID
   order by patient_ID,date_claim;
quit;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away