BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Tamino
Obsidian | Level 7

Hello,

 

I have a dataset with claims of medication, each line / observation is one claim. I have the patient_ID, the date of the claim, and the medication. Now I created a variable, which defines the event (claim of a specific medication x, newuse=1). Here is a sample dataset:

 

patient_ID date_claim medication newuse
1 17.06.2016 Medication 0
1 30.08.2019 Medication 0
1 30.08.2019 Medication 0
1 23.09.2019 Medication 0
1 23.09.2019 Medication 0
1 23.09.2019 Medication 0
1 23.09.2019 Medication 0
1 24.10.2019 Medication X 1
1 22.11.2019 Medication 0
1 03.03.2020 Medication 0
1 15.07.2020 Medication 0
1 02.03.2021 Medication 0
1 15.11.2021 Medication 0
2 27.01.2017 Medication 0
2 24.08.2018 Medication 0
2 14.09.2018 Medication 0
2 24.09.2018 Medication 0
2 19.10.2018 Medication 0
2 16.11.2018 Medication 0
2 01.02.2019 Medication 0
2 01.02.2019 Medication X 1
2 08.04.2019 Medication 0
2 06.05.2019 Medication 0
2 04.11.2019 Medication 0
2 04.11.2019 Medication 0
2 27.03.2020 Medication 0
2 21.06.2021 Medication 0
2 14.12.2021 Medication 0
3 25.06.2018 Medication 0
3 27.11.2018 Medication 0
3 16.04.2019 Medication 0
3 15.04.2020 Medication 0
3 15.04.2020 Medication 0
3 15.04.2020 Medication 0
3 24.04.2020 Medication 0
3 12.10.2020 Medication 0
3 12.10.2020 Medication 0
3 20.11.2020 Medication 0
3 20.11.2020 Medication X 1

 

Now I would like to create a new variable, which counts the number of days before and after this claim, i.e. counting number of days backwards and forward from this event (newuse=1) for each patient_ID. How can I do this / which command do I have to apply?

 

I would be very happy for your help, thank you in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
A_Kh
Lapis Lazuli | Level 10

Possible.
I would create a temporary column with Medication Date value per PATIENT_ID and subtract DATE_CLAIM from Medication date.  I accomplish it in 2 sql steps, but I believe there are ways to calculate it in less steps and more efficient ways. 
Days after claims get a negative (-) number, if you want only positive numbers, then could handle it using a simple arithmetic operations.. 

Eg: A little modification to my previous code; 

proc sql;
	create table want as
		select*, min(date_claim) as start format=date11., max(date_claim) as end format=date11.,
			case newuse when 1 then date_claim-min(date_claim) else . end as Days_Before_Claim,
			case newuse when 1 then max(date_claim)-date_claim else . end as Days_After_Claim,
			case newuse when 1 then date_claim else . end as Medication_date
			from have
		group by patient_id
	order by patient_id, date_claim, newuse;
quit; 

proc sql;
	create table want1 as
		select*, max(medication_date)-date_claim as days_count
			from want
		group by patient_id
	order by patient_id, date_claim;
quit; 

View solution in original post

11 REPLIES 11
PaigeMiller
Diamond | Level 26

@Tamino wrote:

 

I have a dataset with claims of medication, each line / observation is one claim. I have the patient_ID, the date of the claim, and the medication. Now I created a variable, which defines the event (claim of a specific medication x, newuse=1). Here is a sample dataset:

 

Now I would like to create a new variable, which counts the number of days before and after this claim, i.e. counting number of days backwards and forward from this event (newuse=1) for each patient_ID. How can I do this / which command do I have to apply?


Counting number of days backwards until what? Counting number of days forward until what? How do we know when to stop counting forwards or backwards?

--
Paige Miller
Tamino
Obsidian | Level 7

Counting days between date_claim where newuse=1 and each other claim (date_claim) for each patient (patient_id). So from date_claim of newuse=1 backward for each claim until first claim of a patient_id and forward for each claim until last claim of a patient_id

LinusH
Tourmaline | Level 20

So there can only be one newuse=1 per patient_ID?

I can see to strategies:

- Join the table with itself on patiend_ID (the "left" tabel in the you is sub-setted with newuse=1). Then calculate the difference between the claim date and the newuse=1 date.

- Sort the table so that newuse=1 comes first for each Patiend_ID. The in a data step retain in a new variable the newuse=1 data, and then calculate the date difference for each observation.

Data never sleeps
Tamino
Obsidian | Level 7

There are also patient_id with more than one newuse=1, that's the problem when sorting the table by newuse so that it is on top. 

 

I didn't get quite well your first suggested strategy.

s_lassen
Meteorite | Level 14

You will make things a lot easier for us and yourself if you present your input data as a data step, e.g.:

data have;
  infile cards delimiter='|';
  length patient_ID date_claim 8 medication $20 newuse 8;
  informat date_claim ddmmyy10.;
  format date_claim ddmmyy10.;
  input patient_ID date_claim medication newuse;
cards;
1|17.06.2016|Medication|0
1|30.08.2019|Medication|0
1|30.08.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|24.10.2019|Medication X|1
1|22.11.2019|Medication|0
1|03.03.2020|Medication|0
1|15.07.2020|Medication|0
1|02.03.2021|Medication|0
1|15.11.2021|Medication|0
2|27.01.2017|Medication|0
2|24.08.2018|Medication|0
2|14.09.2018|Medication|0
2|24.09.2018|Medication|0
2|19.10.2018|Medication|0
2|16.11.2018|Medication|0
2|01.02.2019|Medication|0
2|01.02.2019|Medication X|1
2|08.04.2019|Medication|0
2|06.05.2019|Medication|0
2|04.11.2019|Medication|0
2|04.11.2019|Medication|0
2|27.03.2020|Medication|0
2|21.06.2021|Medication|0
2|14.12.2021|Medication|0
3|25.06.2018|Medication|0
3|27.11.2018|Medication|0
3|16.04.2019|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|24.04.2020|Medication|0
3|12.10.2020|Medication|0
3|12.10.2020|Medication|0
3|20.11.2020|Medication|0
3|20.11.2020|Medication X|1
;run;

And then present the data you want in the same way.

 

A_Kh
Lapis Lazuli | Level 10

What I understood is to calculate the number of days before and after the claim_date per patient_id. If so, the below code might give you the solution.

data have;
input @1 patient_ID @5 date_claim:ddmmyy10. @17 medication $17-28	@29 newuse;
format date_claim date11.;
cards; 
1	17.06.2016	Medication	0
1	30.08.2019	Medication	0
1	30.08.2019	Medication	0
1	23.09.2019	Medication	0
1	23.09.2019	Medication	0
1	23.09.2019	Medication	0
1	23.09.2019	Medication	0
1	24.10.2019	Medication X1
1	22.11.2019	Medication	0
1	03.03.2020	Medication	0
1	15.07.2020	Medication	0
1	02.03.2021	Medication	0
1	15.11.2021	Medication	0
2	27.01.2017	Medication	0
2	24.08.2018	Medication	0
2	14.09.2018	Medication	0
2	24.09.2018	Medication	0
2	19.10.2018	Medication	0
2	16.11.2018	Medication	0
2	01.02.2019	Medication	0
2	01.02.2019	Medication X1
2	08.04.2019	Medication	0
2	06.05.2019	Medication	0
2	04.11.2019	Medication	0
2	04.11.2019	Medication	0
2	27.03.2020	Medication	0
2	21.06.2021	Medication	0
2	14.12.2021	Medication	0
3	25.06.2018	Medication	0
3	27.11.2018	Medication	0
3	16.04.2019	Medication	0
3	15.04.2020	Medication	0
3	15.04.2020	Medication	0
3	15.04.2020	Medication	0
3	24.04.2020	Medication	0
3	12.10.2020	Medication	0
3	12.10.2020	Medication	0
3	20.11.2020	Medication	0
3	20.11.2020	Medication X1
;
proc print; run; 

proc sql;
	create table want as
		select*, min(date_claim) as start format=date11., max(date_claim) as end format=date11.,
			case newuse when 1 then date_claim-min(date_claim) else . end as Days_Before_Claim,
			case newuse when 1 then max(date_claim)-date_claim else . end as Days_After_Claim
			from have
		group by patient_id
	order by patient_id, date_claim, newuse;
quit; 
		

 

Tamino
Obsidian | Level 7

Thank you.

 

It counts the days from first claim to newuse=1 and days from newuse=1 to the last claim of a patient_id. I would like to know the number of days betweet each single claim and newuse=1. Is this also possible?

A_Kh
Lapis Lazuli | Level 10

Possible.
I would create a temporary column with Medication Date value per PATIENT_ID and subtract DATE_CLAIM from Medication date.  I accomplish it in 2 sql steps, but I believe there are ways to calculate it in less steps and more efficient ways. 
Days after claims get a negative (-) number, if you want only positive numbers, then could handle it using a simple arithmetic operations.. 

Eg: A little modification to my previous code; 

proc sql;
	create table want as
		select*, min(date_claim) as start format=date11., max(date_claim) as end format=date11.,
			case newuse when 1 then date_claim-min(date_claim) else . end as Days_Before_Claim,
			case newuse when 1 then max(date_claim)-date_claim else . end as Days_After_Claim,
			case newuse when 1 then date_claim else . end as Medication_date
			from have
		group by patient_id
	order by patient_id, date_claim, newuse;
quit; 

proc sql;
	create table want1 as
		select*, max(medication_date)-date_claim as days_count
			from want
		group by patient_id
	order by patient_id, date_claim;
quit; 
Tamino
Obsidian | Level 7

Thank you, this works out! Even though only for those patient_id that only have one newuse=1, if more than one, it counts form the last one on.

mkeintz
PROC Star

Assuming the data are sorted by patient_id/date_claim, you can read each ID in two passes.  In the first pass establish and retain the start_date and end_date variables (i.e. the earliest and latest date_claim values).  In the second pass, read only then newuse=1 cases and calculate the number of days prior and after each newuse=1 case:

 

data have;
  infile cards delimiter='|';
  length patient_ID date_claim 8 medication $20 newuse 8;
  informat date_claim ddmmyy10.;
  format date_claim ddmmyy10.;
  input patient_ID date_claim medication newuse;
cards;
1|17.06.2016|Medication|0
1|30.08.2019|Medication|0
1|30.08.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|24.10.2019|Medication X|1
1|22.11.2019|Medication|0
1|03.03.2020|Medication|0
1|15.07.2020|Medication|0
1|02.03.2021|Medication|0
1|15.11.2021|Medication|0
2|27.01.2017|Medication|0
2|24.08.2018|Medication|0
2|14.09.2018|Medication|0
2|24.09.2018|Medication|0
2|19.10.2018|Medication|0
2|16.11.2018|Medication|0
2|01.02.2019|Medication|0
2|01.02.2019|Medication X|1
2|08.04.2019|Medication|0
2|06.05.2019|Medication|0
2|04.11.2019|Medication|0
2|04.11.2019|Medication|0
2|27.03.2020|Medication|0
2|21.06.2021|Medication|0
2|14.12.2021|Medication|0
3|25.06.2018|Medication|0
3|27.11.2018|Medication|0
3|16.04.2019|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|24.04.2020|Medication|0
3|12.10.2020|Medication|0
3|12.10.2020|Medication|0
3|20.11.2020|Medication|0
3|20.11.2020|Medication X|1
run;

data want;
  set have (in=first_pass)
      have (in=second_pass where=(newuse=1)) ;
  by patient_id;
  retain start_date end_date;
  if first.patient_id then start_date=date_claim;
  if first_pass then end_date=date_claim;
  format start_date end_date ddmmyy10.;

  if second_pass;
  days_before= date_claim-start_date;
  days_after = end_date-date_claim;
run;

This works no matter how many NEWUSE cases a patient might have.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
/*
Can you post that new variable you want to create?
If you don't have a big table ,could try the following code,
otherwise,need to resort to Hash Table.
*/
data have;
  infile cards delimiter='|';
  length patient_ID date_claim 8 medication $20 newuse 8;
  informat date_claim ddmmyy10.;
  format date_claim ddmmyy10.;
  input patient_ID date_claim medication newuse;
cards;
1|17.06.2016|Medication|0
1|30.08.2019|Medication|0
1|30.08.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|24.10.2019|Medication X|1
1|22.11.2019|Medication|0
1|03.03.2020|Medication|0
1|15.07.2020|Medication|0
1|02.03.2021|Medication|0
1|15.11.2021|Medication|0
2|27.01.2017|Medication|0
2|24.08.2018|Medication|0
2|14.09.2018|Medication|0
2|24.09.2018|Medication|0
2|19.10.2018|Medication|0
2|16.11.2018|Medication|0
2|01.02.2019|Medication|0
2|01.02.2019|Medication X|1
2|08.04.2019|Medication|0
2|06.05.2019|Medication|0
2|04.11.2019|Medication|0
2|04.11.2019|Medication|0
2|27.03.2020|Medication|0
2|21.06.2021|Medication|0
2|14.12.2021|Medication|0
3|25.06.2018|Medication|0
3|27.11.2018|Medication|0
3|16.04.2019|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|24.04.2020|Medication|0
3|12.10.2020|Medication|0
3|12.10.2020|Medication|0
3|20.11.2020|Medication|0
3|20.11.2020|Medication X|1
;
proc sql;
create table want as
select *,case when newuse=1 then  count(distinct date_claim)-1 else 0 end as want
 from have 
  group by patient_ID
   order by patient_ID,date_claim;
quit;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2464 views
  • 2 likes
  • 7 in conversation