BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Tamino
Obsidian | Level 7

Hello,

 

I have a dataset with claims of medication, each line / observation is one claim. I have the patient_ID, the date of the claim, and the medication. Now I created a variable, which defines the event (claim of a specific medication x, newuse=1). Here is a sample dataset:

 

patient_ID date_claim medication newuse
1 17.06.2016 Medication 0
1 30.08.2019 Medication 0
1 30.08.2019 Medication 0
1 23.09.2019 Medication 0
1 23.09.2019 Medication 0
1 23.09.2019 Medication 0
1 23.09.2019 Medication 0
1 24.10.2019 Medication X 1
1 22.11.2019 Medication 0
1 03.03.2020 Medication 0
1 15.07.2020 Medication 0
1 02.03.2021 Medication 0
1 15.11.2021 Medication 0
2 27.01.2017 Medication 0
2 24.08.2018 Medication 0
2 14.09.2018 Medication 0
2 24.09.2018 Medication 0
2 19.10.2018 Medication 0
2 16.11.2018 Medication 0
2 01.02.2019 Medication 0
2 01.02.2019 Medication X 1
2 08.04.2019 Medication 0
2 06.05.2019 Medication 0
2 04.11.2019 Medication 0
2 04.11.2019 Medication 0
2 27.03.2020 Medication 0
2 21.06.2021 Medication 0
2 14.12.2021 Medication 0
3 25.06.2018 Medication 0
3 27.11.2018 Medication 0
3 16.04.2019 Medication 0
3 15.04.2020 Medication 0
3 15.04.2020 Medication 0
3 15.04.2020 Medication 0
3 24.04.2020 Medication 0
3 12.10.2020 Medication 0
3 12.10.2020 Medication 0
3 20.11.2020 Medication 0
3 20.11.2020 Medication X 1

 

Now I would like to create a new variable, which counts the number of days before and after this claim, i.e. counting number of days backwards and forward from this event (newuse=1) for each patient_ID. How can I do this / which command do I have to apply?

 

I would be very happy for your help, thank you in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
A_Kh
Lapis Lazuli | Level 10

Possible.
I would create a temporary column with Medication Date value per PATIENT_ID and subtract DATE_CLAIM from Medication date.  I accomplish it in 2 sql steps, but I believe there are ways to calculate it in less steps and more efficient ways. 
Days after claims get a negative (-) number, if you want only positive numbers, then could handle it using a simple arithmetic operations.. 

Eg: A little modification to my previous code; 

proc sql;
	create table want as
		select*, min(date_claim) as start format=date11., max(date_claim) as end format=date11.,
			case newuse when 1 then date_claim-min(date_claim) else . end as Days_Before_Claim,
			case newuse when 1 then max(date_claim)-date_claim else . end as Days_After_Claim,
			case newuse when 1 then date_claim else . end as Medication_date
			from have
		group by patient_id
	order by patient_id, date_claim, newuse;
quit; 

proc sql;
	create table want1 as
		select*, max(medication_date)-date_claim as days_count
			from want
		group by patient_id
	order by patient_id, date_claim;
quit; 

View solution in original post

11 REPLIES 11
PaigeMiller
Diamond | Level 26

@Tamino wrote:

 

I have a dataset with claims of medication, each line / observation is one claim. I have the patient_ID, the date of the claim, and the medication. Now I created a variable, which defines the event (claim of a specific medication x, newuse=1). Here is a sample dataset:

 

Now I would like to create a new variable, which counts the number of days before and after this claim, i.e. counting number of days backwards and forward from this event (newuse=1) for each patient_ID. How can I do this / which command do I have to apply?


Counting number of days backwards until what? Counting number of days forward until what? How do we know when to stop counting forwards or backwards?

--
Paige Miller
Tamino
Obsidian | Level 7

Counting days between date_claim where newuse=1 and each other claim (date_claim) for each patient (patient_id). So from date_claim of newuse=1 backward for each claim until first claim of a patient_id and forward for each claim until last claim of a patient_id

LinusH
Tourmaline | Level 20

So there can only be one newuse=1 per patient_ID?

I can see to strategies:

- Join the table with itself on patiend_ID (the "left" tabel in the you is sub-setted with newuse=1). Then calculate the difference between the claim date and the newuse=1 date.

- Sort the table so that newuse=1 comes first for each Patiend_ID. The in a data step retain in a new variable the newuse=1 data, and then calculate the date difference for each observation.

Data never sleeps
Tamino
Obsidian | Level 7

There are also patient_id with more than one newuse=1, that's the problem when sorting the table by newuse so that it is on top. 

 

I didn't get quite well your first suggested strategy.

s_lassen
Meteorite | Level 14

You will make things a lot easier for us and yourself if you present your input data as a data step, e.g.:

data have;
  infile cards delimiter='|';
  length patient_ID date_claim 8 medication $20 newuse 8;
  informat date_claim ddmmyy10.;
  format date_claim ddmmyy10.;
  input patient_ID date_claim medication newuse;
cards;
1|17.06.2016|Medication|0
1|30.08.2019|Medication|0
1|30.08.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|24.10.2019|Medication X|1
1|22.11.2019|Medication|0
1|03.03.2020|Medication|0
1|15.07.2020|Medication|0
1|02.03.2021|Medication|0
1|15.11.2021|Medication|0
2|27.01.2017|Medication|0
2|24.08.2018|Medication|0
2|14.09.2018|Medication|0
2|24.09.2018|Medication|0
2|19.10.2018|Medication|0
2|16.11.2018|Medication|0
2|01.02.2019|Medication|0
2|01.02.2019|Medication X|1
2|08.04.2019|Medication|0
2|06.05.2019|Medication|0
2|04.11.2019|Medication|0
2|04.11.2019|Medication|0
2|27.03.2020|Medication|0
2|21.06.2021|Medication|0
2|14.12.2021|Medication|0
3|25.06.2018|Medication|0
3|27.11.2018|Medication|0
3|16.04.2019|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|24.04.2020|Medication|0
3|12.10.2020|Medication|0
3|12.10.2020|Medication|0
3|20.11.2020|Medication|0
3|20.11.2020|Medication X|1
;run;

And then present the data you want in the same way.

 

A_Kh
Lapis Lazuli | Level 10

What I understood is to calculate the number of days before and after the claim_date per patient_id. If so, the below code might give you the solution.

data have;
input @1 patient_ID @5 date_claim:ddmmyy10. @17 medication $17-28	@29 newuse;
format date_claim date11.;
cards; 
1	17.06.2016	Medication	0
1	30.08.2019	Medication	0
1	30.08.2019	Medication	0
1	23.09.2019	Medication	0
1	23.09.2019	Medication	0
1	23.09.2019	Medication	0
1	23.09.2019	Medication	0
1	24.10.2019	Medication X1
1	22.11.2019	Medication	0
1	03.03.2020	Medication	0
1	15.07.2020	Medication	0
1	02.03.2021	Medication	0
1	15.11.2021	Medication	0
2	27.01.2017	Medication	0
2	24.08.2018	Medication	0
2	14.09.2018	Medication	0
2	24.09.2018	Medication	0
2	19.10.2018	Medication	0
2	16.11.2018	Medication	0
2	01.02.2019	Medication	0
2	01.02.2019	Medication X1
2	08.04.2019	Medication	0
2	06.05.2019	Medication	0
2	04.11.2019	Medication	0
2	04.11.2019	Medication	0
2	27.03.2020	Medication	0
2	21.06.2021	Medication	0
2	14.12.2021	Medication	0
3	25.06.2018	Medication	0
3	27.11.2018	Medication	0
3	16.04.2019	Medication	0
3	15.04.2020	Medication	0
3	15.04.2020	Medication	0
3	15.04.2020	Medication	0
3	24.04.2020	Medication	0
3	12.10.2020	Medication	0
3	12.10.2020	Medication	0
3	20.11.2020	Medication	0
3	20.11.2020	Medication X1
;
proc print; run; 

proc sql;
	create table want as
		select*, min(date_claim) as start format=date11., max(date_claim) as end format=date11.,
			case newuse when 1 then date_claim-min(date_claim) else . end as Days_Before_Claim,
			case newuse when 1 then max(date_claim)-date_claim else . end as Days_After_Claim
			from have
		group by patient_id
	order by patient_id, date_claim, newuse;
quit; 
		

 

Tamino
Obsidian | Level 7

Thank you.

 

It counts the days from first claim to newuse=1 and days from newuse=1 to the last claim of a patient_id. I would like to know the number of days betweet each single claim and newuse=1. Is this also possible?

A_Kh
Lapis Lazuli | Level 10

Possible.
I would create a temporary column with Medication Date value per PATIENT_ID and subtract DATE_CLAIM from Medication date.  I accomplish it in 2 sql steps, but I believe there are ways to calculate it in less steps and more efficient ways. 
Days after claims get a negative (-) number, if you want only positive numbers, then could handle it using a simple arithmetic operations.. 

Eg: A little modification to my previous code; 

proc sql;
	create table want as
		select*, min(date_claim) as start format=date11., max(date_claim) as end format=date11.,
			case newuse when 1 then date_claim-min(date_claim) else . end as Days_Before_Claim,
			case newuse when 1 then max(date_claim)-date_claim else . end as Days_After_Claim,
			case newuse when 1 then date_claim else . end as Medication_date
			from have
		group by patient_id
	order by patient_id, date_claim, newuse;
quit; 

proc sql;
	create table want1 as
		select*, max(medication_date)-date_claim as days_count
			from want
		group by patient_id
	order by patient_id, date_claim;
quit; 
Tamino
Obsidian | Level 7

Thank you, this works out! Even though only for those patient_id that only have one newuse=1, if more than one, it counts form the last one on.

mkeintz
PROC Star

Assuming the data are sorted by patient_id/date_claim, you can read each ID in two passes.  In the first pass establish and retain the start_date and end_date variables (i.e. the earliest and latest date_claim values).  In the second pass, read only then newuse=1 cases and calculate the number of days prior and after each newuse=1 case:

 

data have;
  infile cards delimiter='|';
  length patient_ID date_claim 8 medication $20 newuse 8;
  informat date_claim ddmmyy10.;
  format date_claim ddmmyy10.;
  input patient_ID date_claim medication newuse;
cards;
1|17.06.2016|Medication|0
1|30.08.2019|Medication|0
1|30.08.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|24.10.2019|Medication X|1
1|22.11.2019|Medication|0
1|03.03.2020|Medication|0
1|15.07.2020|Medication|0
1|02.03.2021|Medication|0
1|15.11.2021|Medication|0
2|27.01.2017|Medication|0
2|24.08.2018|Medication|0
2|14.09.2018|Medication|0
2|24.09.2018|Medication|0
2|19.10.2018|Medication|0
2|16.11.2018|Medication|0
2|01.02.2019|Medication|0
2|01.02.2019|Medication X|1
2|08.04.2019|Medication|0
2|06.05.2019|Medication|0
2|04.11.2019|Medication|0
2|04.11.2019|Medication|0
2|27.03.2020|Medication|0
2|21.06.2021|Medication|0
2|14.12.2021|Medication|0
3|25.06.2018|Medication|0
3|27.11.2018|Medication|0
3|16.04.2019|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|24.04.2020|Medication|0
3|12.10.2020|Medication|0
3|12.10.2020|Medication|0
3|20.11.2020|Medication|0
3|20.11.2020|Medication X|1
run;

data want;
  set have (in=first_pass)
      have (in=second_pass where=(newuse=1)) ;
  by patient_id;
  retain start_date end_date;
  if first.patient_id then start_date=date_claim;
  if first_pass then end_date=date_claim;
  format start_date end_date ddmmyy10.;

  if second_pass;
  days_before= date_claim-start_date;
  days_after = end_date-date_claim;
run;

This works no matter how many NEWUSE cases a patient might have.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
/*
Can you post that new variable you want to create?
If you don't have a big table ,could try the following code,
otherwise,need to resort to Hash Table.
*/
data have;
  infile cards delimiter='|';
  length patient_ID date_claim 8 medication $20 newuse 8;
  informat date_claim ddmmyy10.;
  format date_claim ddmmyy10.;
  input patient_ID date_claim medication newuse;
cards;
1|17.06.2016|Medication|0
1|30.08.2019|Medication|0
1|30.08.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|23.09.2019|Medication|0
1|24.10.2019|Medication X|1
1|22.11.2019|Medication|0
1|03.03.2020|Medication|0
1|15.07.2020|Medication|0
1|02.03.2021|Medication|0
1|15.11.2021|Medication|0
2|27.01.2017|Medication|0
2|24.08.2018|Medication|0
2|14.09.2018|Medication|0
2|24.09.2018|Medication|0
2|19.10.2018|Medication|0
2|16.11.2018|Medication|0
2|01.02.2019|Medication|0
2|01.02.2019|Medication X|1
2|08.04.2019|Medication|0
2|06.05.2019|Medication|0
2|04.11.2019|Medication|0
2|04.11.2019|Medication|0
2|27.03.2020|Medication|0
2|21.06.2021|Medication|0
2|14.12.2021|Medication|0
3|25.06.2018|Medication|0
3|27.11.2018|Medication|0
3|16.04.2019|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|15.04.2020|Medication|0
3|24.04.2020|Medication|0
3|12.10.2020|Medication|0
3|12.10.2020|Medication|0
3|20.11.2020|Medication|0
3|20.11.2020|Medication X|1
;
proc sql;
create table want as
select *,case when newuse=1 then  count(distinct date_claim)-1 else 0 end as want
 from have 
  group by patient_ID
   order by patient_ID,date_claim;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1392 views
  • 2 likes
  • 7 in conversation