BookmarkSubscribeRSS Feed
Kemal_G
Calcite | Level 5

Dear All,

 

I am working on a large confidential healthcare data set. I am trying to make cost estimation based on it. In my dataset, there are cancer patients with survival info and their monthly cost. I will subtract patients' own pre-diagnosis monthly costs from post-diagnosis monthly cost to estimate cancer related cost I have postdiagnosis cost of patients until death (therefore number of months covered after diagnosis is different for each patients) but I have 12 month prediagnosis cost for each patients. I am planning to make subtraction based on the following logic:

 

Let us think about only one patient:

 

1. If this patient survives less than or equal to 12 months, I will subtract cost in each prediagnosis month from each postdiagnosis month:

cancer cost(i)=postdiagnosiscost(i)-prediagnosis cost(i), i=1,...,S S=number of months survived.

2. If patient survives more than 12 months, I will subtract cost in each prediagnosis month from each postdiagnosis month according to mod 12:

cancer cost(i)=postdiagnosis cost(i)-prediagnosis cost(j), i=1,..,S j=mod(i),  S=number of months survived.

Let the patient live 26 months:

For cancer cost at 15th month I will have:

cancercost(15)=postdiagnosis cost(15)-prediagnosis cost(3), 

For cancer cost at 25th month I will have:

cancercost(25)=postdiagnosis cost(15)-prediagnosis cost(1)

 

I merged these two datasets (prediagnosis and postdiagnosis cost) but I have missing values because I have only 12 months of prediagnosis cost and postdiagnosis cost for more than 12 months. I want to fill in missing values of prediagnosis cost by replicating them according to mod of survival of patients. 

 

Can you help me in writing macro or code of this problem?

Thank you.

2 REPLIES 2
Shmuel
Garnet | Level 18

What type of database you have - is it an excel worksheet or sas data-set?

 

Please post sample of your data with variable names - few lines and few months only.

ballardw
Super User

This may give a place to get started. I am only using 4 pre/post values so it is easy to see.

data example;
   input Pre_1 - Pre_4 Post_1-Post_4;
   array pre Pre_: ;
   array post Post_:;
   array cost_(4);
   do i = 1 to ( n(of post(*)) );
      cost_[i] = post[i] - pre[i];
   end;
datalines;
 10 20 30 40 20 35 46 70
 10 20 30 40 40 20 . .
 ;
 run;

the N(of post(*)) gets the number of elements not missing in the Post array. Assumption is that these are  sequential for post month1 to n with no gaps.

You can use basically any function that returns a non-0 positive integer for a simple array index.

So   post[Mod(i,4)] would attempt to use the mod 4 of the loop counter as the index. Problem: mod(4,4) is 0 and would cause problems so you would need to check before actually using it with the array.

I use the [] for array index to make it easier to see the difference between the function calls and index definition boundary. Array references could use either the () or [].

 

With the different break points you would probably do an initial test of the n(of post(*)) and branch accordingly for your boundaries.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 589 views
  • 0 likes
  • 3 in conversation