Filling In Values After Merging Two Datasets

Kemal_G · Posted 04-17-2020 02:21 AM

Dear All,

I am working on a large confidential healthcare data set. I am trying to make cost estimation based on it. In my dataset, there are cancer patients with survival info and their monthly cost. I will subtract patients' own pre-diagnosis monthly costs from post-diagnosis monthly cost to estimate cancer related cost I have postdiagnosis cost of patients until death (therefore number of months covered after diagnosis is different for each patients) but I have 12 month prediagnosis cost for each patients. I am planning to make subtraction based on the following logic:

Let us think about only one patient:

1. If this patient survives less than or equal to 12 months, I will subtract cost in each prediagnosis month from each postdiagnosis month:

cancer cost(i)=postdiagnosiscost(i)-prediagnosis cost(i), i=1,...,S S=number of months survived.

2. If patient survives more than 12 months, I will subtract cost in each prediagnosis month from each postdiagnosis month according to mod 12:

cancer cost(i)=postdiagnosis cost(i)-prediagnosis cost(j), i=1,..,S j=mod(i), S=number of months survived.

Let the patient live 26 months:

For cancer cost at 15th month I will have:

cancercost(15)=postdiagnosis cost(15)-prediagnosis cost(3),

For cancer cost at 25th month I will have:

cancercost(25)=postdiagnosis cost(15)-prediagnosis cost(1)

I merged these two datasets (prediagnosis and postdiagnosis cost) but I have missing values because I have only 12 months of prediagnosis cost and postdiagnosis cost for more than 12 months. I want to fill in missing values of prediagnosis cost by replicating them according to mod of survival of patients.

Can you help me in writing macro or code of this problem?

Thank you.

Shmuel · Posted 04-17-2020 02:39 AM

What type of database you have - is it an excel worksheet or sas data-set?

Please post sample of your data with variable names - few lines and few months only.

ballardw · Posted 04-17-2020 12:57 PM

This may give a place to get started. I am only using 4 pre/post values so it is easy to see.

data example;
   input Pre_1 - Pre_4 Post_1-Post_4;
   array pre Pre_: ;
   array post Post_:;
   array cost_(4);
   do i = 1 to ( n(of post(*)) );
      cost_[i] = post[i] - pre[i];
   end;
datalines;
 10 20 30 40 20 35 46 70
 10 20 30 40 40 20 . .
 ;
 run;

the N(of post(*)) gets the number of elements not missing in the Post array. Assumption is that these are sequential for post month1 to n with no gaps.

You can use basically any function that returns a non-0 positive integer for a simple array index.

So post[Mod(i,4)] would attempt to use the mod 4 of the loop counter as the index. Problem: mod(4,4) is 0 and would cause problems so you would need to check before actually using it with the array.

I use the [] for array index to make it easier to see the difference between the function calls and index definition boundary. Array references could use either the () or [].

With the different break points you would probably do an initial test of the n(of post(*)) and branch accordingly for your boundaries.

Filling In Values After Merging Two Datasets

Re: Filling In Values After Merging Two Datasets

Re: Filling In Values After Merging Two Datasets

Filling In Values After Merging Two Datasets

Re: Filling In Values After Merging Two Datasets

Re: Filling In Values After Merging Two Datasets

Registration is open