08-12-2013 04:28 AM
Thank you, as always, for your help.
I'm having trouble with my SAS code today, and I would be grateful for any help or advice.
Briefly, I have a dataset as follows.
Cus_ID is each customer's individual ID, and Spd_w1 to Spd_w3 are each customer's weekly spending amounts.
Death is the customer's status after 3 weeks (Death = 1 means the customer is considered to have defected).
What I want to do is build a logistic regression model for this, so the IVs are Spd_w1, Spd_w2 and Spd_w3, and the DV is Death. But the weekly figures should not carry equal importance: Week 1 is the figure from 2 weeks ago and Week 2 is the figure from 1 week ago, so I want to add a 'carryover effect' to the model.
So I created a new variable, 'SPD', as follows, where x is the decay parameter:
SPD = (spd_w1)*(x**2) + (spd_w2)*(x) + (spd_w3)
After that, I'll use PROC LOGISTIC to find out whether this variable is significant.
But what really matters is finding the optimal value of the 'decay parameter'.
I wrote the code below and want to optimize the value of the 'decay parameter' by using a macro.
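%let decay = 0.5; /* arbitrary placeholder; one candidate value set by hand */
data customer; set original;
Spd = Spd_w1*(&decay**2) + Spd_w2*(&decay) + Spd_w3;
run;
proc logistic data=customer descending; /* descending: model Death = 1 */
model Death = Spd; run;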
As far as I know, the decay value that gives the smallest -2 log likelihood (-2LL) is the one to choose. But I cannot check every candidate value manually, one by one. It takes too long.
So, how should I optimize this 'decay parameter' by using a macro (or without one)?
Any advice would be a great help to me.
08-12-2013 01:44 PM
Hello, what seems to be important (given the sample data) is the change in spd values. Instead of fitting a nonlinear function, I suggest you try something simple first, such as:
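data spdChange; set original;
spdChange12 = spd_w2 - spd_w1; /* change from week 1 to week 2 */
spdChange23 = spd_w3 - spd_w2; /* change from week 2 to week 3 */
run;
proc logistic data=spdChange;
model death = spdChange12 spdChange23;
run;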
08-14-2013 09:32 AM
The ODS OUTPUT statement lets you write specific statistics from PROC LOGISTIC to a SAS data set, including the model fit statistics and therefore the -2*log_likelihood. You can concatenate these values across the candidate decay factors and then select the decay factor with the smallest -2*log_likelihood.
An analogous approach you might consider is to restructure your data from short and wide to long and narrow by transposing your SPD values and indexing them by week. Since SPD_W3 follows SPD_W2, which follows SPD_W1, for the application I'm considering you would have to index the most recent value of SPD [=SPD_W3] as week 1 and the earliest value of SPD [=SPD_W1] as week 3. Then you could try PROC GLIMMIX for a repeated-measures logistic regression, using week and subject ID in its RANDOM statement with the RESIDUAL option and a first-order autoregressive variance-covariance structure, AR(1). The value of the autoregressive parameter, rho, would be equivalent to your decay factor.
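The ODS OUTPUT route could be sketched roughly as follows. This is only an illustration, not tested against your data: the macro name decay_search, the work data sets fit and all_fits, and the grid of 0 to 1 in steps of 0.05 are my own assumptions, while original, Spd_w1 to Spd_w3 and Death are taken from your post. The grid is given in hundredths so that the %DO loop can stay on integers.

```sas
/* Hypothetical sketch: grid search over candidate decay values.          */
/* from/to/by are in hundredths, so 0, 100, 5 means 0.00 to 1.00 by 0.05. */
%macro decay_search(from=0, to=100, by=5);
  %local i decay;

  /* Start from an empty results table (NOWARN: no complaint if absent). */
  proc datasets library=work nolist nowarn;
    delete all_fits;
  quit;

  %do i = &from %to &to %by &by;
    %let decay = %sysevalf(&i / 100);

    /* Build the weighted spending variable for this candidate value. */
    data customer;
      set original;
      Spd = Spd_w1*((&decay)**2) + Spd_w2*(&decay) + Spd_w3;
    run;

    /* Fit the model quietly and capture the fit statistics table. */
    ods select none;
    ods output FitStatistics=fit;
    proc logistic data=customer descending;
      model Death = Spd;
    run;
    ods select all;

    /* Keep only the -2 Log L row and tag it with the decay value. */
    data fit;
      set fit;
      where Criterion = '-2 Log L';
      decay = &decay;
    run;

    /* Concatenate the results across candidates. */
    proc append base=all_fits data=fit force;
    run;
  %end;

  /* Smallest -2 Log L (intercept and covariates) comes first. */
  proc sort data=all_fits;
    by InterceptAndCovariates;
  run;

  proc print data=all_fits(obs=5);
    var decay InterceptAndCovariates;
  run;
%mend decay_search;

%decay_search(from=0, to=100, by=5)
```

The first row printed is the candidate with the smallest -2*log_likelihood on this grid. Once you see roughly where the minimum lies, you can rerun with a narrower range and a smaller step.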
08-14-2013 01:58 PM
My approach would be almost the same, except that I would delete the RESIDUAL option and fit the repeated factor as a G-side effect, choosing to view the mean proportions as conditional rather than marginal, especially as there are no other fixed effects to be considered. I will allow that the marginal model (RESIDUAL option) will probably have fewer convergence problems.
Basically, I just want to stay away from pseudo-likelihoods these days. Quasi-likelihoods on the other hand are all right by me.
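For reference, the long-and-narrow layout discussed in this thread could be built along these lines. It is a sketch only: the data set name customer_long, the loop index k, and the variable names week and spd are my own assumptions, and original is the data set name from the question. The two RANDOM statement forms are shown as comments because the thread does not spell out the rest of the PROC GLIMMIX model.

```sas
/* Hypothetical sketch: wide to long, one row per customer per week. */
/* The most recent week is indexed as 1, the earliest as 3.          */
data customer_long;
  set original;
  array w{3} Spd_w1-Spd_w3;
  do k = 1 to 3;
    week = 4 - k;   /* Spd_w3 -> week 1, Spd_w2 -> week 2, Spd_w1 -> week 3 */
    spd  = w{k};
    output;
  end;
  keep Cus_ID Death week spd;
run;

proc sort data=customer_long;
  by Cus_ID week;
run;

/* RANDOM statement forms contrasted above (week would need to be on   */
/* the CLASS statement of PROC GLIMMIX):                                */
/*   marginal, R-side:    random week / subject=Cus_ID type=ar(1) residual; */
/*   conditional, G-side: random week / subject=Cus_ID type=ar(1);          */
```

Either form estimates an AR(1) correlation parameter across weeks within a customer. The difference is whether it sits in the residual (R-side) structure or in the random effects (G-side).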