Solved: Re: QLIM Tobit Model and Prediction

altijani · Posted 06-29-2017 08:49 AM

Greetings,

I have the following data, which I am fitting with a Tobit model. I have two issues that I cannot solve:

1. I need to obtain the predicted values for each observation

2. I need to change one of the categorical variables from 1 to 0 and obtain the predicted values for each observation again

Here is my data

Y_var	X1	X2
0	2	0
59	7	0
0	19	0
25	3	1
0	8	1

My model is as follows:

proc qlim data=have;
	model Y_var = X1 X2;
	endogenous Y_var ~ censored(lb=0);
run;

What I need is a dataset with one column holding the predicted value for each observation (at its current X1 and X2 values), and another column holding the prediction when X2 is replaced with 0.

Note: I am using SAS Enterprise Guide, but can use any code to run the program in EG.

Many thanks,

Altijani

solarflare · Posted 07-05-2017 10:26 AM

I think this will solve the problems you're encountering.

1. Adding the output statement to proc qlim will generate a dataset with the predicted values.

2. Creating a new variable with modified values for X2 allows rerunning the model and generating a new set of predicted values.

See the sample code below. After running proc qlim twice and generating the output, I renamed the variable in the second output and merged them all into one data set.

data test;
	input Y_var X1 X2;
	datalines;
	0 2 0
	59 7 0
	0 19 0
	25 3 1
	0 8 1
	;
run;

* create new data set with modified X2 variable;
data test_mod;
	set test;
	if _N_ = 4 then X2_mod = 0;
	else X2_mod = X2;
run; 

*run proc qlim with X1 and X2, write output to test_out1;
proc qlim data=test_mod;
	model Y_var = X1 X2;
	endogenous Y_var ~censored(lb=0);
	output out=test_out1 predicted;
run;

*run proc qlim with X1 and X2_mod, write output to test_out2;
proc qlim data=test_mod;
	model Y_var = X1 X2_mod;
	endogenous Y_var ~censored(lb=0);
	output out=test_out2 predicted;
run;

*rename predicted variable in test_out2;
data test_out2;
	set test_out2;
	rename P_Y_var=P_Y_var_mod;
run;

*combine data from test_mod, test_out1 and test_out2 into new data set named want;
data want;
	merge test_mod test_out1 test_out2;
run;

*print new data set;
proc print data=want;
run;

Best regards,

Steven

altijani · Posted 07-05-2017 10:36 AM

Thanks Steven for your post. This really helps. However, I am not sure about this code and what it does:

if _N_ = 4 then X2_mod = 0;

Please notice that the data has millions of records and tens of variables, and I am showing here an example of the dataset and not the entire data.

Please let me know how the code will be different given this information.

Thanks,

Altijani

solarflare · Posted 07-05-2017 11:04 AM

You're welcome.

When I first read your post, it seemed that you wanted to replace only certain values of one of the variables. That line of code sets X2_mod to zero for the fourth observation only. You can replace it with whatever logic you need to decide which values of the new X2_mod variable to change.

On rereading the post, though, it seems you may want to change all observations of the variable to zero. In that case, you can replace that data step with this one.

* create new data set with modified X2 variable;
data test_mod;
 set test;
 X2_mod = 0;
run;

altijani · Posted 07-05-2017 11:32 AM

Thank you Solarflare.

I ran your code, but I see now that this is not exactly what I need. Your code first runs the regression (tobit) and produces the predicted values. That is Good!

Then it ran another regression (tobit) using a different dataset, with all of the X2 equal 0. That is not good 😞

What I want the program to do is to use the model output to estimate the predicted Y values if the values of X2 were 0 instead of 1 for those observations that have X2 value of 1.

In other words, the second prediction should give me exactly the same predictions for all the observations where X2 was not changed, and a different prediction only where the X2 value was changed. I used to run this in STATA in a few lines of code, as follows:

tobit Y_var X1 X2, ll ; /*only one tobit model*/

predict Y_hat1 ; /* to predict the Y values at the model parameters */

replace X2 = 0 if X2 == 1; /* changing all the values for X2 to 0 */

predict Y_hat2 ; /* to predict the Y values using the same model parameters, but at different X2 values */

list Y_var Y_hat1 Y_hat2 ; /* Similar to proc print in SAS*/

Sorry to use the STATA code, but I used it to illustrate what I need and I hope this makes sense.

Thanks,

Altijani

solarflare · Posted 07-05-2017 04:32 PM

Hi Altijani,

I think I have it figured out now. The key is that PROC QLIM estimates the model parameters only from observations that have all of the required variables, so any rows with a missing value of Y are not used in estimation. Knowing this, we make a copy of the dataset, set the X2 values to zero and the Y values to missing, and then append the copy to the original. Then we run PROC QLIM once, and the same fitted model is applied to all of the rows, both the originals and the modified copies.

After that a few data steps are used to split the data and reformat it into one table.

See the code below. Let me know if you have questions.

data test;
	input Y_var X1 X2;
	datalines;
	0  2  0
	59 7  0
	0  19 0
	25 3  1
	0  8  1
	;
run;
*create the mod data set with modified X2 values and missing Y values;
data test_mod;
	set test;
	Y_var=.;
	X2=0;
run;  
*append the mod data set to the original;
data test_mod;
	set test test_mod;
run;

proc print data=test_mod;
    title 'Input Data Set';
run;

proc qlim data=test_mod;
	model Y_var = X1 X2;
	endogenous Y_var ~censored(lb=0);
	output out=qlim_out predicted;
	title 'Proc qlim 1 results';
run;

proc print data=qlim_out;
    title 'PROC QLIM Predicted Values';
run;

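*rows 1 to 5 are the original observations (change 5 to the row count of your data);
data original;
	set qlim_out(obs=5);
run; 

*rows 6 onward are the copies with X2 set to zero and Y missing;
data modified;
	set qlim_out(firstobs=6);
	rename P_Y_var=P_Y_var_mod X2=X2_mod;
run;

*one-to-one merge by row position, test is listed last so its observed Y_var values are kept;
data final;
	merge original modified test;
run;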

proc print data=final;
run;
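
Since the real data has millions of rows, the hard-coded obs=5 and firstobs=6 above will not carry over directly. Below is a rough sketch of the same approach that splits on a flag instead of row counts. The names test_id, stacked, row_id, counterfactual and want are made up for illustration. It also assumes, as the code above does, that the OUT= data set keeps the input variables and the input row order. I have only reasoned through it on the five-row sample, so treat it as a starting point.

```sas
/* give every row a key so the two predictions can be matched by value */
data test_id;
	set test;
	row_id = _N_;
run;

/* stack the original rows on top of a modified copy of the same rows */
data stacked;
	set test_id(in=in_orig) test_id;
	counterfactual = not in_orig;   /* 0 = original row, 1 = modified copy */
	if counterfactual then do;
		Y_var = .;                  /* missing Y keeps the copy out of estimation */
		X2 = 0;
	end;
run;

/* one model fit, predictions written for both sets of rows */
proc qlim data=stacked;
	model Y_var = X1 X2;
	endogenous Y_var ~ censored(lb=0);
	output out=qlim_out predicted;
run;

/* put the two predictions side by side, matched on row_id */
data want;
	merge qlim_out(where=(counterfactual=0))
	      qlim_out(where=(counterfactual=1)
	               keep=row_id counterfactual P_Y_var
	               rename=(P_Y_var=P_Y_var_mod));
	by row_id;
	drop counterfactual;
run;
```

The BY merge matches rows on row_id rather than on position, so the split does not depend on knowing how many rows the original data has. If the output ever comes back in a different order, a PROC SORT by counterfactual and row_id before the final step should restore it.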

altijani · Posted 07-10-2017 04:09 PM

This solved the issue. Many thanks.

Altijani
