Dear community,
I would like to ask for your help with the following issue.
Since "proc quantreg" can estimate conditional cumulative distribution functions (CDFs), I would like to know how to obtain, instead of a plot, the predictions at arbitrary values on the plot's abscissa, i.e., simply the predicted values of CDF(x).
Example:
1. First, I would like to use "proc quantreg" to fit the model, say, on a sample of N_1 = 1000 observations, with dependent variable y and independent variables x1 and x2.
2. Second, I would like to use the fitted model from step 1 to get estimates for a new sample of N_2 = 100 observations, for which I observe x1 and x2;
assume I am interested, for each observation, in the estimated values of CDF(7) (i.e., the probability that y is less than or equal to 7), CDF(11), and CDF(13).
The beginning of the example follows.
I would be very grateful for any help with this question.
Yours sincerely,
Sinistrum
data pro_fit;
seed = 1;
call streaminit(seed);
do i = 1 to 1000;
x1 = rand ('normal', 10, 2);
x2 = rand ('normal', 5, 1);
y1 = 1 + 2 * x1 + 5 * x2 + rand ('normal', 0, 0.5);
output;
end;
drop seed;
run;
data pro_predict;
seed = 2;
call streaminit(seed);
do i = 1 to 100;
x1 = rand ('normal', 10, 2);
x2 = rand ('normal', 5, 1);
output;
end;
drop seed;
run;
proc quantreg
data = pro_fit
ci = none;
model
y1 = x1 x2;
conddist
plot = (
cdfplot
pdfplot
)
;
run;
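For reference, the CDF values asked for above can be written in terms of the conditional quantile function that PROC QUANTREG estimates. The integral identity is standard; the grid approximation over K equally spaced quantile levels τ_1, ..., τ_K in (0, 1) is only one simple way to evaluate it, and whether the CONDDIST statement uses exactly this construction internally is not confirmed here.

```latex
F(c \mid x_1, x_2) = \Pr(y \le c \mid x_1, x_2)
  = \int_0^1 \mathbf{1}\{ Q(\tau \mid x_1, x_2) \le c \} \, d\tau
  \approx \frac{1}{K} \sum_{k=1}^{K} \mathbf{1}\{ \hat{Q}(\tau_k \mid x_1, x_2) \le c \},
\qquad
\hat{Q}(\tau \mid x_1, x_2) = \hat{\beta}_0(\tau) + \hat{\beta}_1(\tau)\, x_1 + \hat{\beta}_2(\tau)\, x_2 .
```

The identity holds because Q(τ | x) ≤ c exactly when τ ≤ F(c | x). So CDF(7) for a new observation is approximately the share of fitted quantile levels whose predicted quantile at that observation's (x1, x2) is at most 7, with a resolution of about 1/K.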
I discussed this with a colleague, who suggested that you try the following:
data pro_fit;
seed = 1;
call streaminit(seed);
do i = 1 to 1000;
x1 = rand ('normal', 10, 2);
x2 = rand ('normal', 5, 1);
y1 = 1 + 2 * x1 + 5 * x2 + rand ('normal', 0, 0.5);
output;
end;
drop seed;
run;
data pro_predict;
seed = 2;
call streaminit(seed);
do i = 1 to 100;
x1 = rand ('normal', 10, 2);
x2 = rand ('normal', 5, 1);
y1= 50; /* 1. Need a valid response for testdata. */
output;
end;
drop seed;
run;
ods graphics on;
ods output cdfplot=cdftest_pred; /* 3. Save the data behind the CDF plot */
proc quantreg
data = pro_fit
ci = none;
model
y1 = x1 x2/quantlev=fqpr(n=30); /* 2. Estimate the quantile process (FQPR) */
conddist hr testdata(so hr hf)=pro_predict
plot = ( cdfplot );
run;
proc print data= cdftest_pred;
run;
Thank you very much indeed. Once again, the help I have received on this board so far is simply breathtaking.
With this workaround I should definitely be able to extract the information I am after (plus, I learned a new technique: how to access the values used for plotting).
However, because I cannot suppress the ODS output, I fear performance issues. Is there a way to make the print as "light weighted" as possible (I am sorry; it feels like I was given an inch and am now asking for an ell)?
Either way, my question is resolved; thank you once again.
---
Edit:
I am sorry, but in the use case where the prediction data set contains > 150 observations, I get "truncated"; i.e., I do not get values for observations n : n > 150.
Could you please tell me whether there is a straightforward way to resolve this issue (without looping)?
> Is there a way to make the print as "light weighted" as possible
I only used PROC PRINT to show you that the output data set contains a lot of information about the quantile process. You can omit the PROC PRINT. Or you can also print only the columns (or rows) that interest you.
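For instance, assuming CDFTEST_PRED was created by the ODS OUTPUT statement, the following lists its variables and then prints a small subset. The QuantPred1-QuantPred3 names follow the QuantPred naming mentioned in this thread; the OBS= and VAR choices are only illustrative.

```sas
/* List the variables in CDFTEST_PRED in the order they were created */
proc contents data=cdftest_pred varnum;
run;

/* Print only the first 10 rows and three of the QuantPred columns */
proc print data=cdftest_pred(obs=10);
   var QuantPred1-QuantPred3;
run;
```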
You can also suppress all output and just create the output data set, like this:
ods exclude all;
proc quantreg
data = pro_fit
ci = none;
model
y1 = x1 x2/quantlev=fqpr(n=30); /* 2. Estimate the quantile process (FQPR) */
conddist hr testdata(so hr hf)=pro_predict
plot = ( cdfplot );
ods output cdfplot=cdftest_pred; /* 3. Save the data behind the CDF plot */
run;
ods exclude none;
> where the prediction data set contains > 150 observations, I get "truncated"; i.e., I do not get values for observations n : n > 150.
I don't understand this question. The first five columns of the output data set should have the same number of obs as the test data. Are you talking about the variables QuantPred1-QuantPred150? I do not know how to change the number of output variables.
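One possible way around the wide QuantPred layout, sketched here under several assumptions and not verified on every SAS release, is to skip the CDF plot altogether: score the new observations on an explicit grid of quantile levels and count, per observation, how many predicted quantiles fall at or below each threshold. The sketch relies on (a) the usual trick of appending scoring rows with a missing response, (b) QUANTILE= accepting a list of levels, and (c) the COLUMNWISE option of the OUTPUT statement stacking the predictions for all levels into one variable; please check each with PROC CONTENTS on QPRED. The names ALL_DATA, QPRED, CDF_AT_C, ISTEST, QHAT and CDF7-CDF13 are hypothetical. Because the result is long rather than wide, it should not depend on the number of QuantPred columns, and no loop over observations is needed.

```sas
/* 1. Build the list of quantile levels 0.01 0.02 ... 0.99 (K = 99) */
data _null_;
   length qlist $ 600;
   do k = 1 to 99;
      qlist = catx(' ', qlist, put(k / 100, 4.2));
   end;
   call symputx('qlist', qlist);
run;

/* 2. Append the new observations with a missing response: they do */
/*    not enter the fit, but the OUTPUT statement still scores them */
data all_data;
   set pro_fit(in=inFit) pro_predict;
   isTest = not inFit;          /* 1 = row to be scored             */
   if isTest then y1 = .;       /* discard any dummy response value */
run;

/* 3. Fit all K quantile regressions and save predicted quantiles. */
/*    COLUMNWISE is assumed to stack them into the single variable */
/*    QHAT, one row per observation and quantile level.            */
ods exclude all;
proc quantreg data=all_data ci=none;
   model y1 = x1 x2 / quantile=&qlist;
   output out=qpred pred=qhat / columnwise;
run;
ods exclude none;

/* 4. CDF(c) ~ share of levels whose predicted quantile is <= c */
proc sql;
   create table cdf_at_c as
   select i, x1, x2,
          mean(qhat <= 7)  as cdf7,
          mean(qhat <= 11) as cdf11,
          mean(qhat <= 13) as cdf13
   from qpred
   where isTest = 1 and not missing(qhat)
   group by i, x1, x2
   order by i;
quit;
```

With the simulated data above, E[y1] = 1 + 2*10 + 5*5 = 46, so CDF(7), CDF(11) and CDF(13) will be essentially 0 for every new observation; thresholds near 46 give a more informative check. Each estimate has a resolution of about 1/99, so a finer grid trades run time for precision.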
> Is there a way to make the print as "light weighted" as possible
Exactly what I was looking for; thank you so much.
(I was aware that the final PROC PRINT merely showcased the resulting data set; what I wanted was to suppress the CDF plots, which, as you have shown me, is possible via ODS EXCLUDE.)
> where the prediction data set contains > 150 observations, I get "truncated"; i.e., I do not get values for observations n : n > 150.
Sorry for being unclear, but yes, you understood correctly: I am "talking about the variables QuantPred1-QuantPred150". SAS simply stops there (n = 150) and does not provide QuantPred values for the observations beyond the 150th.
I shall start a new topic on this one; wrapping a loop around it seems rather clumsy.
Thank you once again.