Sinistrum
Quartz | Level 8

Dear community,

 

I would like to ask for your help with the following issue.

 

As "proc quantreg" offers the opportunity to estimate conditional cumulative distribution functions (CDFs), I would like to know how to obtain, instead of a plotted graph, the predictions for arbitrary values on the plot's abscissa, i.e., the predicted values of CDF(x).

 

Example:

1. First, I would like to use "proc quantreg" to fit the model. Say, on a sample of N_1 = 1000 observations, with dependent variable y and independent variables x1 and x2.

2. Second, I would like to use the fitted model from step one to get estimates for a new sample of N_2 = 100 new observations, for which I observe x1 and x2. Assume that I am interested, for each observation, in the estimated values of CDF(7) (i.e., the probability that y is smaller than or equal to 7), CDF(11), and CDF(13).

 

The beginnings for the example follow.

 

I would be very grateful for any help with this question.

 

Yours sincerely,

Sinistrum

 

data	pro_fit;
	seed 	=	1;
	call	streaminit(seed);
	do		i	=	1	to		1000;
		x1	= 	rand ('normal',	10,	2);
		x2	=	rand ('normal',	5,	1);
		y1	=	1	+	2 * x1 + 5 * x2 + rand ('normal',	0,	0.5);
	output;
	end;
	drop seed;
run;
data	pro_predict;
	seed 	=	2;
	call	streaminit(seed);
	do		i	=	1	to		100;
		x1	= 	rand ('normal',	10,	2);
		x2	=	rand ('normal',	5,	1);
	output;
	end;
	drop seed;
run;
proc	quantreg
	data	=	pro_fit
	ci		=		none;
	model
		y1 	= x1 x2;
	conddist  
		plot	=		(
							cdfplot
							pdfplot
						)
					;
run;
Accepted Solutions
Rick_SAS
SAS Super FREQ

I discussed this with a colleague, who suggested that you

  1. Look at the TESTDATA= option in the CONDDIST statement. To use it, you must have a valid response (y1) value in the "pro_predict" data set.
  2. Use the fast quantile process (FQPR) option in the QUANTLEV= option of the MODEL statement.
  3. Use ODS OUTPUT to capture the data that is contained in the conditional CDF plot.
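
As background for step 2, here is the textbook relationship between a conditional quantile function and the conditional CDF. It is a sketch under assumptions: the equally spaced midpoint grid of levels is chosen here for illustration only, and this is not a statement of how PROC QUANTREG computes the plotted curve internally. For a nondecreasing conditional quantile function Q(tau | x):

```latex
F(y \mid x) \;=\; \int_0^1 \mathbf{1}\{\, Q(\tau \mid x) \le y \,\}\, d\tau
\;\approx\; \frac{1}{n} \sum_{k=1}^{n} \mathbf{1}\{\, \hat{Q}(\tau_k \mid x) \le y \,\},
\qquad \tau_k = \frac{k - 1/2}{n} \quad \text{(assumed grid)}.
```

On such a grid, the approximation can only change in steps of 1/n, so a grid of n = 30 levels resolves CDF values to roughly 0.033.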

Try the following:

data  pro_fit;
      seed =     1;
      call  streaminit(seed);
      do          i     =     1     to          1000;
            x1    =     rand ('normal',   10,   2);
            x2    =     rand ('normal',   5,    1);
            y1    =     1     +     2 * x1 + 5 * x2 + rand ('normal',   0,    0.5);
      output;
      end;
      drop seed;
run;
data  pro_predict;
      seed =     2;
      call  streaminit(seed);
      do          i     =     1     to          100;
            x1    =     rand ('normal',   10,   2);
            x2    =     rand ('normal',   5,    1);
            y1= 50;   /* 1. Need a valid response for testdata. */
      output;
      end;
      drop seed;
run;
ods graphics on;

ods output cdfplot=cdftest_pred; /* 3 */
proc  quantreg
      data  =     pro_fit
      ci          =           none;
      model
            y1    = x1 x2/quantlev=fqpr(n=30); /* 2 */
      conddist hr  testdata(so hr hf)=pro_predict 
            plot  =           ( cdfplot );
run;

proc print data= cdftest_pred;
run;
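
If you are unsure which name to use in the ODS OUTPUT statement, ODS TRACE writes the name of every output object the procedure creates to the log. The sketch below reuses the PROC QUANTREG call from above unchanged and assumes that the data sets pro_fit and pro_predict already exist.

```sas
ods graphics on;
ods trace on;    /* write the name of each ODS output object to the log */
proc quantreg data=pro_fit ci=none;
   model y1 = x1 x2 / quantlev=fqpr(n=30);
   conddist hr testdata(so hr hf)=pro_predict plot=(cdfplot);
run;
ods trace off;   /* stop tracing once the names are known */
```

The names listed in the log are the ones you can put on the left-hand side of ODS OUTPUT name=dataset.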

Sinistrum
Quartz | Level 8

Thank you very much indeed. Once more, this is so enormous - breathtaking, the help I received thus far from this board.

 

With this work-around, I should definitely be able to extract the information I am after (plus: I learned a new technique for accessing the values used for plotting).

However, because the ODS output must not be suppressed, I fear performance issues. Is there a way to make the print as "light weighted" as possible (I am sorry; it feels like I was given an inch and am asking for an ell)?

 

Either way, my question is resolved; thank you once again.

 

---

 

Edit:

I am sorry, but in the use case where the prediction data set contains > 150 observations, I get "truncated"; i.e., I do not get values for observations n : n > 150.

Could you please tell me whether there is a straightforward way to resolve this issue (instead of looping)?

Rick_SAS
SAS Super FREQ

> Is there a way to make the print as "light weighted" as possible

I only used PROC PRINT to show you that the output data set contains a lot of information about the quantile process. You can omit the PROC PRINT. Or you can also print only the columns (or rows) that interest you.
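
For example, a minimal sketch for inspecting the data set and printing only part of it. It assumes that cdftest_pred was created by the ODS OUTPUT statement shown earlier; the column range QuantPred1-QuantPred5 is an assumed subset of the QuantPred variables discussed in this thread, so check the PROC CONTENTS listing first and adjust the names.

```sas
/* List the variables that ODS OUTPUT wrote to cdftest_pred, in creation order. */
proc contents data=cdftest_pred varnum;
run;

/* Print only the first 5 rows and a handful of columns.
   QuantPred1-QuantPred5 is an assumed subset; adjust after checking PROC CONTENTS. */
proc print data=cdftest_pred(obs=5);
   var QuantPred1-QuantPred5;
run;
```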

You can also suppress all output and just create the output data set, like this:

ods exclude all;
proc  quantreg
      data  =     pro_fit
      ci          =           none;
      model
            y1    = x1 x2/quantlev=fqpr(n=30); /* 2 */
      conddist hr  testdata(so hr hf)=pro_predict 
            plot  =           ( cdfplot );
      ods output cdfplot=cdftest_pred; /* 3 */
run;
ods exclude none;

> where the prediction data set contains > 150 observations, I get "truncated"; i.e., I do not get values for observations n : n > 150.

 

I don't understand this question. The first five columns of the output data set should have the same number of obs as the test data. Are you talking about the variables QuantPred1-QuantPred150?  I do not know how to change the number of output variables.

Sinistrum
Quartz | Level 8

> Is there a way to make the print as "light weighted" as possible

Exactly what I was looking for - thank you so much

(Thank you; I was aware that the last print was merely there to showcase the result data set. What I wanted to suppress was the CDF plot output, which you have shown me is possible via ods exclude.)

 

> where the prediction data set contains > 150 observations, I get "truncated"; i.e., I do not get values for observations n : n > 150.

Sorry for being confusing; but yes, you understood it correctly: I am "talking about the variables QuantPred1-QuantPred150". SAS simply stops there (n = 150) and does not provide the QuantPred values for observations n with n > 150.

I shall start a new topic on this one. Wrapping a loop around this seems pretty clumsy.

 

Thank you once again.

 
