BookmarkSubscribeRSS Feed
spramanik
Calcite | Level 5

Hi all,

I have a large dataset, having 8 million observations. I have fitted a logistic model and obtained the predicted probabilities using PROC LOGISTIC. Now using the predicted probabilities, I would like to generate the 0-1 values corresponding to each observation. I was trying to do that in PROC IML using the RANDGEN function within a loop. Rick Wicklin's blog (April 4, 2011) suggest similar solution for independent normal distribution. But it's taking for ever. When I do the same thing for a subset (2 million) of the dataset, it works reasonable fast (overnight). Is there a better way to this? Note that the predicted probabilities are different for each observation.

Similar problem occurs when I try to generate from normal with different mean in PROC IML. Any guidance would be highly appreciated.

Santanu

3 REPLIES 3
Rick_SAS
SAS Super FREQ

Maybe I'm confused. I think all you have to do to get the groups is to assign group=1 when the predicted value is greater than 0.5 and group=0 when the predicted value is less than 0.5.  I don't see why RANDGEN comes into play or why there would be a loop.  Here is some code that generates some fake data and calls logistic to output predicted probabilities. The PROC IML code just assigns 1 or 0 depending on the predicted probabilities. You can use the DATA step to do the same thing.

data a(drop = i prob);
call streaminit(321);
do i = 1 to 1000;
   x = rand("normal");
   prob = exp(x) / (1 + exp(x));
   y = rand("Bernoulli", 1-prob);
   output;
end;
proc logistic data=a;
model y(event='1') = x;
output out=out pred=pred;
run;

proc iml;
use out; read all var {pred y}; close out;
class = (pred>= 0.5);
print (sum(class=y));

spramanik
Calcite | Level 5

Hi Rick,

Thanks for your reply. What if all the predicted probabilities are greater than 0.5 or less than 0.5? Still there is a chance of of an observation getting assigned to a different group, right? I would like to incorporate that uncertainty by generating from Bernoulli.

This is similar to that of fitting a model using PROC MIXED. I can get the predicted (EBLUP) values from PROC MIXED, but those are unrealistically smooth values. I would like to obtain the predicted values in two steps: first generate a value (say, mu) from normal with mean=synthetic (Xbeta_hat) and common variance=random effect variance component estimate, then generate from normal with mean=mu and common variane=residual variance estimate.

Santanu

sgruber
Calcite | Level 5

Hi Santanu,

You can generate a random uniform for each observation (U), then set OUTCOME = 1 if prob > U, 0 otherwise.

--Susan

sas-innovate-white.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.

 

Early bird rate extended! Save $200 when you sign up by March 31.

Register now!

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 3 replies
  • 1323 views
  • 0 likes
  • 3 in conversation