spramanik
Calcite | Level 5

Hi all,

I have a large dataset with 8 million observations. I have fitted a logistic model and obtained the predicted probabilities using PROC LOGISTIC. Now, using the predicted probabilities, I would like to generate a 0-1 value for each observation. I was trying to do that in PROC IML by calling RANDGEN within a loop. Rick Wicklin's blog (April 4, 2011) suggests a similar solution for independent normal distributions. But it's taking forever. When I do the same thing for a subset (2 million) of the dataset, it runs reasonably fast (overnight). Is there a better way to do this? Note that the predicted probabilities are different for each observation.

A similar problem occurs when I try to generate from normal distributions with different means in PROC IML. Any guidance would be highly appreciated.

Santanu

3 REPLIES
Rick_SAS
SAS Super FREQ

Maybe I'm confused. I think all you have to do to get the groups is to assign group=1 when the predicted value is greater than 0.5 and group=0 when the predicted value is less than 0.5.  I don't see why RANDGEN comes into play or why there would be a loop.  Here is some code that generates some fake data and calls logistic to output predicted probabilities. The PROC IML code just assigns 1 or 0 depending on the predicted probabilities. You can use the DATA step to do the same thing.

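/* Simulate fake data: x is standard normal, y is a Bernoulli outcome */
data a(drop = i prob);
call streaminit(321);
do i = 1 to 1000;
   x = rand("normal");
   prob = exp(x) / (1 + exp(x));
   y = rand("Bernoulli", 1-prob);
   output;
end;
run;

/* Fit the model and save the predicted probabilities in the variable PRED */
proc logistic data=a;
model y(event='1') = x;
output out=out pred=pred;
run;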

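proc iml;
use out; read all var {pred y}; close out;
class = (pred>= 0.5);      /* 1 if the predicted probability is at least 0.5 */
print (sum(class=y));      /* number of observations where class matches y */
quit;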
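
The same thresholding can be sketched in the DATA step. This assumes the data set OUT and the variables PRED and Y produced by the PROC LOGISTIC call above; the names CLASSIFIED and PRED_CLASS are hypothetical and chosen only for illustration.

```sas
/* Assign 1 or 0 by comparing each predicted probability with 0.5.     */
/* CLASSIFIED and PRED_CLASS are hypothetical names for this sketch.   */
data classified;
   set out;
   pred_class = (pred >= 0.5);   /* Boolean expression evaluates to 1 or 0 */
run;

/* Cross-tabulate the assigned class against the observed response */
proc freq data=classified;
   tables pred_class*y;
run;
```

Because the DATA step reads one row at a time, this version does not need to hold the whole data set in memory.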

spramanik
Calcite | Level 5

Hi Rick,

Thanks for your reply. What if all the predicted probabilities are greater than 0.5, or all are less than 0.5? There is still a chance of an observation being assigned to a different group, right? I would like to incorporate that uncertainty by generating from a Bernoulli distribution.
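
A minimal sketch of what such a Bernoulli draw could look like without a loop in PROC IML: the DATA step calls RAND once per row with that row's own probability. It assumes a data set OUT containing the predicted probability PRED, as in the example above; the output name SIM, the variable Y_SIM, and the seed are hypothetical.

```sas
/* One Bernoulli draw per observation, using that row's probability.  */
/* SIM, Y_SIM and the seed value are hypothetical choices.            */
data sim;
   set out;
   if _n_ = 1 then call streaminit(12345);   /* seed the stream once */
   if missing(pred) then y_sim = .;          /* no draw without a probability */
   else y_sim = rand("Bernoulli", pred);     /* 1 with probability pred */
run;
```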

This is similar to fitting a model using PROC MIXED. I can get the predicted (EBLUP) values from PROC MIXED, but those are unrealistically smooth. I would like to obtain the predicted values in two steps: first generate a value (say, mu) from a normal distribution with mean = the synthetic estimate (Xbeta_hat) and common variance = the random-effect variance component estimate, then generate from a normal distribution with mean = mu and common variance = the residual variance estimate.
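
The two-step draw described here might be sketched in the DATA step as follows. Everything named below is an assumption for illustration: the input data set PRED_MIXED, its variable XBETA (holding Xbeta_hat), and the two variance values, which are placeholders rather than estimates from any fitted model. The sketch draws mu separately for every row, as described above; if the random effect is shared within a group, mu would instead be drawn once per group.

```sas
/* Two-stage normal draw per observation (hypothetical names and values). */
data sim_mixed;
   set pred_mixed;                            /* hypothetical input with XBETA */
   if _n_ = 1 then call streaminit(12345);    /* hypothetical seed */
   /* Placeholder variances; substitute the PROC MIXED estimates */
   retain sigma2_u 0.8 sigma2_e 1.5;
   /* RAND("Normal") takes a standard deviation, so take square roots */
   mu    = rand("Normal", xbeta, sqrt(sigma2_u));   /* step 1: random effect */
   y_sim = rand("Normal", mu,    sqrt(sigma2_e));   /* step 2: residual */
run;
```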

Santanu

sgruber
Calcite | Level 5

Hi Santanu,

You can generate a random uniform for each observation (U), then set OUTCOME = 1 if prob > U, 0 otherwise.
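
As a rough sketch, the uniform comparison can be done in PROC IML on whole vectors, with no loop. It assumes a data set OUT with the variable PRED, as in the earlier example; the seed, the output data set SIM, and the matrix names U and OUTCOME are hypothetical.

```sas
proc iml;
use out; read all var {pred}; close out;

call randseed(12345);          /* hypothetical seed */
u = j(nrow(pred), 1);          /* allocate one slot per observation */
call randgen(u, "Uniform");    /* fill the whole vector in one call */
outcome = (pred > u);          /* 1 if prob > U, 0 otherwise */

/* Write the simulated outcomes to a (hypothetical) data set SIM */
create sim var {pred outcome};
append;
close sim;
quit;
```

A single RANDGEN call fills the entire vector, which avoids calling RANDGEN once per observation inside a loop.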

--Susan
