08-08-2011 02:18 PM

Hi all,

I have a large dataset with 8 million observations. I have fitted a logistic model and obtained the predicted probabilities using PROC LOGISTIC. Using the predicted probabilities, I would now like to generate a 0-1 value for each observation. I tried to do that in PROC IML by calling RANDGEN within a loop. Rick Wicklin's blog (April 4, 2011) suggests a similar solution for independent normal distributions. But it's taking forever. When I do the same thing for a subset (2 million observations) of the dataset, it runs reasonably fast (overnight). Is there a better way to do this? Note that the predicted probabilities are different for each observation.

A similar problem occurs when I try to generate from normal distributions with different means in PROC IML. Any guidance would be highly appreciated.

Santanu

08-08-2011 02:44 PM

Maybe I'm confused. I think all you have to do to get the groups is to assign group=1 when the predicted value is greater than 0.5 and group=0 when the predicted value is less than 0.5. I don't see why RANDGEN comes into play or why there would be a loop. Here is some code that generates some fake data and calls PROC LOGISTIC to output predicted probabilities. The PROC IML code just assigns 1 or 0 depending on the predicted probabilities. You can use the DATA step to do the same thing.

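/* Generate fake data: y is Bernoulli with success probability 1-prob */
data a(drop = i prob);
   call streaminit(321);
   do i = 1 to 1000;
      x = rand("normal");
      prob = exp(x) / (1 + exp(x));
      y = rand("Bernoulli", 1-prob);
      output;
   end;
run;

/* Fit the model and write the predicted probabilities to OUT */
proc logistic data=a;
   model y(event='1') = x;
   output out=out pred=pred;
run;

/* Assign 1 or 0 with a 0.5 cutoff and count how often that matches y */
proc iml;
use out; read all var {pred y}; close out;
class = (pred >= 0.5);
print (sum(class=y));
quit;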
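
The DATA step version mentioned above might look like the following sketch. It assumes the OUT data set and the PRED and Y variables created by the PROC LOGISTIC call; the names CLASSIFIED, PRED_CLASS and AGREE are made up for illustration.

```sas
/* Sketch only: 0.5 cutoff in the DATA step instead of PROC IML.      */
/* Assumes OUT (with PRED and Y) exists from the PROC LOGISTIC step.  */
data classified;
   set out;
   pred_class = (pred >= 0.5);   /* 1 if pred >= 0.5, otherwise 0       */
   agree = (pred_class = y);     /* 1 when the assigned class equals y  */
run;

/* Count of observations where the assigned class equals y */
proc means data=classified sum;
   var agree;
run;
```

The DATA step reads one observation at a time, so this should scale to a large data set without holding everything in memory.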

08-08-2011 02:59 PM

Hi Rick,

Thanks for your reply. What if all the predicted probabilities are greater than 0.5 or less than 0.5? Still there is a chance of an observation getting assigned to a different group, right? I would like to incorporate that uncertainty by generating from Bernoulli.

This is similar to fitting a model using PROC MIXED. I can get the predicted (EBLUP) values from PROC MIXED, but those are unrealistically smooth values. I would like to obtain the predicted values in two steps: first generate a value (say, mu) from normal with mean=synthetic (Xbeta_hat) and common variance=random effect variance component estimate, then generate from normal with mean=mu and common variance=residual variance estimate.
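
A rough DATA step sketch of the two-step draw described above, with no loop over observations in PROC IML. Everything named here is an assumption for illustration: PREDS stands in for a data set holding the fixed-effects prediction in a variable called XBETA, and the two macro variables are placeholder values, not estimates from any real fit.

```sas
/* Stand-in for the scored data; XBETA plays the role of Xbeta_hat */
data preds;
   do id = 1 to 5;
      xbeta = 10 + id;
      output;
   end;
run;

%let sig2_u = 0.8;   /* placeholder: random-effect variance estimate */
%let sig2_e = 1.5;   /* placeholder: residual variance estimate      */

data sim;
   if _n_ = 1 then call streaminit(12345);   /* seed once */
   set preds;
   /* RAND("Normal", mean, sd) takes a standard deviation, hence SQRT */
   mu    = rand("Normal", xbeta, sqrt(&sig2_u));   /* step 1 */
   y_sim = rand("Normal", mu,    sqrt(&sig2_e));   /* step 2 */
run;
```

This draws a separate mu for every row, as described. If the random effect is shared by all rows in a cluster, mu would presumably need to be drawn once per cluster rather than once per row.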

Santanu

09-14-2011 02:19 PM

Hi Santanu,

You can generate a random uniform for each observation (U), then set OUTCOME = 1 if prob > U, 0 otherwise.
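
One way this could look in a DATA step, as a sketch. It assumes the OUT data set and PRED variable from the PROC LOGISTIC call earlier in the thread; SIM, U and OUTCOME are illustrative names.

```sas
/* Sketch only: one Bernoulli draw per observation, each with its own */
/* probability PRED. Assumes OUT exists from the PROC LOGISTIC step.  */
data sim;
   if _n_ = 1 then call streaminit(2011);   /* seed once */
   set out;
   u = rand("Uniform");       /* U ~ Uniform(0,1)                  */
   outcome = (pred > u);      /* P(outcome = 1) = pred             */
   /* Equivalent in distribution:                                   */
   /*    outcome = rand("Bernoulli", pred);                         */
run;
```

Because U is uniform on (0,1), the event pred > U happens with probability pred, so each observation gets a 0-1 value drawn with its own predicted probability.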

--Susan
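
For anyone who wants to stay in PROC IML, the same idea can be written without a loop by filling a whole vector with one RANDGEN call. This is a sketch with small stand-in vectors; P, MU and SIGMA are hypothetical values, and in practice P would be read from the scored data set.

```sas
proc iml;
call randseed(2011);

/* Stand-in probabilities, one per observation */
p = {0.1, 0.5, 0.9, 0.3};

/* Bernoulli draws with a different probability per row, no loop */
u = j(nrow(p), 1);             /* allocate; RANDGEN fills every element */
call randgen(u, "Uniform");
outcome = (u < p);             /* element i is 1 with probability p[i]  */

/* Normal draws with a different mean per row */
mu    = {0, 10, 20, 30};       /* hypothetical means                    */
sigma = 2;                     /* hypothetical common std deviation     */
z = j(nrow(mu), 1);
call randgen(z, "Normal");     /* standard normal draws                 */
x = mu + sigma # z;            /* shift and scale: x[i] ~ N(mu[i], sigma^2) */

print p outcome mu x;
quit;
```

A single RANDGEN call on a preallocated vector avoids calling the routine once per observation, which is likely where the time goes in the looped version.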