How can I Correlate two binary data sets?

New Contributor
Posts: 3

Hi, I have to solve a problem that is a little confusing for me. Can anybody here help? I have to correlate two binary data sets.

The problem is:

We have 100 independent variables, each of which has only two states: On or Off.

We have 1000 dependent variables, each of which also has only two states: On or Off.

In the first experiment, we observe the dependent variables for a given setup of the independent variables.

In a second experiment, we change the independent variables to another setup, and different dependent variables switch on.

Which model can be used to predict the outcome of the dependent variables for a third configuration of the independent variables, based on the results of the two previous experiments?

Trusted Advisor
Posts: 1,630

Re: How can I Correlate two binary data sets?

PROC CATMOD could certainly be used in this situation, but I am very skeptical that there is a good answer here with 100 independent binary variables. When you have many independent variables, some of them are bound to be correlated with each other, and this greatly inhibits your ability to get a good predictive model. That is true even with continuous variables; with binary variables, I think the situation would be worse. Furthermore, I can't imagine how 100 independent binary variables could do a good job of predicting 1000 dependent binary variables. Maybe your data has such strong relationships that the prediction would actually work, but as I said, I am very skeptical.

To move forward, it would certainly be a good idea to explain this experiment in more detail.

New Contributor
Posts: 3

Re: How can I Correlate two binary data sets?

Which kind of statistical model should I implement? Is PROC CATMOD a classification model? I want to use R to implement the model. The problem is that there are many more dependent binary variables than independent variables.

Super User
Posts: 17,907

Re: How can I Correlate two binary data sets?

If you're going to use R, perhaps post in an R forum?

There are quite a few good ones, and you're likely to get more relevant answers there.

Trusted Advisor
Posts: 1,630

Re: How can I Correlate two binary data sets?

Which kind of statistical modeling should I implement?

My point is that the design of this study may prevent you from finding a well-fitting model, and you might want to reconsider the design.

I want to use R to implement the model.

R or SAS, the model seems to be much less important at this time than getting the right design.

But why are you asking about this in a SAS forum?

New Contributor
Posts: 3

Re: How can I Correlate two binary data sets?

Because I think people here are very good at statistics.

I do not understand what you mean by the right design.

Trusted Advisor
Posts: 1,630

Re: How can I Correlate two binary data sets?

100 independent binary variables cannot possibly span the space of interest unless you have 2**100 data points in your design, and if you have fewer points, there is a major likelihood that your independent variables will be correlated with one another, thus dramatically increasing the mean square error of your parameter estimates.

As an alternative, you would need some sort of major fractional factorial design just to ensure your estimates are balanced and not correlated with each other.
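To put rough numbers on the two points above, here is a minimal Python sketch (not SAS, and not a design tool). The counts of 100 factors and 2 runs come from the question; the on/off settings are hypothetical random values, since the actual setups were never posted.

```python
import random
from collections import Counter

N_FACTORS = 100   # independent on/off variables, from the question
N_RUNS = 2        # experiments actually performed, from the question

# Size of a full factorial design: every on/off combination exactly once.
full_factorial_runs = 2 ** N_FACTORS

# Hypothetical data: random on/off settings stand in for the two setups.
rng = random.Random(0)
design = [[rng.randint(0, 1) for _ in range(N_FACTORS)] for _ in range(N_RUNS)]

# A factor's "column" is its setting across the runs. With 2 runs there are
# only 2**2 = 4 possible columns, so many of the 100 factors must share one.
columns = [tuple(run[j] for run in design) for j in range(N_FACTORS)]
pattern_counts = Counter(columns)
largest_group = max(pattern_counts.values())

print(f"full factorial runs needed: {full_factorial_runs:.3e}")
print(f"distinct column patterns: {len(pattern_counts)} of at most {2 ** N_RUNS}")
print(f"largest group of identical factors: {largest_group}")
```

Factors that share a column are perfectly confounded: no model fitted to these runs can tell their effects apart. By the pigeonhole principle, at least 25 of the 100 factors end up in the same group no matter how the two setups are chosen.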

But why do you need 100 independent binary variables? And can you really vary 100 independent binary variables in your study?

And how do you expect 100 independent binary variables to predict 1000 dependent binary variables? Are these dependent variables all highly correlated with one another? If so, then this might work, but do you know whether the dependent variables are correlated with each other? (Spectra are an example of non-binary dependent variables that are highly correlated, so in that case you could possibly predict 1000 dependent variables using 100 independent variables.)
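One way to check whether two binary dependent variables move together is the phi coefficient, which is the Pearson correlation applied to 0/1 data. The sketch below is only an illustration in Python: the function name `phi` and the short outcome vectors are hypothetical, and a meaningful estimate would need far more than two runs.

```python
import math

def phi(x, y):
    """Phi coefficient between two equal-length sequences of 0/1 values.

    Returns None when either variable never changes, because the
    correlation is undefined in that case.
    """
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return None
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical outcomes of three dependent variables over six runs.
y1 = [1, 1, 0, 0, 1, 0]
y2 = [1, 1, 0, 0, 1, 0]   # always agrees with y1
y3 = [0, 0, 1, 1, 0, 1]   # always disagrees with y1

print(phi(y1, y2))
print(phi(y1, y3))
```

Values near 1 or -1 across many pairs would suggest the 1000 outcomes carry much less than 1000 variables' worth of information, which is the situation where prediction from 100 inputs becomes more plausible.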

But anyway, without more details, it seems like your study is: collect huge amounts of data, throw it into SAS and see what the results are; I think there are better ways to go about this.
