azybanoo
Calcite | Level 5

Hi, I have to solve a problem that is a little confusing for me. Is there anybody here who can help? I have to correlate two binary data sets.

The problem is:

We have 100 independent variables which have only two situations: On or Off.

We have 1000 dependent variables which also have only two situations: On or Off.

In the first experiment we observe which dependent variables are On for a given setup of the independent variables.

In a second experiment we change the independent variables to another setup, and different dependent variables turn On.

Which model can be used to predict the outcome of the dependent variables for a third given configuration of the independent variables, using the results from the two previous experiments?

6 REPLIES
PaigeMiller
Diamond | Level 26

PROC CATMOD could certainly be used in this situation, but I am very skeptical that there's a good answer here with 100 independent binary variables. When you have lots of independent variables, there's bound to be some correlation between them, and this greatly inhibits your ability to get a good predictive model, and that's when you have continuous variables ... with binary variables, I think the situation would be worse. Furthermore, I can't possibly imagine how 100 independent binary variables could do a good job of predicting 1000 dependent binary variables. Who knows, maybe your data has such strong relationships that such a prediction would actually work, but as I said, I am very skeptical.

To move forward, it would certainly be a good idea if you explained this experiment in more detail.

--
Paige Miller
azybanoo
Calcite | Level 5

Which kind of statistical modeling should I implement? I mean, is PROC CATMOD a classification model? I want to use R to implement the model. The problem is that there are many more dependent binary variables than independent variables.

Reeza
Super User

If you're going to use R, perhaps post in R forums?

There are quite a few good ones, and you're likely to get more relevant answers there.

PaigeMiller
Diamond | Level 26

Which kind of statistical modeling should I implement?

My point is that the design of this study may prevent you from finding a well-fitting model, and you might want to reconsider the design.

I want to use R to implement the model.

R or SAS, the model seems to be much less important at this time than getting the right design.

But why are you asking about this in a SAS forum?

--
Paige Miller
azybanoo
Calcite | Level 5

Because I think people here are very good at statistics.

I do not understand what you mean by the right design.

PaigeMiller
Diamond | Level 26

100 independent binary variables cannot possibly span the space of interest unless you have 2**100 data points in your design, and if you have fewer points, there is a major likelihood that your independent variables will be correlated with one another, thus dramatically increasing the mean square error of your parameter estimates.
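To put rough numbers on this, here is a minimal Python sketch (not SAS, and not part of the original study). The on/off settings are randomly generated and purely hypothetical; only the counts of 100 factors and 2 experiments come from the question. It shows the size of the full design (about 1.27e30 runs) and that, with only two runs, the 100 factors can take at most 4 distinct on/off patterns, so many of them are indistinguishable from one another.

```python
import random

N_FACTORS = 100   # number of on/off independent variables (from the question)
N_RUNS = 2        # the two experiments described in the question

# Size of the full factorial design: every on/off combination exactly once.
full_factorial_runs = 2 ** N_FACTORS
print(f"full factorial runs: {full_factorial_runs:.3e}")

# Hypothetical settings: each run is a random on/off configuration.
rng = random.Random(0)
runs = [[rng.randint(0, 1) for _ in range(N_FACTORS)] for _ in range(N_RUNS)]

# A factor's "column" is its setting across the runs. With 2 runs there are
# only 4 possible columns: (0,0), (0,1), (1,0), (1,1).
columns = [tuple(run[j] for run in runs) for j in range(N_FACTORS)]
distinct_columns = set(columns)
print(f"distinct columns among {N_FACTORS} factors: {len(distinct_columns)}")
```

Factors that share a column were switched identically in both experiments, so no model can separate their effects from these data alone.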

As an alternative, you would need some sort of major fractional factorial design just to ensure your estimates are balanced and not correlated with each other.
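As a small-scale illustration of what "balanced and not correlated" means, here is a sketch in Python of a textbook 2^(7-4) fractional factorial: 7 factors in 8 runs, with factors coded -1 (off) and +1 (on). The choice of generators (D = AB, E = AC, F = BC, G = ABC) is one standard option, assumed here for illustration and not taken from this thread.

```python
from itertools import product

# Full factorial in 3 base factors, coded -1 (off) / +1 (on): 8 runs.
base_runs = list(product([-1, 1], repeat=3))

# Generators for a 2^(7-4) design: four extra factors are defined as
# products of the base factors (D = AB, E = AC, F = BC, G = ABC).
design = [(a, b, c, a * b, a * c, b * c, a * b * c) for a, b, c in base_runs]

def column(j):
    """Settings of factor j across all runs."""
    return [run[j] for run in design]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

n_factors = len(design[0])
# Balanced: each factor is on in half of the runs (column sums to zero).
balanced = all(sum(column(j)) == 0 for j in range(n_factors))
# Orthogonal: every pair of distinct columns has zero dot product.
orthogonal = all(
    dot(column(i), column(j)) == 0
    for i in range(n_factors)
    for j in range(i + 1, n_factors)
)
print(len(design), "runs,", n_factors, "factors")
print("balanced:", balanced, "orthogonal:", orthogonal)
```

A design like this only keeps main effects uncorrelated with each other; they are still aliased with two-factor interactions. Scaled up the same way, 128 runs can accommodate up to 127 two-level factors, which is far more than the two experiments described in the question.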

But why do you need 100 independent binary variables? And can you really vary 100 independent binary variables in your study?

And how do you expect 100 independent binary variables to predict 1000 dependent binary variables? Are these dependent variables all highly correlated with one another? If so, then this might work, but do you know if the dependent variables are correlated with each other? (Spectra are an example of non-binary dependent variables that are highly correlated, and so in that case you could possibly predict 1000 dependent variables using 100 independent variables.)

But anyway, without more details, it seems like your study is: collect huge amounts of data, throw it into SAS and see what the results are; I think there are better ways to go about this.

--
Paige Miller
