karaheller20
Calcite | Level 5

The data I currently have contains only "yes" cases. I have a few predictor variables, and I need a way to predict yes or no from a dataset in which every record is a yes. I also have a second dataset with the same variables where the outcome is unknown, so ideally I would fit a model on the yes-only dataset and apply it to the second dataset to classify each record as yes or no. Please let me know if this does not make sense!

 

My thoughts: in the past I have run detailed logistic regressions, but on datasets that had both yes and no outcomes. That makes me think I cannot use logistic regression here, because the dataset I have to build the model from contains only yes.

 

I am using SAS Enterprise Guide.

PaigeMiller
Diamond | Level 26

I believe you are correct that you cannot do any sort of logistic regression (or other statistical modeling) on a data set that contains only one level of your response variable.

 

One possibility is to run Principal Components on your X-variables and then use those components to score the new data set. Any record with a high T-squared value or a high QRES (sometimes called STDXSSE) differs from the X-variables in the "yes" data (in layman's terms, it is outside the space of the X-variables) and would be a candidate "NO" response. Even then you'd be guessing, because being outside the space of the X-variables doesn't really mean a record is a "NO" response; it only means it is outside the space of the X-variables.

 

So maybe that's not a great answer either. The problem isn't the answer I just gave, the problem is your data.

 

Note: PROC PRINCOMP does not compute T-squared or QRES directly.
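T-squared can still be assembled by hand from PROC PRINCOMP output. The following is only a minimal sketch under assumptions: the data set names yes_data and new_data, the predictors x1-x5, and the choice of two components (N=2) are all hypothetical placeholders. It covers T-squared only, not QRES.

```sas
/* Hypothetical names: yes_data = known Yes records, new_data = unlabeled
   records, x1-x5 = predictors. N=2 components is an arbitrary choice. */
proc princomp data=yes_data outstat=pc_stats n=2 noprint;
   var x1-x5;
run;

/* Score the new records with the components fitted on the Yes data.
   PROC SCORE standardizes with the MEAN and STD rows of pc_stats and
   creates Prin1 and Prin2 from the SCORE rows. */
proc score data=new_data score=pc_stats out=new_scores;
   var x1-x5;
run;

/* The EIGENVAL row of pc_stats holds the eigenvalues under the first
   analysis variable names (x1, x2 here). T-squared is the sum of
   squared scores, each divided by its eigenvalue. */
data new_t2;
   if _n_ = 1 then
      set pc_stats(where=(_type_ = 'EIGENVAL')
                   keep=_type_ x1 x2
                   rename=(x1=ev1 x2=ev2));
   set new_scores;
   t2 = prin1**2 / ev1 + prin2**2 / ev2;
   drop _type_ ev1 ev2;
run;
```

Scoring yes_data the same way gives a reference distribution of t2 to compare against, but any cutoff chosen from it would still be a judgment call, for the reasons given above.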

--
Paige Miller
karaheller20
Calcite | Level 5

Thank you for giving it your best shot. I know the problem is definitely my data. I just didn't know if someone else had any out of the box modeling suggestions.

Reeza
Super User

I know of no method for doing this without any data that indicates what a 'No' would look like.

 

You could compute distance measures between the observations and then pick any that are far away, though as I type this, I think that's likely mathematically the same as principal component analysis. The next step would be to do this for a bunch of different observations, make decisions, see how well they work, and develop cutoff rules, but you have no data to do that.

 

What I would start with is taking the two samples of data and seeing whether they differ across all the variables. Then I would probably use the variables that differ as my way of differentiating the groups.

 

Here is an example of how that would work: it runs a t-test on all the numeric variables to see which differ between the two data sets. If the values are not different between the data sets, those variables likely cannot differentiate between Yes and No, because the unknown data looks the same as the Yes data. This could be misleading if No has a small event rate.

 

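/* Stack the two data sets and record which one each row came from */
data have;
    set GroupA GroupAB indsname=source;
    dsn = source;  /* the INDSNAME= variable is temporary, so copy it */
run;

/* Compare every numeric variable between the two source data sets */
proc ttest data=have;
    class dsn;
    var _numeric_;
run;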


 
