01-15-2018 04:43 AM

Hello everyone,

I just used the HP SVM node in SAS EM to build an SVM model, and it turned out to perform pretty well. What confuses me is the following information in the model output:

```
Training Results
Inner Product of Weights 72.2366916
Bias 5.27294747
Total Slack (Constraint Violations) 2962.97082
Norm of Longest Vector 12.6467077
Number of Support Vectors 69998
Number of Support Vectors on Margin 0
Maximum F 24.7895353
Minimum F 0.04925285
Number of Effects 18
Columns in Data Matrix 18
Columns in Kernel Matrix 190
```

(1) I don't know why it reports that "Number of Support Vectors" is 69998, which is exactly the size of my training dataset. Surely the model cannot be using every observation in the training dataset as a support vector, since the AUC on the training dataset is NOT equal to 1 and there is only very slight overfitting.

(2) Also, could anyone tell me how I can see the REAL number of support vectors that the model uses?

Thanks very much.

Posted in reply to YG1992

01-16-2018 04:52 PM

From the training result table I can see that you selected the polynomial kernel with degree=2. In this case, the expanded number of variables is 190, and it is possible that all the observations are reported as support vectors. It is good to hear that you got a pretty good model. At the same time, the model could be overfitting slightly. There are several ways you can adjust the model.

1. Partition the data and validate the model on the validation dataset.

2. Build a model with linear kernel instead of the polynomial kernel.

3. Try a different penalty value.

For the second question, unfortunately the real support vectors are not reported.
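As a side note on where the 190 comes from: the count matches the number of monomials of total degree at most 2 in 18 input variables (1 constant, 18 linear terms, 18 squares, and 153 pairwise products). The sketch below only checks that arithmetic. It assumes the "Columns in Kernel Matrix" figure is the size of this explicit degree-2 expansion, which fits the reported numbers but is not confirmed in this thread. The function name is made up for illustration.

```python
from math import comb


def polynomial_feature_count(n_inputs: int, degree: int) -> int:
    """Count monomials of total degree <= `degree` in `n_inputs` variables.

    Assumption: the expansion includes the constant term, as the feature map
    of an inhomogeneous polynomial kernel such as (1 + x.z)^degree does.
    """
    return comb(n_inputs + degree, degree)


# 18 data columns with a degree-2 polynomial kernel, as in the output above
constant, linear, squares, cross_terms = 1, 18, 18, comb(18, 2)
print(constant + linear + squares + cross_terms)  # 190
print(polynomial_feature_count(18, 2))            # 190
```

With degree=1 the same formula gives 19 (18 inputs plus the constant), so the jump to 190 comes entirely from the squared and pairwise terms.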

Posted in reply to taiphe

01-17-2018 03:37 AM

Hi taiphe,

Thanks very much for your quick reply; I found it helpful. Based on your feedback, I have some further questions:

1. How did you judge that I have trained a good model? In other words, how do you read the training result table? Which statistics are useful? How would you suggest analyzing the results? (I ask because I usually add a "comparison node" after each model and tend to focus ONLY on the training ROC and validation ROC.)

2. I also included a linear-kernel SVM model. What if this model also shows some overfitting?

3. According to the official SAS EM HP procedure documentation, the HP SVM node uses the primal-dual interior-point method by default to solve the quadratic programming problem. Why wasn't the sequential minimal optimization (SMO) algorithm chosen? What do you think are the advantages and disadvantages of each algorithm?

I know it may take some time to answer the questions above, especially the third one, so please take your time. As a newcomer to the field of machine learning, I always like to hear valuable and useful opinions about different models and methods.

Thank you very much!

Solution (accepted 01-23-2018 05:09 AM)

Posted in reply to YG1992

01-18-2018 09:45 PM

Here are my answers. I hope they make sense and are helpful.

1. Basically, a good model should predict relatively accurately without much overfitting. To see the accuracy values, look at the misclassification table and the fit statistics table in the training output. The training result table does not contain the training accuracy, though it does give other valuable information such as "Inner Product of Weights", "Constraint Violations", "Number of Support Vectors", and so on. To see whether a model is overfitting, score the validation dataset with the model and compare its accuracy with that of the training dataset. If the validation accuracy is close to the training accuracy, then there is no overfitting; otherwise, overfitting might exist. Assuming there is no overfitting, the models can then be compared through their ROC curves.

2. For the linear-kernel SVM model, the chance of overfitting is relatively low. If overfitting does happen, try adjusting the penalty value to get a different model.

3. The default technique for the SVM node is the interior-point method, which supports multiple threads and distributed computation, whereas the SMO algorithm is sequential and single-threaded. For relatively large datasets, the interior-point method runs much faster than the SMO method. For small datasets, you can also select the active-set method in the SVM node. In that case, a non-linear kernel can be used to obtain a more accurate model, but at the same time you run a higher risk of overfitting. By the way, the SMO method is a special case of the active-set method.
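The train-versus-validation comparison in point 1 can be sketched outside SAS EM roughly as follows. This is only an illustration in plain Python: the function names, the toy labels, and the 0.02 tolerance are all assumptions made for the example, not values from the SVM node, and what counts as "close" depends on your data.

```python
def accuracy(actual, predicted):
    """Fraction of observations whose predicted label equals the actual label."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("actual and predicted must be non-empty and equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def accuracy_gap(train_actual, train_pred, valid_actual, valid_pred):
    """Training accuracy minus validation accuracy (positive = worse on validation)."""
    return accuracy(train_actual, train_pred) - accuracy(valid_actual, valid_pred)


def might_be_overfitting(gap, tolerance=0.02):
    """Flag a model whose validation accuracy trails training accuracy.

    The tolerance is a hypothetical cut-off chosen for this sketch.
    """
    return gap > tolerance


# Toy labels only, to show the mechanics
train_actual, train_pred = [1, 0, 1, 1], [1, 0, 1, 1]   # 4 of 4 correct
valid_actual, valid_pred = [1, 0, 1, 0], [1, 1, 0, 0]   # 2 of 4 correct

gap = accuracy_gap(train_actual, train_pred, valid_actual, valid_pred)
print(gap, might_be_overfitting(gap))
```

In practice the actual and predicted labels would come from the scored training and validation partitions, and the same comparison can be repeated for each penalty value you try.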

Posted in reply to taiphe

01-19-2018 03:40 AM

Thanks very much! Your answers are really helpful, especially the key terms you mentioned, which I think will be useful for my future study. Have a happy weekend!

Posted in reply to YG1992

01-19-2018 12:14 PM

Hi @YG1992,

I'm glad you found some useful info! If one of @taiphe's replies was the exact solution to your problem, can you "Accept it as a solution"? Or if one was particularly helpful, which they both seem to have been, feel free to "Like" one or both. This will help other community members who run into the same issue know what worked.

Thanks!

Anna

## Re: Questions about "Number of Support Vectors" in SVM model in SAS EM

Posted in reply to AnnaBrown

01-23-2018 05:10 AM

Hi Anna,

Already done. Thanks for your work.