Statistical Procedures

Bootsy1 · Posted 04-16-2021 10:28 AM

I'm attaching a document that explains a promising new strategy for finding the number of clusters, along with commented sample SAS code.

I'd be grateful for any comments.

Ulderico Santarelli

PaigeMiller · Posted 04-16-2021 10:37 AM

Many of us will not (or cannot) download Microsoft Office documents because they are security threats. Can you provide a link to a web page that has this information?

--
Paige Miller

Bootsy1 · Posted 04-16-2021 05:25 PM

I uploaded both documents to Google Drive. Here they are:

https://drive.google.com/file/d/19maSnXSdXtql61tshGGLnvwgvF6W8c04/view?usp=sharing

https://drive.google.com/file/d/1lJd9f96w_J41BmvW8CQWNTZ5gPph4Q9P/view?usp=sharing

Thank you for your interest.

Ulderico.

PaigeMiller · Posted 04-16-2021 05:45 PM

Access denied.

And if they are Word (or other Microsoft Office) documents on Google Drive, I still won't download them.

--
Paige Miller

Bootsy1 · Posted 04-16-2021 06:49 PM

Here are the links to the PDF documents.

https://drive.google.com/file/d/14gt5AjNdmyAwKz5Tul00RIrZy0O3n1-S/view?usp=sharing

https://drive.google.com/file/d/1ReFpJUGSAzz2xPVxbRjtYa-HxkKAPupz/view?usp=sharing

They should be virus-free. I'm using anti-malware software that seems very powerful.

Ulderico.

PaigeMiller · Posted 04-17-2021 07:03 AM

@Bootsy1 wrote:

here are the links for pdf documents.

https://drive.google.com/file/d/14gt5AjNdmyAwKz5Tul00RIrZy0O3n1-S/view?usp=sharing

https://drive.google.com/file/d/1ReFpJUGSAzz2xPVxbRjtYa-HxkKAPupz/view?usp=sharing

Access denied

they should be virus free. I'm using Malware software that seems very powerful.

With regard to the security of my computer, why should I believe you? PDF is an acceptable document format, but I still can't access the files.

--
Paige Miller

Bootsy1 · Posted 04-19-2021 08:38 PM

I'm going to upload the PDF documents to the Community's workspace.

You should be able to get them right away.

Ksharp · Posted 04-17-2021 06:55 AM

There is no single right answer to this question.

But you could check the CCC option of PROC CLUSTER,

or use principal component analysis and plot the first two principal components.

@Rick_SAS wrote a blog post about it, on race and blood relationships.

Bootsy1 · Posted 04-17-2021 06:35 PM

I find that clustering has two main challenges:

1. One works on a sample, and this has major consequences. Two different samples share no points with probability close to 1, so you can never claim replicability if you follow any of the many existing algorithms that proceed sequentially. Only if you act on "central points", which are actually local means, can you claim replicability.

2. Sequential methods do reach a solution, of course. However, you never know how far that solution is from the optimal one.
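To make the second challenge concrete, here is a minimal sketch using standard Lloyd-style k-means in one dimension. It is not the method from the attached documents. The data, the function name `lloyd_1d`, and both sets of starting centers are made up for illustration. The same algorithm on the same data stops at two different solutions, and nothing in the second run signals that a much better one exists.

```python
def lloyd_1d(points, centers, max_iter=100):
    """Plain Lloyd iterations (k-means) on 1-D data from given starting centers."""
    centers = list(centers)
    for _ in range(max_iter):
        # Assign every point to its nearest current center.
        groups = [[] for _ in centers]
        for x in points:
            nearest = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            groups[nearest].append(x)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # converged: another pass would change nothing
        centers = new_centers
    # Within-cluster sum of squares for the final centers.
    sse = sum(min((x - c) ** 2 for c in centers) for x in points)
    return centers, sse


# Hypothetical data: three well-separated groups of three points each.
points = [0, 1, 2, 10, 11, 12, 20, 21, 22]

# Start A: one starting center inside each group.
centers_a, sse_a = lloyd_1d(points, [0, 10, 20])
# Start B: all three starting centers inside the first group.
centers_b, sse_b = lloyd_1d(points, [0, 1, 2])

print(centers_a, sse_a)
print(centers_b, sse_b)
```

Start A ends with centers 1, 11, 21 and a sum of squares of 6. Start B converges too, but to centers 0, 1.5, 16 with a sum of squares of 154.5.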

Going parallel has two advantages:

1. You find "central points", that is, points with many surrounding points, so that they do not move during the iterations. Central points are local means, each with a surrounding subsample, also known as a cluster. Their standard error is therefore much smaller than the standard deviation, which measures the variability of single points. If you follow the "any point is good" approach, where all points are equivalent, you are exposed to the full variability sigma; if you act on central points, you face a much smaller variability, a fraction of the sample standard deviation. That is why central points remain stable during iterations.

2. You avoid the worst problem of sequential methods, where the solution varies with the starting point. Because such methods act on single points, which have high variability, the first point decides what the solution will be.
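The claim about standard error in point 1 can be checked with a small simulation. This is only a sketch under assumptions that are not in the original post: points are independent draws from a normal distribution with standard deviation sigma = 1, and each "cluster" has 50 points. Under those assumptions the mean of n points has standard error sigma / sqrt(n), here about 0.14. The function name `local_mean_spread` is hypothetical.

```python
import random
import statistics


def local_mean_spread(n_points=50, n_samples=2000, sigma=1.0, seed=0):
    """Compare the spread of single points with the spread of local means.

    Each simulated sample plays the role of one cluster of n_points points.
    """
    rng = random.Random(seed)
    single_points = []
    local_means = []
    for _ in range(n_samples):
        cluster = [rng.gauss(0.0, sigma) for _ in range(n_points)]
        single_points.append(cluster[0])                # "any point is good"
        local_means.append(statistics.fmean(cluster))   # the "central point"
    # Spread across repeated samples: how much each candidate moves.
    return statistics.stdev(single_points), statistics.stdev(local_means)


sd_single, sd_mean = local_mean_spread()
print(f"spread of single points: {sd_single:.3f}")
print(f"spread of local means:   {sd_mean:.3f}")
```

Across repeated samples, a single point moves by roughly sigma, while the local mean moves by roughly sigma / sqrt(50). That is the sense in which central points are more stable from one sample to the next.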

In my opinion, the method should be parallel.

Bootsy1 · Posted 08-12-2021 06:39 PM

Continuing my research on the number of clusters, I found that the gravitational approach can be better described as in the attachment. The gravitational force field accurately describes where the mass lies in the space, so the gravitational force can be viewed as an indicator of the local distribution density.

Comments are very welcome.

Ulderico.
