In the wake of my eightieth birthday, I have moved a bit forward the solution of an old obsession of mine: substantive clustering, that is, finding clusters only when they naturally exist in the data body. The problem started boiling in my mind when I was very young (see attachment 1), because all Euclidean-space-based algorithms are amenable to a mechanical interpretation: the moment of inertia of the data body. The moment of inertia leads you to think of a "mechanics-style" framing of the problem. For many years I tried to find substantive clusters by searching for potential wells in a gravitational field dictated by the data themselves, as if the points were asteroids. I have already posted something about this to this community. However, I didn't fully succeed in my search because of "black holes": points that are too close to each other are endowed with an unlimited gravitational force. The current state of my research shows that, by using a weighted convolution of the sample's density with the Laplace distribution, one can obtain a kind of gravitational field without singularities, the infamous black holes. You can find details in (2).
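To illustrate the singularity issue described above, here is a minimal one-dimensional sketch. It is not the method from the attached papers: the function names, the unit weights, and the `scale` parameter are assumptions chosen only for illustration. It compares a Newtonian-style potential, which grows without bound as two points approach each other, with a potential built from a sum of Laplace kernels, which stays bounded.

```python
import math

def newtonian_potential(x, points):
    """Newtonian-style potential: sum of -1/r over all points.
    Unbounded as x approaches any point (the "black hole")."""
    return sum(-1.0 / abs(x - p) for p in points)

def laplace_potential(x, points, weights=None, scale=1.0):
    """Negative weighted sum of Laplace kernels exp(-|x - p| / scale) / (2 * scale).
    Each term is bounded by w / (2 * scale), so there is no singularity.
    The weights and scale are hypothetical; unit weights are assumed by default."""
    if weights is None:
        weights = [1.0] * len(points)
    return -sum(w * math.exp(-abs(x - p) / scale) / (2.0 * scale)
                for w, p in zip(weights, points))

# Two nearly coincident points plus one distant point.
points = [0.0, 0.001, 5.0]
for gap in (1.0, 1e-3, 1e-6):
    x = points[0] - gap  # probe location approaching the first point
    print(f"gap={gap:g}  newton={newtonian_potential(x, points):.3e}  "
          f"laplace={laplace_potential(x, points):.4f}")
```

As the gap shrinks, the Newtonian value diverges, while the Laplace-based value can never fall below minus the sum of the weights divided by `2 * scale`.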
I hope somebody will continue my work, because I have no hope of finding a better solution. For me, the problem is closed. I'm now trying to address a different problem. Following Lancaster, I want to provide users with an improved Conjoint Analysis model where price is NOT a conjoint factor. My experience in some hundred cases so far shows that products seem inelastic when price is treated as a conjoint factor.
I'm also going to publish this result within the ASA Community and within some LinkedIn groups.
You probably should indicate just what TLA (three-letter-acronym) ASA community you mean if it has any bearing on the topic.
I doubt that it is the first one that came to my mind (Army Security Agency).
Happy Retirement and thank you for your years of research!
I have only scanned the papers, but the first (Capra et al., 1976) appears to be similar to what is now known as the expectation-maximization (EM) method for homogeneous clusters. The second appears to be related to the Gaussian mixture model (GMM) with EM selection. I have not studied the papers well enough to discern subtle differences, so please excuse me if I am mistaken. I think one difference is that the OP's paper uses exp( -|x_i - b_i| ) as a criterion (b_i is a barycenter), whereas the GMM uses exp( -(x_i - b_i)^2 ). In practice, small differences like this are often ignorable.
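For readers who want to see the two criteria side by side, here is a small sketch that evaluates exp( -|x - b| ) and exp( -(x - b)^2 ) at a few distances from a barycenter b. The function names and the unit scale are assumptions for illustration; neither the papers nor a GMM implementation necessarily uses these exact normalizations.

```python
import math

def laplace_weight(x, b):
    """Criterion of the form exp(-|x - b|), with b a barycenter."""
    return math.exp(-abs(x - b))

def gaussian_weight(x, b):
    """Criterion of the form exp(-(x - b)^2), as in a unit-scale Gaussian."""
    return math.exp(-(x - b) ** 2)

b = 0.0
for d in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(f"distance={d:.1f}  laplace={laplace_weight(d, b):.4f}  "
          f"gaussian={gaussian_weight(d, b):.4f}")
```

With these unit scales the two weights coincide at distances 0 and 1; the Gaussian form is larger in between, and the absolute-value form decays more slowly beyond distance 1. How much that matters depends on how far points typically lie from their barycenter.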
I have written about the EM method at a high level. The method of clustering that you describe seems similar to the model-based clustering that is implemented in PROC MBC. The MBC algorithm supports isotropic as well as anisotropic models, which is a fancy way of saying that you can use it whether or not the clusters have the same covariance structure.
The Getting Started example for PROC MBC provides a 2-D example of model-based clustering.
PROC MBC requires you to specify the number of clusters by using the NCLUSTERS= option. However, you can specify a list of values (such as (2 to 10)) and the procedure will use a model-based criterion to choose the number of clusters that best fits the density of the data.
The first paper was written when the K-means method was not yet widely known and, except for the mechanical interpretation, was probably not so different from it. You can see that the focus is on the efficiency of the algorithm, due to the scarcity of computing resources at the time. The second paper is an attempt to avoid predefining the number of clusters. Their number is found by studying the potential wells of a gravitational field in which the interaction between points is a negative exponential of their Manhattan distance. My previous attempts with a plain Newtonian force were hindered by the "black hole" phenomenon: when points are so near that the gravitational force becomes too great, they start drifting through the space.
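The idea of counting potential wells can be sketched as follows. This is only an illustrative mode-seeking procedure under stated assumptions, not the algorithm from the second paper: the potential is taken as the negative sum of exp(-Manhattan distance / scale), each point repeatedly moves to the lowest-potential data point within a fixed Manhattan radius, and the distinct end points are counted as wells. The names `scale` and `radius` and the toy data are hypothetical.

```python
import math

def manhattan(p, q):
    """Manhattan (L1) distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def potential(x, points, scale=1.0):
    """Bounded potential: negative sum of exp(-Manhattan distance / scale)."""
    return -sum(math.exp(-manhattan(x, p) / scale) for p in points)

def find_wells(points, scale=1.0, radius=1.0):
    """Descend from every point to the lowest-potential neighbour within
    `radius` until no neighbour is lower; return indices of the end points."""
    phi = [potential(p, points, scale) for p in points]
    wells = set()
    for start in range(len(points)):
        cur = start
        while True:
            best = cur
            for j, q in enumerate(points):
                if manhattan(points[cur], q) <= radius and phi[j] < phi[best]:
                    best = j
            if best == cur:  # local minimum of the potential: a well
                break
            cur = best
        wells.add(cur)
    return sorted(wells)

def grid(cx, cy):
    """Toy cluster: a 5 x 5 grid of points centred on (cx, cy)."""
    return [(cx + 0.1 * i, cy + 0.1 * j)
            for i in range(-2, 3) for j in range(-2, 3)]

points = grid(0, 0) + grid(10, 10)
wells = find_wells(points)
print(len(wells), [points[w] for w in wells])
```

In this toy data the number of clusters is not supplied anywhere; it falls out as the number of distinct wells. The result does depend on `scale` and `radius`, so in practice those choices would need care.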