Help using Base SAS procedures

Using PROC DISTANCE with a very sparse matrix



Hello all,

I have a large, sparse data set and would like to segment my customers. To give you an idea, it has more than 100 variables and 2.2 million rows. The variables break down as follows:

  • 11 nominal
  • 96 continuous (the majority of which are just integers such as 1, 2, 3, etc.)
  • 8 binary

Since my data is sparse, I would like to use a density-based approach to clustering. I expect the clusters to differ in shape. A bit of research suggested PROC MODECLUS, but given the sparsity of the data, I first need to run PROC DISTANCE to obtain a distance measure. The data covers the products and services that each customer receives, and a missing value indicates that the customer did not receive that service. I would like a better clustering than one that purely reflects whether a customer has a given product or not (that is, I don't want to end up with 18 clusters, one for each of the 18 products).

So my question is: under these circumstances, which options should I choose in PROC DISTANCE to get a good clustering from PROC MODECLUS? I have tried the NOSTD option with MISSING on the variables, but it didn't give me anything credible.
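
To make the intended pipeline concrete, here is a rough sketch of the two steps, not a tested solution. All data set and variable names (customers, cust_id, nom1-nom11, cnt1-cnt96, bin1-bin8) are placeholders, and several choices in it are assumptions: the Gower dissimilarity, the recoding of missing counts to 0, the treatment of the binary flags as asymmetric, the sample size, and K=20. The sampling step is there because a full pairwise distance matrix for 2.2 million rows is not practical.

```sas
/* Placeholder names throughout: customers, cust_id, nom1-nom11,
   cnt1-cnt96, bin1-bin8. Adjust to the real data set. */

/* 1. Draw a simple random sample; size and seed are arbitrary choices. */
proc surveyselect data=customers out=cust_sample
                  method=srs sampsize=5000 seed=12345;
run;

/* 2. Assumption: a missing count means "service not received",
      so recode missing counts to 0 before computing distances. */
data cust_sample;
  set cust_sample;
  array cnt{*} cnt1-cnt96;
  do i = 1 to dim(cnt);
    if missing(cnt{i}) then cnt{i} = 0;
  end;
  drop i;
run;

/* 3. DGOWER (1 minus Gower similarity) accepts mixed measurement levels.
      ANOMINAL treats the binary flags as asymmetric, so two customers who
      both lack a product are not counted as similar on that flag. */
proc distance data=cust_sample out=cust_dist method=dgower shape=square;
  var interval(cnt1-cnt96 / std=range)
      nominal(nom1-nom11)
      anominal(bin1-bin8);
  copy cust_id;
run;

/* 4. Density-based clustering on the distance matrix.
      Dist1, Dist2, ... are the default output names from PROC DISTANCE. */
proc modeclus data=cust_dist method=1 k=20 out=cust_clusters;
  var dist:;
  id cust_id;
run;
```

The number of clusters found this way would depend heavily on K=, so trying several values and comparing the results seems advisable.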

Thanks a lot in advance

