07-13-2015 01:53 PM
I have never used PROC SIMILARITY and would like some advice from the SAS Community. I have time series data (1970-2013) for 40 states, and I want to cluster these states into 3-4 groups based on the similarity of their time series. I would also like a way to represent the derived groupings visually (e.g., a cluster tree). My csv file has 40 columns (one per state) and 43 rows (years). The SAS example for time series clustering seems fairly incomplete, and I am having trouble finding the syntax for what I want to do. Can anyone offer some advice?
07-13-2015 03:21 PM
I would recommend using the Time Series Similarity node in Enterprise Miner (if that is an option for you), which outputs both the clusters and the hierarchy structure.
Of course, you can do the same thing in SAS (SAS/ETS must be installed) with some coding of your own. You can obtain the similarity matrix from PROC SIMILARITY and then call PROC CLUSTER or PROC CORR to analyze the association and do the clustering. The basic syntax for your specific need looks like this:
PROC SIMILARITY data=InputData out=OutputData outsum=OutSummary;
ID timeid interval = ... accumulate=... ;
TARGET series1 series2 ...;
RUN;
You don't need a BY statement if you don't have a cross-sectional ID variable.
You can then pass the OutSummary data set to PROC CLUSTER as a TYPE=DISTANCE data set.
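The steps above can be sketched end to end as follows. This is only a rough sketch under several assumptions, not a tested solution: the csv has already been imported into a data set called work.states, it has a numeric variable year holding the calendar year, and the state columns are named state1-state40. All of these names are hypothetical. The NORMALIZE= and MEASURE= options, the average-linkage method, and the choice of 4 clusters are illustrative and should be adapted to your data.

```sas
/* Assumption: work.states has one row per year, a numeric variable  */
/* year (1970, 1971, ...), and columns state1-state40 (hypothetical). */
data work.states_ts;
   set work.states;
   date = mdy(1, 1, year);      /* the ID statement needs a SAS date value */
   format date year4.;
run;

/* Pairwise similarity between all target series.                     */
/* OUTSUM= holds the similarity matrix; OUT=_NULL_ skips the series.  */
proc similarity data=work.states_ts out=_null_ outsum=work.summary;
   id date interval=year;
   target state1-state40 / normalize=standard measure=mabsdev;
run;

/* Mark the summary as a distance matrix for PROC CLUSTER.            */
data work.matrix(type=distance);
   set work.summary;
   drop _status_;
run;

/* Hierarchical clustering on the distances; OUTTREE= feeds PROC TREE. */
proc cluster data=work.matrix method=average outtree=work.tree;
   id _input_;
run;

/* Draw the cluster tree and assign each state to one of 4 clusters.  */
proc tree data=work.tree nclusters=4 out=work.groups;
   id _input_;
run;
```

PROC CLUSTER and PROC TREE are part of SAS/STAT, so that product needs to be licensed as well. The work.groups data set should contain one row per state with its assigned cluster, and PROC TREE produces the dendrogram.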