I am learning SAS and am totally new to this field. I have two questions:
1. I have to do hierarchical clustering for a dataset. How can I do hierarchical clustering using two datasets, one of which is demographic data and the other survey data?
2. I was trying to write code to cluster the survey data. Here is the code:
data a1;
infile "/folders/myfolders/pda.csv" dlm=',' firstobs=2;
input ID $ Innovator $ Use_message $ Use_cell $ Use_PIM $ Inf_passive $ INF_active $ remote_access $ Share_info $ Monitor $ Email $ Web $ M_media $ ergonomic $ monthly price;
run;
proc cluster simple noeigen method=centroid rmsstd rsquare nonorm outtree=a2;
id ID Innovator Use_message Use_cell Use_PIM Inf_passive INF_active remote_access Share_info Monitor Email Web M_media ergonomic;
var price monthly;
run;
But it shows the same error again and again.
Your ID statement is incorrect.
According to the documentation, the ID statement takes a single variable, not a list of variables, and that variable is used to identify a record, so it should be unique for each observation. I'm not sure what you're trying to do with the other variables, so I can't make suggestions beyond this correction. The documented syntax and description are:
ID variable;
The values of the ID variable identify observations in the displayed cluster history and in the OUTTREE= data set. If the ID statement is omitted, each observation is denoted by OBn, where n is the observation number.
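As a minimal sketch of that correction, assuming the data set a1 and the variable names from the question, and assuming ID uniquely identifies each respondent, the step could look like this. Only the ID statement changes; the options and VAR list are copied from the original code, and data=a1 is added only to make the input explicit.

```sas
/* Sketch only: data set a1 and all variable names come from the question. */
/* Assumes ID holds one unique value per observation.                      */
proc cluster data=a1 simple noeigen method=centroid rmsstd rsquare nonorm outtree=a2;
   id ID;               /* exactly one identifying variable is allowed */
   var price monthly;   /* numeric variables used to form the clusters */
run;
```

The survey variables that were listed on the ID statement are simply left out here, so they play no part in the clustering.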
I have to perform hierarchical cluster analysis on the PDA data. As all the other variables are categorical, can I keep them in the VAR statement?
I am attaching the description herewith.
@ashmishah wrote:
proc cluster simple noeigen method=centroid rmsstd rsquare nonorm outtree=a2;
63 id ID Innovator Use_message Use_cell Use_PIM Inf_passive INF_active remote_access Share_info Monitor Email Web M_media
_________
22
76
63 ! ergonomic;
ERROR 22-322: Expecting ;.
ERROR 76-322: Syntax error, statement will be ignored.
64 var price monthly;
65 run;
How can I proceed with it? I am attaching one of the files here.
I personally have a policy of not answering homework questions. My advice would be to first work through the examples in the documentation or one from your textbook. These should give you enough of a basis to answer your homework questions.
I noticed you have some character variables. PROC CLUSTER cannot handle character variables. Try using PROC DISTANCE to compute a distance matrix and feed that into PROC CLUSTER.
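A rough sketch of that two-step approach is below. It assumes the data set a1 and the variable names from the question, treats the character survey items as nominal, and uses Gower's dissimilarity (METHOD=DGOWER) because it accepts mixed variable types. The output names dist and tree, the choice of average linkage, and the range standardization are illustrative assumptions, not something stated in this thread.

```sas
/* Sketch only: a1 and the variable names come from the question.         */
/* Assumes the survey items are nominal and ID is a unique character key. */
proc distance data=a1 method=dgower out=dist;
   var nominal(Innovator Use_message Use_cell Use_PIM Inf_passive INF_active
               remote_access Share_info Monitor Email Web M_media ergonomic)
       interval(monthly price / std=range);  /* rescale numeric items by their range */
   id ID;                                    /* labels the rows and columns of the matrix */
run;

/* dist is a distance matrix, so no VAR statement is given here. */
proc cluster data=dist method=average outtree=tree;
   id ID;
run;
```

If some of the survey items are really ordered scales, listing them under ordinal( ) instead of nominal( ) may be more appropriate; check the PROC DISTANCE documentation for the levels each method supports.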