HDavid
Calcite | Level 5

Hello everyone,
I need your help because I am new to SAS.
I would like to cluster time series, but after a lot of searching I have not made any progress. Here is what my data file looks like.
I have about thirty participants who each performed 2 tasks.
Each task lasted at most 10 minutes.
Data were collected every 10 seconds.
Each collected value is between -1 and 1.
My data sample contains missing values, and each participant could finish a task before the 10 minutes were up, so the tasks do not all have the same duration.

 

Id   task   varperform1   varperform2   time
01   01     0.10          0.5           0
01   01     0.22          0.4           1
…    …      …             …             …
…    …      …             …             …
01   01     0.22          0.4           3402

01   02     -0.12         0.4           2
01   02     -0.30         0.10          4
…    …      …             …             …
…    …      …             …             …
01   02     0.42          0.1           4000

 

It is a single file in this layout covering all 30 participants.
Id is the participant number.
task is either 1 or 2.
varperform1 corresponds to measure 1.
varperform2 corresponds to measure 2.
time is the time of the measurement; it starts at 0 at the beginning of each task.
I have performance curves per task that show trends, but it is difficult to group them. I would like to cluster the curves by similarity, based on performance (varperform1) over the whole activity, and possibly on both varperform1 and varperform2, without using the task as a clustering variable. However, I would like the resulting cluster table to keep the details needed to tell the curves apart.
I would also like to be able to force the number of clusters to 5.
Please tell me whether what I want to do is possible. After some research I found the k-means Iris example, but I could not find a version that handles time series. Otherwise, there is proc cluster or proc likeness.

I am extremely new to this and really lost, and I would like some help with which code to use to get my results.

Thank you for your help.

2 REPLIES
mkeintz
PROC Star

Are you saying that you want to cluster 30 subjects into 5 groups? That is an average of 6 subjects per group, which would ordinarily seem too small an average group size to yield reasonably discriminating group assignments.

 

Regarding the time series structure: do you have reason to believe that measurements at one time point might inform your estimates of measurements at later time points? Does order matter?
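
To make the question about order concrete, here is a small sketch in Python (not SAS, and not part of the original reply). The values are made up and the helper name `half_means` is hypothetical. It shows that two curves containing the same values in opposite order have the same overall mean, so order-free summaries cannot separate them, while a first-half versus second-half summary can.

```python
from statistics import mean


def half_means(values):
    """Return (first-half mean, second-half mean), skipping missing (None) readings.

    Assumes each half contains at least one observed value.
    """
    mid = len(values) // 2
    first = [v for v in values[:mid] if v is not None]
    second = [v for v in values[mid:] if v is not None]
    return mean(first), mean(second)


# Made-up example curves: same values, opposite order.
rising = [-0.4, -0.2, 0.2, 0.4]
falling = list(reversed(rising))

# An order-free summary treats the two curves as identical.
print(mean(rising), mean(falling))

# An order-aware summary tells them apart.
print(half_means(rising))
print(half_means(falling))
```

If order does matter, the features handed to a clustering step should keep some trace of it, for example means over successive slices of the task.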

 

HDavid
Calcite | Level 5

Yes, in fact,

I have 36 subjects and I want to group them into 3, 4 or 5 groups, because I want to compare the groups on other variables.
For example, some subjects are mostly negative, others are negative and positive in a homogeneous way, and others are generally negative at the beginning and more positive after half of the task.
I found a k-means example on the Iris data, but nothing I have read covers time series.
I just want code to group my subjects, with no prediction.
Does that answer your question?
Thank you for your help.
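
One possible way to approach the request above, sketched in Python rather than SAS and offered only as an illustration under stated assumptions: each (Id, task) curve is averaged over a fixed number of slices of its own duration, which copes with unequal task lengths and missing values, and the resulting fixed-length vectors are grouped with a plain k-means where the number of clusters is chosen in advance. The rows are made up, the names `bin_curve` and `kmeans` are hypothetical, and taking the first k vectors as starting centers is a simplification. In SAS, the corresponding clustering step would be PROC FASTCLUS, whose MAXCLUSTERS= option sets the maximum number of clusters.

```python
from collections import defaultdict
from statistics import mean


def bin_curve(points, n_bins):
    """Average one curve over n_bins equal slices of its own duration.

    points: list of (time, value); value is None for a missing reading.
    Assumes the curve has at least one observed value.
    """
    observed = [(t, v) for t, v in points if v is not None]
    t_end = max(t for t, _ in points) or 1  # avoid dividing by zero
    bins = [[] for _ in range(n_bins)]
    for t, v in observed:
        bins[min(int(n_bins * t / t_end), n_bins - 1)].append(v)
    overall = mean([v for _, v in observed])
    # Crude imputation: an empty slice falls back to the curve's overall mean.
    return [mean(b) if b else overall for b in bins]


def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(vectors, k, n_iter=50):
    """Plain k-means; starting centers are the first k vectors (a simplification)."""
    centers = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centers[c]))
                      for v in vectors]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels


# Made-up rows in the thread's layout: (Id, task, time, varperform1).
rows = [
    ("01", "01", 0, -0.5), ("01", "01", 10, -0.4),
    ("01", "01", 20, None), ("01", "01", 30, -0.6),
    ("01", "02", 0, 0.5), ("01", "02", 10, 0.6), ("01", "02", 20, 0.4),
    ("02", "01", 0, -0.6), ("02", "01", 10, -0.5), ("02", "01", 20, -0.7),
    ("02", "02", 0, 0.4), ("02", "02", 10, 0.5),
    ("02", "02", 20, 0.6), ("02", "02", 30, 0.5),
]

# One curve per (Id, task), so the output keeps the detail needed to tell curves apart.
curves = defaultdict(list)
for pid, task, t, value in rows:
    curves[(pid, task)].append((t, value))

keys = list(curves)
vectors = [bin_curve(curves[key], n_bins=2) for key in keys]

# k=2 only because the toy data has four curves; with real data k could be 3, 4 or 5.
labels = kmeans(vectors, k=2)
for key, label in zip(keys, labels):
    print(key, label)
```

The task is not used as a clustering variable here; it only appears in the key printed next to each cluster label. Adding varperform2 would amount to appending its slice means to each vector, ideally after putting both measures on a comparable scale.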
