I have three variables that contain genotyping data for patients. Each row (observation) is one patient, and the dataset has about 700 observations.

Spoligotype is a character variable of 15 digits. Each distinct digit sequence corresponds to a unique genotype, and several patients can share the same genotype/sequence (that is, patients in one cluster have the same genotype). This is how the PROC FREQ output looks:

Variable 2 is called MIRU. It contains a character string of letters and digits. Each unique string corresponds to a unique genotype that can be shared by a number of patients (again, patients in one cluster have the same genotype). The PROC FREQ output looks like this:

I want to group patients by (i) the same MIRU, (ii) the same spoligotype, and (iii) the same MIRU and the same spoligotype pattern. Because there are a large number of combinations and patterns, I can't use IF/THEN statements. How can I accomplish this?

For example, if patients 1, 11, and 200 have the same spoligotype value, I want to assign the value A to all three of them in a new variable. Each value of this new variable should represent a set of patients with the same spoligotype digit sequence.

This is what my data looks like (each line is a patient):
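A minimal sketch of the grouping logic, written in Python rather than SAS purely for illustration: each distinct value (or each distinct Spoligotype/MIRU pair) receives the next letter label in order of first appearance, so no hand-written IF/THEN rules are needed. The patient IDs, genotype strings, and the helper names `letter_label` and `assign_cluster_labels` are hypothetical. Labels continue AA, AB, ... after Z, since there may well be more than 26 distinct genotypes.

```python
from typing import Dict, Hashable, Iterable, List


def letter_label(index: int) -> str:
    """Convert a 0-based index to a spreadsheet-style label: 0->A, 25->Z, 26->AA."""
    label = ""
    n = index + 1
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = chr(ord("A") + rem) + label
    return label


def assign_cluster_labels(keys: Iterable[Hashable]) -> List[str]:
    """Give every patient with the same key the same letter label.

    Labels are handed out in order of first appearance, so the first
    distinct key becomes "A", the second "B", and so on.
    """
    seen: Dict[Hashable, str] = {}
    labels: List[str] = []
    for key in keys:
        if key not in seen:
            seen[key] = letter_label(len(seen))
        labels.append(seen[key])
    return labels


# Hypothetical example data: one dict per patient (one row per patient).
patients = [
    {"id": 1, "spoligotype": "777777777760771", "miru": "224325153324"},
    {"id": 11, "spoligotype": "777777777760771", "miru": "224325153324"},
    {"id": 57, "spoligotype": "000000000003771", "miru": "224325153324"},
    {"id": 200, "spoligotype": "777777777760771", "miru": "233325153325"},
]

# (ii) same spoligotype, (i) same MIRU, (iii) same spoligotype AND same MIRU.
spol_group = assign_cluster_labels(p["spoligotype"] for p in patients)
miru_group = assign_cluster_labels(p["miru"] for p in patients)
combo_group = assign_cluster_labels((p["spoligotype"], p["miru"]) for p in patients)

# Store the labels as new "variables" on each patient record.
for p, s, m, c in zip(patients, spol_group, miru_group, combo_group):
    p["spol_group"], p["miru_group"], p["combo_group"] = s, m, c
    print(p["id"], s, m, c)
```

Here patients 1, 11, and 200 share a spoligotype and all get `A` in `spol_group`, while the combined key separates patient 200 because its MIRU differs. In SAS, the same idea is commonly expressed by sorting on the grouping variable(s) with PROC SORT and incrementing a counter at each `FIRST.` BY-group boundary in a DATA step; that approach numbers groups in sorted order rather than order of first appearance. Note that genotypes seen in only one patient also receive a label here; blank those afterward if only shared genotypes should count as clusters.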