dvd
Calcite | Level 5

Hi, 

 

When running PROC DISCRIM with unequal priors, say 0.9 and 0.1, the output includes a generalized squared distance matrix. Although I understand computationally how those values are obtained (the SAS manual shows this too), I was wondering how to INTERPRET a group's nonzero distance to itself, and how to INTERPRET the asymmetry in the distances.

Any idea?

Thanks

1 ACCEPTED SOLUTION

Rick_SAS
SAS Super FREQ

The GENERALIZED squared distance between groups is composed of the squared distance plus two other terms. The squared distance is symmetric, and the distance from a group to itself is zero, so it is the other two terms that provide the asymmetry.

 

The formula is in the documentation under "Parametric Methods". It includes the terms

1.  ln| S_t |, which is the log of the determinant of the covariance matrix within the t_th group

2. -2 ln(q_t), where q_t is the prior probability of membership in the t_th group. (Note that q_t < 1, so this term is actually positive.) 
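
For reference, here is a sketch of how these terms fit together, written from memory of the PROC DISCRIM documentation, so check it against the "Parametric Methods" section for your SAS/STAT release. The symbols m_t (mean of group t), V_t (the covariance matrix used for group t, either S_t or the pooled matrix), and the names g_1, g_2 are notation assumed here, not taken from this thread.

```latex
% Generalized squared distance from an observation x to group t
D_t^2(x) = d_t^2(x) + g_1(t) + g_2(t), \qquad
d_t^2(x) = (x - m_t)^{\top} V_t^{-1} (x - m_t)

% g_1 is present only when within-group covariance matrices are used
g_1(t) = \begin{cases} \ln |S_t| & \text{within-group covariances} \\ 0 & \text{pooled covariance} \end{cases}

% g_2 is present only when the priors are not all equal
g_2(t) = \begin{cases} -2 \ln q_t & \text{unequal priors} \\ 0 & \text{equal priors} \end{cases}

% Between-group version: distance from group i to group j
D^2(i \mid j) = (m_i - m_j)^{\top} V_j^{-1} (m_i - m_j) + g_1(j) + g_2(j)

% Consequences: self-distance, and asymmetry under a pooled covariance matrix
D^2(j \mid j) = g_1(j) + g_2(j), \qquad
D^2(i \mid j) - D^2(j \mid i) = 2 \ln \frac{q_i}{q_j}
```

If this reading is right, then with a pooled covariance matrix the whole asymmetry comes from the priors. With within-group covariance matrices the quadratic term itself also depends on which group's matrix is used, so it need not be symmetric either.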

 

The second paragraph of the "Overview" section cites Rao (1973) for the generalized squared distance.

 

As for interpretation, I don't have a reference, but I'll take a guess. From the definition, it looks like the generalized distance from Group t to itself increases when the variance within the group increases. It also increases as the prior probability decreases. So I'd interpret the terms as giving information about how much variance is in the group and how rare the group is. Groups that are "more rare and have more variance" have a greater "distance to themselves" than groups that are less variable and have more members. A group's generalized distance to itself is zero only when the two terms cancel, for example when ln|S_t| = 0 (that is, |S_t| = 1) and "membership probability=1."
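
To make the nonzero diagonal and the asymmetry concrete, here is a minimal Python sketch (not SAS code, and not the PROC DISCRIM implementation) that evaluates the between-group formula for two made-up groups with two variables. The function name, the group means, the identity covariance matrix, and the rule "drop the prior term when all priors are equal" are assumptions for illustration only.

```python
import math

def distance_matrix(means, covs, priors, pooled=True):
    """Return D where D[i][j] is the generalized squared distance from
    group i to group j, for two variables (2x2 covariance matrices).
    Hypothetical helper, written only to illustrate the formula."""
    # Assumption: the prior term is included only when priors are unequal.
    unequal = any(not math.isclose(q, priors[0]) for q in priors)
    k = len(means)
    D = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            (a, b), (c, d) = covs[j]
            det = a * d - b * c
            dx = means[i][0] - means[j][0]
            dy = means[i][1] - means[j][1]
            # Quadratic form diff' * inv(cov_j) * diff via the 2x2 inverse
            maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
            g1 = 0.0 if pooled else math.log(det)            # ln|S_j|
            g2 = -2.0 * math.log(priors[j]) if unequal else 0.0
            D[i][j] = maha + g1 + g2
    return D

# Made-up example: pooled identity covariance, priors 0.9 and 0.1
means = [(0.0, 0.0), (2.0, 0.0)]
pooled_cov = ((1.0, 0.0), (0.0, 1.0))
priors = [0.9, 0.1]
D = distance_matrix(means, [pooled_cov, pooled_cov], priors, pooled=True)
for row in D:
    print(["%.4f" % v for v in row])
```

In this toy setup the diagonal entries are just -2 ln(0.9) and -2 ln(0.1), so the rare group has the larger "distance to itself", and the two off-diagonal entries differ by 2 ln(0.9/0.1) even though the squared distance between the means is the same in both directions.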

5 REPLIES
Rick_SAS
SAS Super FREQ

Could you point us to an example in the doc that has this output?

dvd
Calcite | Level 5

Thank you for your kind reply.
I have attached an example of the output.
X1 is my response variable, which has 3 classes (1, 2, 3).
When I run PROC DISCRIM with unequal priors I get output like this. Could you kindly help me interpret it? The SAS manual explains the computation, but says nothing about how to interpret the nonzero distance of a class to itself or the asymmetry in the matrix.

Thanks a lot,
D.


Output_proc_discrim.png
dvd
Calcite | Level 5

Thank you again for your kindness!

 

Hence, I guess that this output is merely "computational".


Your suggestion on how to interpret those values definitely makes sense, and I agree. I would say, though, that this information can be read (more clearly) from other parts of the SAS output, which is why I was wondering whether this output was trying to tell me something more or something different. Unfortunately, it often happens (of course not only with SAS) that some irrelevant output is provided.

I appreciate your kind support,

Thank you again,
Best.
D

Rick_SAS
SAS Super FREQ

Great. When you think no further comments are necessary, mark the post as "answered" so that others know that the discussion can be closed.
