Programming the statistical procedures from SAS

How do I calculate canonical variables in Proc Discrim from the raw data?

Occasional Contributor
Posts: 19

I want to calculate the canonical variables from the raw data. How do I do this? I can get Proc Discrim to print the values, but I cannot take the raw data and get the same values that SAS gives me.

 

Thank you for the help.


All Replies
Respected Advisor
Posts: 4,606

Re: How do I calculate canonical variables in Proc Discrim from the raw data?

The canonical coefficients are written to a data set by the OUTSTAT= option in PROC CANDISC. Look at the observations with _TYPE_ = "SCORE". Note that the raw data must be standardized before these coefficients are applied, as explained in the Output Data Sets section of the PROC CANDISC documentation.

PG
Occasional Contributor
Posts: 19

Re: How do I calculate canonical variables in Proc Discrim from the raw data?

I used the OUTSTAT= option and got a new data set. I can calculate the mean and the standard deviation for each treatment, and the answers I get by hand match what I get from SAS. However, PSTD doesn't match. SAS gives 8.1381977. If I calculate the standard deviation for all the data I get 9.8188. If I calculate the standard deviation for each treatment and average them I get 8.130212. I tried standardizing with (x-mean)/sd, but that did not come close. I am doing something wrong, but I just don't see it.
SAS Super FREQ
Posts: 3,310

Re: How do I calculate canonical variables in Proc Discrim from the raw data?

Make sure you are using the same denominator as SAS when you do the hand computations. There is the n versus (n-1) issue, and if you are using a FREQ variable or a WEIGHT variable the denominator changes.
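To make the n versus (n-1) point concrete, here is a minimal sketch in Python rather than SAS (chosen only so the arithmetic can be checked anywhere). The five values are made up for illustration and are not from the poster's data.

```python
import math
import statistics

# Illustrative values only (NOT taken from the data discussed in this thread).
sample = [50.0, 46.0, 54.0, 49.0, 51.0]
n = len(sample)

sd_n_minus_1 = statistics.stdev(sample)   # sum of squares divided by n - 1
sd_n = statistics.pstdev(sample)          # sum of squares divided by n

# The two estimates differ exactly by the factor sqrt((n - 1) / n).
ratio = sd_n / sd_n_minus_1
print(sd_n_minus_1, sd_n, ratio, math.sqrt((n - 1) / n))
```

If a hand calculation differs from the SAS value by a factor close to sqrt((n-1)/n), the denominator is a likely culprit.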

 

Why not use a standard SAS data set such as Sashelp.class? Then you can share your code and calculations, and we can follow along and correct any errors in the calculation.

Occasional Contributor
Posts: 19

Re: How do I calculate canonical variables in Proc Discrim from the raw data?

I didn't know about this. Great idea. Here is the program.

proc discrim data=sashelp.iris crosslisterr distance anova canonical outstat=good;
   class species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

 

I looked in the data set "good" and found "PSTD", which I assume is the pooled standard deviation. The value in SAS is 5.14789 for SepalLength. How is this calculated?

Try 1) The overall mean for SepalLength is 58.43333 (row 5). The overall standard deviation is 8.280661 (STD in row 66). This doesn't match.

Try 2) The std for Setosa is 3.5248968, for Versicolor is 5.16171147, and for Virginica is 6.3587959 (rows 61-63). The mean of these three values is 5.015135 (calculated in Excel). This doesn't match either.

Try 3) ?

SAS Super FREQ
Posts: 3,310

Re: How do I calculate canonical variables in Proc Discrim from the raw data?

Add the PCOV option to the PROC DISCRIM statement. Then the procedure will display the "Pooled Within-Class Covariance Matrix" (and also add that matrix to the OUTSTAT= data set).

The numbers in the PSTD row are the square roots of the diagonal elements of the pooled within-class covariance matrix. The formula for the covariance matrix is available in the SAS documentation for the CANDISC procedure.
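As a cross-check of that statement, the pooled value can be reproduced from the three per-species standard deviations quoted earlier in the thread. This is a minimal Python sketch (not SAS), and it assumes 50 observations per species, which is the case for Sashelp.Iris; with unequal class sizes the weights would differ.

```python
import math

# Per-species standard deviations of SepalLength quoted earlier in the thread.
sds = [3.5248968, 5.16171147, 6.3587959]
# Assumption: 50 observations in each of the three species.
ns = [50, 50, 50]

# Pooled within-class variance: each class variance is weighted by its
# degrees of freedom (n_i - 1), and the total is divided by N - k.
pooled_var = sum((n - 1) * s ** 2 for n, s in zip(ns, sds)) / (sum(ns) - len(ns))
pstd = math.sqrt(pooled_var)
print(round(pstd, 4))
```

The result is about 5.1479, in line with the PSTD value of 5.14789 reported for SepalLength. With equal class sizes the pooled variance is just the average of the class variances, which is why averaging the standard deviations themselves (5.015) does not match.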

Occasional Contributor
Posts: 19

Re: How do I calculate canonical variables in Proc Discrim from the raw data?

OK, I have gotten sidetracked.

Here is the revised program.

proc discrim data=sashelp.iris pcov crosslisterr distance anova canonical outstat=good out=goo;
   class species;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

 

In dataset "goo" I find two new variables can1 and can2. Is there part of the output (or some missing output) that I could use to calculate can1 based only on the measures provided in the iris dataset (SepalLength SepalWidth PetalLength PetalWidth)?

SAS Super FREQ
Posts: 3,310

Re: How do I calculate canonical variables in Proc Discrim from the raw data?

Yes, and the same documentation link that I sent tells you how to get it. You use the observations with _TYPE_="RAWSCORE" in the GOOD data set.

To get the canonical scores, multiply the centered data matrix by the raw canonical coefficients. If you have PROC IML, it looks like this:

proc print data=good;
where _TYPE_="RAWSCORE";
run;

proc iml;
use Goo;
read all var {SepalLength SepalWidth PetalLength PetalWidth} into X; /* data matrix */
read all var {Can1 Can2} into Can; /* canonical scores */
close;

use Good where( _TYPE_="RAWSCORE" );
read all var {SepalLength SepalWidth PetalLength PetalWidth} into R; /* scoring coefficients */
close;

Score = (X-mean(X))*R`;  /* should be same as [Can1 Can2] */

/* check that the Score equals the values in [Can1 Can2] */
maxDiff = max(abs(Score-Can));
print maxDiff;      /* prints 1.776E-15, which shows that the values are equal to within rounding error */
quit;

 

You can also use PROC SCORE to confirm the computations.

 

 

Occasional Contributor
Posts: 19

Re: How do I calculate canonical variables in Proc Discrim from the raw data?

I guess I asked the question in the wrong way.

 

So let's say that I went out into the woods of Virginia and found a new iris plant. I took four measurements: SepalLength, SepalWidth, PetalLength, and PetalWidth. I read a manuscript on how to classify my new iris, and it gave a table of (???) from which I was able to calculate what can1 and can2 would have been had my new observation been included (assuming that the new observation fits the existing data). Since the raw data were not published in the manuscript, I don't have the option of redoing the analysis. I can now plot my new value on the published graph and decide what plant I have found.

 

I had thought that the raw canonical coefficients were what I needed for "???" but that didn't seem to work. When I used them I did not get the value that SAS gave for Can1. 

Solution
01-31-2017 02:40 PM
SAS Super FREQ
Posts: 3,310

Re: How do I calculate canonical variables in Proc Discrim from the raw data?

When you score a new observation, SAS does not redo the analysis as if the new observation had been included. It merely evaluates the existing model on the new observation.

If that is the process you want to carry out, then the raw canonical coefficients are one way to score the new data. To score a new observation:

1. Center the observation by subtracting the mean of the original data (coordinate by coordinate).

2. Multiply the centered observation by the raw canonical coefficients (a matrix multiplication).

Note that step 1 needs the means of the original data, so the published table must report them along with the coefficients.

The documentation link I sent shows the formula for this computation, as well as two other equivalent computations.
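The two steps can be sketched as follows. This is a minimal Python illustration rather than SAS; the function name score_observation and every number in it are hypothetical placeholders, not values from the iris analysis. Each row of raw_coefs plays the role of one _TYPE_="RAWSCORE" observation (one canonical variable).

```python
def score_observation(x, means, raw_coefs):
    """Return one canonical score per row of raw_coefs for a single observation."""
    # Step 1: center the observation with the means of the ORIGINAL data.
    centered = [xi - mi for xi, mi in zip(x, means)]
    # Step 2: multiply the centered observation by each row of raw coefficients.
    return [sum(c * d for c, d in zip(row, centered)) for row in raw_coefs]

# Hypothetical placeholder values for a two-variable, two-score example.
means = [10.0, 20.0]             # means of the original data
raw_coefs = [[0.5, -0.25],       # coefficients for Can1
             [0.125, 0.25]]      # coefficients for Can2
new_obs = [12.0, 16.0]           # measurements of the new observation

scores = score_observation(new_obs, means, raw_coefs)
print(scores)  # [2.0, -0.75]
```

Applying the coefficients to the uncentered measurements gives different numbers, which is one possible reason a hand calculation does not reproduce Can1.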

 

 
