Regarding dimension reduction for the purpose of visualization, there isn't necessarily a correct or incorrect answer. You have identified two good techniques, but they do slightly different things, which means your interpretation of the plots they produce needs to be different as well.
Canonical Discriminant Analysis will use the cluster variable and create a projection based on the cluster labels that you have assigned. What this means is that CDA will try to find the linear combination of inputs that has the highest correlation with the cluster label. You can think of this as the "best" (given the metric used in CDA) projection of the data for the purpose of seeing which linear combination best separates the cluster labels.
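As a rough sketch of the supervised route, here is what that looks like in Python using scikit-learn's `LinearDiscriminantAnalysis`, a close analogue of CDA (the iris data and k-means labels are just stand-ins for your own data and clustering):

```python
# Sketch: a supervised projection that uses the cluster labels.
# LinearDiscriminantAnalysis is used here as an analogue of CDA.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = load_iris().data  # 4-dimensional data, standing in for your inputs

# Cluster labels from some earlier clustering step (k-means here)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# The labels are passed to fit_transform: the projection is chosen
# to separate the clusters, not to preserve overall variance.
lda = LinearDiscriminantAnalysis(n_components=2)
X_proj = lda.fit_transform(X, labels)

print(X_proj.shape)  # (150, 2) — two discriminant axes to scatter-plot
```

You would then scatter-plot the two columns of `X_proj`, colored by `labels`, to see the separation the method found.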
Principal Component Analysis will not consider the cluster labels. This can be more useful if you want to see how the clustering looks in a lower dimension without using the cluster information to bias your projection. The projection of the data does not depend on how you cluster; instead, it is the "best" projection with respect to the variance of the data. You can project the data first, and then see how the cluster labels are distributed across the projected space.
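The unsupervised route looks almost the same, except the labels play no role in fitting the projection and are only used afterwards for coloring (again using scikit-learn with placeholder data):

```python
# Sketch: an unsupervised projection with PCA. The cluster labels
# never influence the projection — they are only for plotting.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # stand-in for your own inputs
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# PCA is fit on X alone; note that labels are not passed in.
pca = PCA(n_components=2)
X_proj = pca.fit_transform(X)

print(X_proj.shape)                   # (150, 2)
print(pca.explained_variance_ratio_)  # share of variance per axis
```

Here `explained_variance_ratio_` tells you how faithfully the 2-D picture represents the original data, which is worth reporting alongside the plot.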
Ultimately, the two dimension reduction methods answer slightly different questions, and what you're trying to do with the dimension reduction and plotting should inform which route you take.
I hope this helped!