BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dustychair
Pyrite | Level 9

Hi,

I have two questions:

1. I have a matrix which is created from a raw data. There are two variables in that data: One is candidates and the other is probabilities. Candidates are in rows and columns (columns and rows have the same candidates). The probabilities are in the intersection of candidates. For example, probability for candidate 87 and 627 is 0.000034. In the matrix I have only these probabilities. I changed the missing values to 0. But, in the matrix probability for candidate 627 and candidate 87 is also 0.000034. When I use 

data matrix;
set matrix;
array change _numeric_;
do over change;
if change=. then change=0;
end;
run ;

that code it changes all of missing values. I want to have the probabilities for only one of the pairs, for the other same pair I want to it  missing and for rest of the cells I want to have 0. How can I do that?

 

2. I'd like to define probabilities as a variable. I'm going to use that matrix for cluster analysis. When I use code for cluster analysis, it says 'ERROR: Variable RT_PROB not found.'. So I need to define that variable too. How can I do that?

 

3. The code and raw data are attached.

 

4.Thanks 

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

Regarding 1, you could do it all in IML:


proc iml;
use tr;   read all var _ALL_; close tr;

Cand=unique(Cand_1//Cand_2);
call sort(Cand);

X=j(ncol(Cand), ncol(Cand), .);
do i=1 to nrow(RT_PROB);
   X[loc(Cand=Cand_1[i]),loc(Cand=Cand_2[i])]=RT_PROB[i];
end;

/* replace missing values with 0 */
missIdx = loc(X=.);     /* locations of missing values */
nonmissIdx = loc(X^=.); /* locations of nonmissing values */
X[missIdx] = 0;         /* replace missing with 0 */
Z = X;                  /* make a copy */
Z[nonmissIdx] = .;      /* replace orig probs with missing */
X = X + Z`;             /* use missing values for lower triangular probabilities */
print X[colName=((Cand)) rowName=((Cand))];

create matrix from X [colname=Cand rowname=Cand];
append from X[rowname=Cand];
close;
quit;

View solution in original post

6 REPLIES 6
Rick_SAS
SAS Super FREQ

Regarding 1, you could do it all in IML:


proc iml;
use tr;   read all var _ALL_; close tr;

Cand=unique(Cand_1//Cand_2);
call sort(Cand);

X=j(ncol(Cand), ncol(Cand), .);
do i=1 to nrow(RT_PROB);
   X[loc(Cand=Cand_1[i]),loc(Cand=Cand_2[i])]=RT_PROB[i];
end;

/* replace missing values with 0 */
missIdx = loc(X=.);     /* locations of missing values */
nonmissIdx = loc(X^=.); /* locations of nonmissing values */
X[missIdx] = 0;         /* replace missing with 0 */
Z = X;                  /* make a copy */
Z[nonmissIdx] = .;      /* replace orig probs with missing */
X = X + Z`;             /* use missing values for lower triangular probabilities */
print X[colName=((Cand)) rowName=((Cand))];

create matrix from X [colname=Cand rowname=Cand];
append from X[rowname=Cand];
close;
quit;
dustychair
Pyrite | Level 9

Hi Dr. Wicklin,

Thanks for the code. The code worked. Do you think using the whole matrix instead of triangular matrix change the results? If so, the upper triangular matrix should be empty or missing? 

Thanks

Rick_SAS
SAS Super FREQ

I do not know. I can't figure out how you are using the probability matrix to cluster these firms. My best guess is that you are trying to use the matrix of probabilities as if it were a distance matrix, but I don't see how that will work with such a sparse matrix. 

 

If it helps, there is a Getting Started example in the PROC CLUSTER documentation that uses a lower triangular distance matrix. 

dustychair
Pyrite | Level 9

Yes, your guess is correct. I want to use that matrix as a distance matrix. I read the documentation you mentioned, and actually still reading. My all purpose is to prepare the matrix for the cluster analysis. I don't have a regular data like in the examples for cluster analysis. If you help me to get triangular matrix and define the RT_PROB as a variable, I'll appreciate you. Probably that will be my last step to achieve my goal.

Many thanks!

Rick_SAS
SAS Super FREQ

    > Probably that will be my last step to achieve my goal.

 

I doubt that. Your matrix is not a distance matrix. The first row of your matrix says that the "distance" to Firm=2291 is ZERO for all firms except 5787.  That means that you have at most two clusters: the one including 5787 and the one including the rest.

But the second row says that the "distance" to 2397 is zero for all firms except 5823. Those two statements are incompatible and inconsistent.  Mathematically, a distance function has the property that if dist(x,y)=0, then x=y.

 

I strongly suggest you discuss your project with a supervisor, mentor, or advisor to better define your goals and objectives and how you might be able to achieve them.  I also strongly suggest that you develop your strategies on a small problem (5-10 "firms"), and do not proceed to the full data until you can solve the smaller problem.

 

Good luck.

dustychair
Pyrite | Level 9

Hi Dr. Wicklin,

Thanks for the information you gave. These probabilities are from detection of cheating students. Probabilities are considered distance. Using cluster analysis is a new method to detect cheating. There are still some work I have to do. My problem was preparing data for my purpose. Thanks for your help again.

Best,

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 6 replies
  • 1698 views
  • 2 likes
  • 2 in conversation