BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
dustychair
Pyrite | Level 9

Hi,

I have a data such as below.I tried to create a matrix as attached using this data but I failed. I tried merge or other similar statements but I think I need something more than these statements. If anyone can help me, I'll appreciate. Thanks.

Cand1    Cand2      Prob

87             627      .204E-05 
2291        5787     .281E-05 
2397        5823     .133E-05 
5011        5014     .395E-05 
5096        5097     .389E-05 
5122        5125     .336E-05 
5152        5153     .300E-05 
5152        5154     .377E-05 
5152        5158     .241E-05 
5175        5176     .151E-05 
5214        5216     .257E-05
5353        5358     .415E-05 
5372        5380     .731E-06
5402        5613     .188E-05
5505        5509     .356E-05
5522        5523     .945E-06 

5522        5530     .178E-05 
5533        5534     .122E-06
5667        5670     .950E-06
5714        5716     .248E-05 

5775        5778     .147E-05 
5803        5810     .110E-05 
5871        5874     .987E-06 
5874        5879     .471E-06 
5932        5936     .406E-06 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
dustychair
Pyrite | Level 9

Hi,

Thanks for your response. Let me explain the logic/story of my data. Cand1 and cand2 are candidates and the last column is probability.  for example the probability between candidate 87 and candidate 627 is .204E-05. In the raw data, candidate 5152 occurs three times and candidate 5522 occurs twice. Each candidates occurs once in the rows or columns and the probabilities related to these candidates are in the matrix in the excel file. That's why the size of raw data and matrix are different. I have updated the excel file, it is attached. 

Many thanks,

View solution in original post

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20

Can you be more specific about the logic? In your Excel Workbook, you have a 46x46 matrix that you want to create. The Row/Column names seem to resemble the Cand1 and Cand2 variables combined, which you can create in a data step like this

 

data have;
input Cand1 Cand2 Prob;
datalines;
87 627 .204E-05 
2291 5787 .281E-05 
2397 5823 .133E-05 
5011 5014 .395E-05 
5096 5097 .389E-05 
5122 5125 .336E-05 
5152 5153 .300E-05 
5152 5154 .377E-05 
5152 5158 .241E-05 
5175 5176 .151E-05 
5214 5216 .257E-05
5353 5358 .415E-05 
5372 5380 .731E-06
5402 5613 .188E-05
5505 5509 .356E-05
5522 5523 .945E-06 
5522 5530 .178E-05 
5533 5534 .122E-06
5667 5670 .950E-06
5714 5716 .248E-05 
5775 5778 .147E-05 
5803 5810 .110E-05 
5871 5874 .987E-06 
5874 5879 .471E-06 
5932 5936 .406E-06
;

data help;
   set have;
   Cand=Cand1;output;
   Cand=Cand2;output;
run;

However, if we use this data, we will get a 50x50 matrix, since you have 25 rows in your original data set. Are we to exclude certain observations?

dustychair
Pyrite | Level 9

Hi,

Thanks for your response. Let me explain the logic/story of my data. Cand1 and cand2 are candidates and the last column is probability.  for example the probability between candidate 87 and candidate 627 is .204E-05. In the raw data, candidate 5152 occurs three times and candidate 5522 occurs twice. Each candidates occurs once in the rows or columns and the probabilities related to these candidates are in the matrix in the excel file. That's why the size of raw data and matrix are different. I have updated the excel file, it is attached. 

Many thanks,

PeterClemmensen
Tourmaline | Level 20

Ok. I think something like this will meet your needs.

 

data have;
input Cand1 Cand2 Prob :e7.;
datalines;
87 627 .204E-05 
2291 5787 .281E-05 
2397 5823 .133E-05 
5011 5014 .395E-05 
5096 5097 .389E-05 
5122 5125 .336E-05 
5152 5153 .300E-05 
5152 5154 .377E-05 
5152 5158 .241E-05 
5175 5176 .151E-05 
5214 5216 .257E-05
5353 5358 .415E-05 
5372 5380 .731E-06
5402 5613 .188E-05
5505 5509 .356E-05
5522 5523 .945E-06 
5522 5530 .178E-05 
5533 5534 .122E-06
5667 5670 .950E-06
5714 5716 .248E-05 
5775 5778 .147E-05 
5803 5810 .110E-05 
5871 5874 .987E-06 
5874 5879 .471E-06 
5932 5936 .406E-06
;

proc iml;

use have;
   read all var _ALL_; 
close have;

Cand=unique(Cand1//Cand2);
call sort(Cand);

X=j(ncol(Cand), ncol(Cand), .);

do i=1 to nrow(Prob);
   X[loc(Cand=Cand1[i]),loc(Cand=Cand2[i])]=Prob[i];
end;

print X[colName=(char(Cand)) rowName=(char(Cand))];

quit;
dustychair
Pyrite | Level 9

Hi Draycut,

I tried the code you wrote. Here is the the error I got. Many thanks!

 

 

127 proc iml;
NOTE: IML Ready
128
129 use tr;
130 read all var _ALL_;
131 close tr;
132
133 Cand=unique(Cand_1//Cand_2);
134 call sort(Cand);
135
136 X=j(ncol(Cand), ncol(Cand), .);
137
138 do i=1 to nrow(Prob);
139 X[loc(Cand=Cand_1[i]),loc(Cand=Cand_2[i])]=RT_PROB[i];
140 end;
141
142 print X[colName=(char(Cand)) rowName=(char(Cand))];
ERROR: (execution) Character argument should be numeric.

operation : CHAR at line 142 column 43
operands : Cand
Cand 1 row 46 cols (character, size 😎

statement : PRINT at line 142 column 1
143
144 quit;
NOTE: Exiting IML.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IML used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds

Rick_SAS
SAS Super FREQ

draycut and ksharp are both using numeric variables for the Cand_1 and Cand_2 variables. You are using character variables. Either change to numeric or get rid of the calls (such as CHAR) that assume that Cand_* are numeric.

dustychair
Pyrite | Level 9

Hi Dr. Wicklin,

Your correction helped for both codes.  

 

Thanks

Ksharp
Super User

Do you need IML code ? or sql ,data step could be accepted?

 

data have;
input Cand1 Cand2 Prob :e7.;
datalines;
87 627 .204E-05 
2291 5787 .281E-05 
2397 5823 .133E-05 
5011 5014 .395E-05 
5096 5097 .389E-05 
5122 5125 .336E-05 
5152 5153 .300E-05 
5152 5154 .377E-05 
5152 5158 .241E-05 
5175 5176 .151E-05 
5214 5216 .257E-05
5353 5358 .415E-05 
5372 5380 .731E-06
5402 5613 .188E-05
5505 5509 .356E-05
5522 5523 .945E-06 
5522 5530 .178E-05 
5533 5534 .122E-06
5667 5670 .950E-06
5714 5716 .248E-05 
5775 5778 .147E-05 
5803 5810 .110E-05 
5871 5874 .987E-06 
5874 5879 .471E-06 
5932 5936 .406E-06
;
proc iml;
use have nobs nobs;
read all var{Cand1 Cand2 Prob};
close;

levels=unique(Cand1||Cand2);
want=j(ncol(levels),ncol(levels),.);
mattrib want r=(char(levels)) c=(char(levels)) label='' ;
row=char(Cand1);
col=char(Cand2);
do i=1 to nobs;
r=row[i];c=col[i];p=Prob[i];
want[r,c]=p;
end;
print want;
quit;
dustychair
Pyrite | Level 9

Hello Ksharp,

Thanks for your response. The code did not work. It creates a matrix but the cells are empty and the columns and rows don't have candidates id numbers. I attached the output and the code I had.

Thanks

Ksharp
Super User

Do you need IML code ? or sql ,data step could be accepted?

 

data have;
input Cand1 Cand2 Prob :e7.;
datalines;
87 627 .204E-05 
2291 5787 .281E-05 
2397 5823 .133E-05 
5011 5014 .395E-05 
5096 5097 .389E-05 
5122 5125 .336E-05 
5152 5153 .300E-05 
5152 5154 .377E-05 
5152 5158 .241E-05 
5175 5176 .151E-05 
5214 5216 .257E-05
5353 5358 .415E-05 
5372 5380 .731E-06
5402 5613 .188E-05
5505 5509 .356E-05
5522 5523 .945E-06 
5522 5530 .178E-05 
5533 5534 .122E-06
5667 5670 .950E-06
5714 5716 .248E-05 
5775 5778 .147E-05 
5803 5810 .110E-05 
5871 5874 .987E-06 
5874 5879 .471E-06 
5932 5936 .406E-06
;
proc iml;
use have nobs nobs;
read all var{Cand1 Cand2 Prob};
close;

levels=unique(Cand1||Cand2);
want=j(ncol(levels),ncol(levels),.);
mattrib want r=(char(levels)) c=(char(levels)) label='' ;
row=char(Cand1);
col=char(Cand2);
do i=1 to nobs;
r=row[i];c=col[i];p=Prob[i];
want[r,c]=p;
end;
print want;
quit;
Ksharp
Super User
data have;
input Cand1 Cand2 Prob :e7.;
datalines;
87 627 .204E-05 
2291 5787 .281E-05 
2397 5823 .133E-05 
5011 5014 .395E-05 
5096 5097 .389E-05 
5122 5125 .336E-05 
5152 5153 .300E-05 
5152 5154 .377E-05 
5152 5158 .241E-05 
5175 5176 .151E-05 
5214 5216 .257E-05
5353 5358 .415E-05 
5372 5380 .731E-06
5402 5613 .188E-05
5505 5509 .356E-05
5522 5523 .945E-06 
5522 5530 .178E-05 
5533 5534 .122E-06
5667 5670 .950E-06
5714 5716 .248E-05 
5775 5778 .147E-05 
5803 5810 .110E-05 
5871 5874 .987E-06 
5874 5879 .471E-06 
5932 5936 .406E-06
;
proc iml;
use have nobs nobs;
read all var{Cand1 Cand2 Prob};
close;

levels=unique(Cand1||Cand2);
want=j(ncol(levels),ncol(levels),.);
mattrib want r=(char(levels)) c=(char(levels)) label='' ;
row=char(Cand1);
col=char(Cand2);
do i=1 to nobs;
r=row[i];c=col[i];p=Prob[i];
want[r,c]=p;
end;
print want;
quit;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 10 replies
  • 3749 views
  • 0 likes
  • 4 in conversation