Hi all,
I have a dataset looks like this:
ID | Rank1 | Rank2 | Rank3 | Rank4 | Rank5 | Rank 6 |
1 | 3 | 4 | 4 | 3 | 1 | 5 |
2 | 5 | 4 | 3 | 6 | 1 | 2 |
3 | 1 | 6 | 6 | 3 | 6 | 6 |
4 | 5 | 6 | 4 | 1 | 2 | 6 |
5 | 3 | 1 | 3 | 2 | 1 | 6 |
6 | 2 | 3 | 6 | 3 | 4 | 1 |
7 | 6 | 2 | 2 | 2 | 2 | 6 |
8 | 2 | 2 | 1 | 4 | 5 | 5 |
9 | 1 | 2 | 5 | 1 | 1 | 5 |
10 | 1 | 4 | 4 | 1 | 6 | 6 |
The Rank1- Rank6 may contain values from 1-6.
So within the same ID, what I want is:
1. Create a flag variable SAME, if there is same value from Rank1-Rank6 for each ID, then SAME=1 otherwise SAME = 0. for example , ID2 doesn't have the same value from Rank1-Rank6, then SAME =0.
2.
I want to know how many Ranks are the same as Rank1, Rank2... etc. For example, across total ID, there is 1 Rank2 is the same as Rank1( ID=8), 1 Rank3 is the same as Rank1(ID=5), 3 Rank4 are the same as Rank1 (ID =1,9,10). So the result data will be like this:
Rank1 | Rank2 | Rank3 | Rank4 | Rank5 | Rank6 | |
Rank1 | N/A | 1 | 1 | 3 | 1 | 1 |
Rank2 | N/A | X | X | X | X | |
Rank3 | N/A | X | X | X | ||
Rank4 | N/A | X | X | |||
Rank5 | N/A | X | ||||
Rank6 | N/A |
Thanks for any help!!!
Hi @Linmuxi,
Creating the flag variable SAME is easy. Just to clarify: SAME=1 if there's at least one pair (i,j) with RANKi=RANKj?
@Linmuxi wrote:
2. try to find matched variables. I want to know which variable has the same values as Rank1, Rank2, Rank3,...Rank6. For example, in ID=1, Rank1 and Rank4 has the same value, Rank2 and Rank3 have the same value. etc. But I haven't think out how to create variables to get the info I want.
You can safely leave the question of how to create the variables to the experts in this forum, but you should decide first what kind of variables you need for your next steps: perhaps 15 flags (like variable SAME) for rank1=rank2, rank1=rank3, ..., rank5=rank6?
Hi @FreelanceReinh,
Thanks so much for your reply! Yes at least one pair then SAME =1.
For my second question, in general, I want to know how many Ranks are the same as Rank1, Rank2... etc. For example, across total ID, there is 1 Rank2 is the same as Rank1( ID=8), 1 Rank3 is the same as Rank1(ID=5), 3 Rank4 are the same as Rank1 (ID =1,9,10). So the result data will be like this:
Rank1 | Rank2 | Rank3 | Rank4 | Rank5 | Rank6 | |
Rank1 | N/A | 1 | 1 | 3 | 1 | 1 |
Rank2 | N/A | X | X | X | X | |
Rank3 | N/A | X | X | X | ||
Rank4 | N/A | X | X | |||
Rank5 | N/A | X | ||||
Rank6 | N/A |
I am not sure if it's workable...kind of too complex to me to solve...
This should create the triangular matrix (omitting the less interesting first column and last row):
data want2(drop=i j id rank:);
do until(last);
set have end=last;
array r[6] rank:;
array e[5, 2:6];
do i=1 to dim(r)-1;
do j=i+1 to dim(r);
e[i,j]+(r[i]=r[j]);
end;
end;
end;
file print;
do j=2 to dim(r);
put @(j*6-4) 'Rank' j @;
end;
put;
do i=1 to dim(r)-1;
put 'Rank' i @;
do j=2 to dim(r);
put +1 e[i,j] 5. @;
end;
put;
end;
run;
Result (using listing output):
Rank2 Rank3 Rank4 Rank5 Rank6 Rank1 1 1 3 1 1 Rank2 . 4 2 3 2 Rank3 . . 1 2 2 Rank4 . . . 2 0 Rank5 . . . . 3
Dataset WANT2 contains the 25 elements of the 5x5 matrix in variables e1, e2, ..., e25 (row-major order).
(I'll be back later "today" -- it's after midnight in my time zone.)
Edit: Inserted file print; so as to write the result table to the output window rather than to the log.
Here is a variant of my previous solution. It creates two output datasets:
Variables Rank1 - Rank6 in dataset HAVE are now allowed to contain missing values, but only equalities of non-missing Ranki values are considered in the calculation of variable SAME in dataset WANT1 and in the counts in dataset WANT2.
data want1(drop=var) want2(drop=id rank1 same);
length var $5;
do until(last);
set have end=last;
array r[6] rank:;
array e[5, 2:6] _temporary_;
same=0;
do i=1 to dim(r)-1;
do j=i+1 to dim(r);
_s=(r[i]=r[j] & r[j]>.z);
if _s then same=1;
e[i,j]+_s;
end;
end;
output want1;
end;
do i=1 to dim(r)-1;
var=cats('Rank',i);
do j=2 to dim(r);
r[j]=e[i,j];
end;
output want2;
end;
drop i j _s;
label var='A0'x;
run;
proc print data=want1;
id id;
run;
proc print data=want2 label;
id var;
run;
PROC PRINT output:
WANT1
ID Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 same 1 3 4 4 3 1 5 1 2 5 4 3 6 1 2 0 3 1 6 6 3 6 6 1 4 5 6 4 1 2 6 1 5 3 1 3 2 1 6 1 6 2 3 6 3 4 1 1 7 6 2 2 2 2 6 1 8 2 2 1 4 5 5 1 9 1 2 5 1 1 5 1 10 1 4 4 1 6 6 1
WANT2
Rank2 Rank3 Rank4 Rank5 Rank6 Rank1 1 1 3 1 1 Rank2 . 4 2 3 2 Rank3 . . 1 2 2 Rank4 . . . 2 0 Rank5 . . . . 3
You want to determine if there are any duplicate values in the variables RANK1 .... RANK6.
If you know that all the rank variables are integers from 1 through 6, then all you need to do is see if the mean and std of the RANK variables=mean(1,2,3,4,5,6) and std(1,2,3,4,5,6). There's no need to do a set of pairwise comparisons.
data have;
input ID Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 ;
datalines;
1 3 4 4 3 1 5
2 5 4 3 6 1 2
3 1 6 6 3 6 6
4 5 6 4 1 2 6
5 3 1 3 2 1 6
6 2 3 6 3 4 1
7 6 2 2 2 2 6
8 2 2 1 4 5 5
9 1 2 5 1 1 5
10 1 4 4 1 6 6
run;
data want (drop=_:);
if _n_=1 then do;
_mn=mean(1,2,3,4,5,6);
_sd=std(1,2,3,4,5,6);
end;
retain _mn _sd;
set have;
if mean(of rank:)=_mn and std(of rank:)=_sd then same=0;
else same=1;
run;
Or, a bit shorter (but essentially the same idea):
data want;
set have;
same=~(round(cv(of rank:),.01)=53.45);
run;
If you have SAS/IML ,that would be easy.
data have;
infile cards expandtabs;
input ID Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 ;
datalines;
1 3 4 4 3 1 5
2 5 4 3 6 1 2
3 1 6 6 3 6 6
4 5 6 4 1 2 6
5 3 1 3 2 1 6
6 2 3 6 3 4 1
7 6 2 2 2 2 6
8 2 2 1 4 5 5
9 1 2 5 1 1 5
10 1 4 4 1 6 6
;
run;
proc iml;
use have(keep=rank:) nobs nobs;
read all var _all_ into rank[c=vname];
close;
r=j(ncol(rank),ncol(rank),.);
do i=1 to ncol(rank)-1;
do j=i+1 to ncol(rank);
r[i,j]=(rank[,i]=rank[,j])[+];
end;
end;
create want from r[r=vname c=vname];
append from r[r=vname];
close;
quit;
If you have SAS/IML ,that would be easy.
data have;
infile cards expandtabs;
input ID Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 ;
datalines;
1 3 4 4 3 1 5
2 5 4 3 6 1 2
3 1 6 6 3 6 6
4 5 6 4 1 2 6
5 3 1 3 2 1 6
6 2 3 6 3 4 1
7 6 2 2 2 2 6
8 2 2 1 4 5 5
9 1 2 5 1 1 5
10 1 4 4 1 6 6
;
run;
proc iml;
use have(keep=rank:) nobs nobs;
read all var _all_ into rank[c=vname];
close;
r=j(ncol(rank),ncol(rank),.);
do i=1 to ncol(rank)-1;
do j=i+1 to ncol(rank);
r[i,j]=(rank[,i]=rank[,j])[+];
end;
end;
create want from r[r=vname c=vname];
append from r[r=vname];
close;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.