BookmarkSubscribeRSS Feed
Linmuxi
Calcite | Level 5

Hi all,

 

I have a dataset looks like this:

 

IDRank1Rank2Rank3Rank4Rank5Rank 6
1344315
2543612
3166366
4564126
5313216
6236341
7622226
8221455
9125115
10144166

 

The Rank1- Rank6 may contain values from 1-6.

 

So within the same ID, what I want is:

 

1. Create a flag variable SAME, if there is same value from Rank1-Rank6 for each ID, then SAME=1 otherwise SAME = 0. for example , ID2 doesn't have the same value from Rank1-Rank6, then SAME =0.

2.

 

I want to know how many Ranks are the same as Rank1, Rank2... etc. For example, across total ID, there is 1 Rank2 is the same as Rank1( ID=8), 1 Rank3 is the same as Rank1(ID=5), 3 Rank4 are the same as Rank1 (ID =1,9,10). So the result data will be like this:

 

 Rank1Rank2Rank3Rank4Rank5Rank6
Rank1N/A11311
Rank2 N/A X X X X
Rank3  N/A X X X
Rank4   N/A X X
Rank5    N/A X
Rank6     N/A

 

Thanks for any help!!!

8 REPLIES 8
FreelanceReinh
Jade | Level 19

Hi @Linmuxi,

 

Creating the flag variable SAME is easy. Just to clarify: SAME=1 if there's at least one pair (i,j) with RANKi=RANKj?


@Linmuxi wrote:

 

2. try to find matched variables. I want to know which variable has the same values as Rank1, Rank2, Rank3,...Rank6. For example, in ID=1, Rank1 and Rank4 has the same value, Rank2 and Rank3 have the same value. etc. But I haven't think out how to create variables to get the info I want.


You can safely leave the question of how to create the variables to the experts in this forum, but you should decide first what kind of variables you need for your next steps: perhaps 15 flags (like variable SAME) for rank1=rank2, rank1=rank3, ..., rank5=rank6?

Linmuxi
Calcite | Level 5

Hi @FreelanceReinh,

 

Thanks so much for your reply! Yes at least one pair then SAME =1.

 

For my second question, in general, I want to know how many Ranks are the same as Rank1, Rank2... etc. For example, across total ID, there is 1 Rank2 is the same as Rank1( ID=8), 1 Rank3 is the same as Rank1(ID=5), 3 Rank4 are the same as Rank1 (ID =1,9,10). So the result data will be like this:

 

 Rank1Rank2Rank3Rank4Rank5Rank6
Rank1N/A11311
Rank2 N/A X X X X
Rank3  N/A X X X
Rank4   N/A X X
Rank5    N/A X
Rank6     N/A

 

I am not sure if it's workable...kind of too complex to me to solve...

FreelanceReinh
Jade | Level 19

This should create the triangular matrix (omitting the less interesting first column and last row):

data want2(drop=i j id rank:);
do until(last);
  set have end=last;
  array r[6] rank:;
  array e[5, 2:6];
  do i=1 to dim(r)-1;
    do j=i+1 to dim(r);
      e[i,j]+(r[i]=r[j]);
    end;
  end;
end;
file print;
do j=2 to dim(r);
  put @(j*6-4) 'Rank' j @;
end;
put;
do i=1 to dim(r)-1;
  put 'Rank' i @;
  do j=2 to dim(r);
    put +1 e[i,j] 5. @;
  end;
  put;
end;
run;

Result (using listing output):

       Rank2 Rank3 Rank4 Rank5 Rank6
Rank1      1     1     3     1     1
Rank2      .     4     2     3     2
Rank3      .     .     1     2     2
Rank4      .     .     .     2     0
Rank5      .     .     .     .     3

Dataset WANT2 contains the 25 elements of the 5x5 matrix in variables e1, e2, ..., e25 (row-major order).

 

(I'll be back later "today" -- it's after midnight in my time zone.)

 

Edit: Inserted file print; so as to write the result table to the output window rather than to the log.

FreelanceReinh
Jade | Level 19

Here is a variant of my previous solution. It creates two output datasets:

  1. WANT1 is a copy of input dataset HAVE plus variable SAME (now computed without using statistics function tricks).
  2. WANT2 contains the desired triangular matrix, now in the form of a dataset whose variables correspond to the six columns of the matrix. This makes it a bit easier to retrieve a specific entry of the matrix. For example, the number of observations in dataset HAVE with (non-missing) Rank5=Rank2 can be found in variable Rank5 where var='Rank2'.

Variables Rank1 - Rank6 in dataset HAVE are now allowed to contain missing values, but only equalities of non-missing Ranki values are considered in the calculation of variable SAME in dataset WANT1 and in the counts in dataset WANT2.

 

data want1(drop=var) want2(drop=id rank1 same);
length var $5;
do until(last);
  set have end=last;
  array r[6] rank:;
  array e[5, 2:6] _temporary_;
  same=0;
  do i=1 to dim(r)-1;
    do j=i+1 to dim(r);
      _s=(r[i]=r[j] & r[j]>.z);
      if _s then same=1;
      e[i,j]+_s;
    end;
  end;
  output want1;
end;
do i=1 to dim(r)-1;
  var=cats('Rank',i);
  do j=2 to dim(r);
    r[j]=e[i,j];
  end;
  output want2;
end;
drop i j _s;
label var='A0'x;
run;

proc print data=want1;
id id;
run;

proc print data=want2 label;
id var;
run;

PROC PRINT output:

WANT1

ID    Rank1    Rank2    Rank3    Rank4    Rank5    Rank6    same

 1      3        4        4        3        1        5        1
 2      5        4        3        6        1        2        0
 3      1        6        6        3        6        6        1
 4      5        6        4        1        2        6        1
 5      3        1        3        2        1        6        1
 6      2        3        6        3        4        1        1
 7      6        2        2        2        2        6        1
 8      2        2        1        4        5        5        1
 9      1        2        5        1        1        5        1
10      1        4        4        1        6        6        1

WANT2

         Rank2    Rank3    Rank4    Rank5    Rank6

Rank1      1        1        3        1        1
Rank2      .        4        2        3        2
Rank3      .        .        1        2        2
Rank4      .        .        .        2        0
Rank5      .        .        .        .        3

 

mkeintz
PROC Star

You want to determine if there are any duplicate values in the variables RANK1 .... RANK6.

 

If you know that all the rank variables are integers from 1 through 6, then all you need to do is see if the  mean and std of the RANK variables=mean(1,2,3,4,5,6) and std(1,2,3,4,5,6).  There's no need to do a set of pairwise comparisons.

 

data have;
  input ID Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 ;
datalines;
1 3 4 4 3 1 5 
2 5 4 3 6 1 2 
3 1 6 6 3 6 6 
4 5 6 4 1 2 6 
5 3 1 3 2 1 6 
6 2 3 6 3 4 1 
7 6 2 2 2 2 6 
8 2 2 1 4 5 5 
9 1 2 5 1 1 5 
10 1 4 4 1 6 6 
run;

data want (drop=_:);
  if _n_=1 then do;
    _mn=mean(1,2,3,4,5,6);
    _sd=std(1,2,3,4,5,6);
  end;
  retain _mn _sd;
  set have;
  if mean(of rank:)=_mn and std(of rank:)=_sd then same=0;
  else same=1;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
FreelanceReinh
Jade | Level 19

Or, a bit shorter (but essentially the same idea):

data want;
set have;
same=~(round(cv(of rank:),.01)=53.45);
run;

 

Ksharp
Super User

If you have SAS/IML ,that would be easy.

 

data have;
infile cards expandtabs;
  input ID Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 ;
datalines;
1	3	4	4	3	1	5
2	5	4	3	6	1	2
3	1	6	6	3	6	6
4	5	6	4	1	2	6
5	3	1	3	2	1	6
6	2	3	6	3	4	1
7	6	2	2	2	2	6
8	2	2	1	4	5	5
9	1	2	5	1	1	5
10	1	4	4	1	6	6
;
run;

proc iml;
use have(keep=rank:) nobs nobs;
read all var _all_ into rank[c=vname];
close;
r=j(ncol(rank),ncol(rank),.);
do i=1 to ncol(rank)-1;
 do j=i+1 to ncol(rank);
  r[i,j]=(rank[,i]=rank[,j])[+];
end;
end;

create want from r[r=vname c=vname];
append from r[r=vname];
close;
quit;
Ksharp
Super User

If you have SAS/IML ,that would be easy.

 

data have;
infile cards expandtabs;
  input ID Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 ;
datalines;
1	3	4	4	3	1	5
2	5	4	3	6	1	2
3	1	6	6	3	6	6
4	5	6	4	1	2	6
5	3	1	3	2	1	6
6	2	3	6	3	4	1
7	6	2	2	2	2	6
8	2	2	1	4	5	5
9	1	2	5	1	1	5
10	1	4	4	1	6	6
;
run;

proc iml;
use have(keep=rank:) nobs nobs;
read all var _all_ into rank[c=vname];
close;
r=j(ncol(rank),ncol(rank),.);
do i=1 to ncol(rank)-1;
 do j=i+1 to ncol(rank);
  r[i,j]=(rank[,i]=rank[,j])[+];
end;
end;

create want from r[r=vname c=vname];
append from r[r=vname];
close;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 958 views
  • 0 likes
  • 4 in conversation