Re: how to count the same value in one row in SAS

Linmuxi · Posted 11-25-2018 04:43 PM

Hi all,

I have a dataset looks like this:

ID	Rank1	Rank2	Rank3	Rank4	Rank5	Rank 6
1	3	4	4	3	1	5
2	5	4	3	6	1	2
3	1	6	6	3	6	6
4	5	6	4	1	2	6
5	3	1	3	2	1	6
6	2	3	6	3	4	1
7	6	2	2	2	2	6
8	2	2	1	4	5	5
9	1	2	5	1	1	5
10	1	4	4	1	6	6

The Rank1- Rank6 may contain values from 1-6.

So within the same ID, what I want is:

1. Create a flag variable SAME, if there is same value from Rank1-Rank6 for each ID, then SAME=1 otherwise SAME = 0. for example , ID2 doesn't have the same value from Rank1-Rank6, then SAME =0.

2.

I want to know how many Ranks are the same as Rank1, Rank2... etc. For example, across total ID, there is 1 Rank2 is the same as Rank1( ID=8), 1 Rank3 is the same as Rank1(ID=5), 3 Rank4 are the same as Rank1 (ID =1,9,10). So the result data will be like this:

	Rank1	Rank2	Rank3	Rank4	Rank5	Rank6
Rank1	N/A	1	1	3	1	1
Rank2		N/A	X	X	X	X
Rank3			N/A	X	X	X
Rank4				N/A	X	X
Rank5					N/A	X
Rank6						N/A

Thanks for any help!!!

FreelanceReinh · Posted 11-25-2018 05:16 PM

Hi @Linmuxi,

Creating the flag variable SAME is easy. Just to clarify: SAME=1 if there's at least one pair (i,j) with RANKi=RANKj?

@Linmuxi wrote:

2. try to find matched variables. I want to know which variable has the same values as Rank1, Rank2, Rank3,...Rank6. For example, in ID=1, Rank1 and Rank4 has the same value, Rank2 and Rank3 have the same value. etc. But I haven't think out how to create variables to get the info I want.

You can safely leave the question of how to create the variables to the experts in this forum, but you should decide first what kind of variables you need for your next steps: perhaps 15 flags (like variable SAME) for rank1=rank2, rank1=rank3, ..., rank5=rank6?

Linmuxi · Posted 11-25-2018 05:50 PM

Hi @FreelanceReinh,

Thanks so much for your reply! Yes at least one pair then SAME =1.

For my second question, in general, I want to know how many Ranks are the same as Rank1, Rank2... etc. For example, across total ID, there is 1 Rank2 is the same as Rank1( ID=8), 1 Rank3 is the same as Rank1(ID=5), 3 Rank4 are the same as Rank1 (ID =1,9,10). So the result data will be like this:

	Rank1	Rank2	Rank3	Rank4	Rank5	Rank6
Rank1	N/A	1	1	3	1	1
Rank2		N/A	X	X	X	X
Rank3			N/A	X	X	X
Rank4				N/A	X	X
Rank5					N/A	X
Rank6						N/A

I am not sure if it's workable...kind of too complex to me to solve...

FreelanceReinh · Posted 11-25-2018 06:59 PM

This should create the triangular matrix (omitting the less interesting first column and last row):

data want2(drop=i j id rank:);
do until(last);
  set have end=last;
  array r[6] rank:;
  array e[5, 2:6];
  do i=1 to dim(r)-1;
    do j=i+1 to dim(r);
      e[i,j]+(r[i]=r[j]);
    end;
  end;
end;
file print;
do j=2 to dim(r);
  put @(j*6-4) 'Rank' j @;
end;
put;
do i=1 to dim(r)-1;
  put 'Rank' i @;
  do j=2 to dim(r);
    put +1 e[i,j] 5. @;
  end;
  put;
end;
run;

Result (using listing output):

       Rank2 Rank3 Rank4 Rank5 Rank6
Rank1      1     1     3     1     1
Rank2      .     4     2     3     2
Rank3      .     .     1     2     2
Rank4      .     .     .     2     0
Rank5      .     .     .     .     3

Dataset WANT2 contains the 25 elements of the 5x5 matrix in variables e1, e2, ..., e25 (row-major order).

(I'll be back later "today" -- it's after midnight in my time zone.)

Edit: Inserted file print; so as to write the result table to the output window rather than to the log.

FreelanceReinh · Posted 11-26-2018 06:36 AM

Here is a variant of my previous solution. It creates two output datasets:

WANT1 is a copy of input dataset HAVE plus variable SAME (now computed without using statistics function tricks).
WANT2 contains the desired triangular matrix, now in the form of a dataset whose variables correspond to the six columns of the matrix. This makes it a bit easier to retrieve a specific entry of the matrix. For example, the number of observations in dataset HAVE with (non-missing) Rank5=Rank2 can be found in variable Rank5 where var='Rank2'.

Variables Rank1 - Rank6 in dataset HAVE are now allowed to contain missing values, but only equalities of non-missing Ranki values are considered in the calculation of variable SAME in dataset WANT1 and in the counts in dataset WANT2.

data want1(drop=var) want2(drop=id rank1 same);
length var $5;
do until(last);
  set have end=last;
  array r[6] rank:;
  array e[5, 2:6] _temporary_;
  same=0;
  do i=1 to dim(r)-1;
    do j=i+1 to dim(r);
      _s=(r[i]=r[j] & r[j]>.z);
      if _s then same=1;
      e[i,j]+_s;
    end;
  end;
  output want1;
end;
do i=1 to dim(r)-1;
  var=cats('Rank',i);
  do j=2 to dim(r);
    r[j]=e[i,j];
  end;
  output want2;
end;
drop i j _s;
label var='A0'x;
run;

proc print data=want1;
id id;
run;

proc print data=want2 label;
id var;
run;

PROC PRINT output:

WANT1

ID    Rank1    Rank2    Rank3    Rank4    Rank5    Rank6    same

 1      3        4        4        3        1        5        1
 2      5        4        3        6        1        2        0
 3      1        6        6        3        6        6        1
 4      5        6        4        1        2        6        1
 5      3        1        3        2        1        6        1
 6      2        3        6        3        4        1        1
 7      6        2        2        2        2        6        1
 8      2        2        1        4        5        5        1
 9      1        2        5        1        1        5        1
10      1        4        4        1        6        6        1

WANT2

         Rank2    Rank3    Rank4    Rank5    Rank6

Rank1      1        1        3        1        1
Rank2      .        4        2        3        2
Rank3      .        .        1        2        2
Rank4      .        .        .        2        0
Rank5      .        .        .        .        3

mkeintz · Posted 11-25-2018 05:37 PM

You want to determine if there are any duplicate values in the variables RANK1 .... RANK6.

If you know that all the rank variables are integers from 1 through 6, then all you need to do is see if the mean and std of the RANK variables=mean(1,2,3,4,5,6) and std(1,2,3,4,5,6). There's no need to do a set of pairwise comparisons.

data have;
  input ID Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 ;
datalines;
1 3 4 4 3 1 5 
2 5 4 3 6 1 2 
3 1 6 6 3 6 6 
4 5 6 4 1 2 6 
5 3 1 3 2 1 6 
6 2 3 6 3 4 1 
7 6 2 2 2 2 6 
8 2 2 1 4 5 5 
9 1 2 5 1 1 5 
10 1 4 4 1 6 6 
run;

data want (drop=_:);
  if _n_=1 then do;
    _mn=mean(1,2,3,4,5,6);
    _sd=std(1,2,3,4,5,6);
  end;
  retain _mn _sd;
  set have;
  if mean(of rank:)=_mn and std(of rank:)=_sd then same=0;
  else same=1;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

FreelanceReinh · Posted 11-25-2018 05:45 PM

Or, a bit shorter (but essentially the same idea):

data want;
set have;
same=~(round(cv(of rank:),.01)=53.45);
run;

Ksharp · Posted 11-26-2018 08:21 AM

If you have SAS/IML ,that would be easy.

data have;
infile cards expandtabs;
  input ID Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 ;
datalines;
1	3	4	4	3	1	5
2	5	4	3	6	1	2
3	1	6	6	3	6	6
4	5	6	4	1	2	6
5	3	1	3	2	1	6
6	2	3	6	3	4	1
7	6	2	2	2	2	6
8	2	2	1	4	5	5
9	1	2	5	1	1	5
10	1	4	4	1	6	6
;
run;

proc iml;
use have(keep=rank:) nobs nobs;
read all var _all_ into rank[c=vname];
close;
r=j(ncol(rank),ncol(rank),.);
do i=1 to ncol(rank)-1;
 do j=i+1 to ncol(rank);
  r[i,j]=(rank[,i]=rank[,j])[+];
end;
end;

create want from r[r=vname c=vname];
append from r[r=vname];
close;
quit;

Ksharp · Posted 11-26-2018 08:21 AM

If you have SAS/IML ,that would be easy.

data have;
infile cards expandtabs;
  input ID Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 ;
datalines;
1	3	4	4	3	1	5
2	5	4	3	6	1	2
3	1	6	6	3	6	6
4	5	6	4	1	2	6
5	3	1	3	2	1	6
6	2	3	6	3	4	1
7	6	2	2	2	2	6
8	2	2	1	4	5	5
9	1	2	5	1	1	5
10	1	4	4	1	6	6
;
run;

proc iml;
use have(keep=rank:) nobs nobs;
read all var _all_ into rank[c=vname];
close;
r=j(ncol(rank),ncol(rank),.);
do i=1 to ncol(rank)-1;
 do j=i+1 to ncol(rank);
  r[i,j]=(rank[,i]=rank[,j])[+];
end;
end;

create want from r[r=vname c=vname];
append from r[r=vname];
close;
quit;

Registration is open

SAS Training: Just a Click Away