Hi All, I have a dataset with variables Rank1-Rank10. and each Rank variables may varies from 1-5, and there is lots of missing values here.
id | Rank1 | Rank2 | Rank3 | Rank4 | Rank5 | Rank6 | Rank7 | Rank8 | Rank9 | Rank10 |
1 | 3 | . | . | . | . | . | 4 | . | . | . |
2 | 3 | . | 3 | . | . | . | . | . | . | . |
3 | 1 | . | . | . | . | 1 | 4 | . | . | . |
4 | 3 | . | . | . | . | . | 2 | . | . | . |
5 | 3 | . | 3 | . | . | . | . | . | . | 2 |
6 | . | . | . | . | . | . | 4 | . | 1 | . |
what I want is to figure out if there is any same values from Rank1-Rank10 within same row. For example, there is same values for ID 2 ,3 and 5. For those IDs have same values, I want to create a variable called SAME, let same =1. otherwise same= 0
can anyone help with this?? Thanks so much!!
Try next code:
data want (drop=count1-count5);
set have;
array n {*} count1-count5;
array r {*} rank1-rank10;
same = 0;
do i=1 to dim(r);
if r(i) ne . then n(r(i)) +1;
if n(r(i)) > 1 then do;
same = 1;
leave;
end;
end;
run;
data want;
set have;
array rank rank1-rank10;
array match match1-match10;
do i=1 to dim(rank)-1;
i_=i+1;
do j=i_ to dim(rank)-1;
if rank[i] ne . and rank[j] ne . then do;
if rank[i] eq rank[j] then match[j]=1;
end;
end;
end;
if sum(of match1-match10) ge 1 then same =1;
else same=0;
drop i j i_ match1-match10;
run;
I would do it like this:
data want;
set have;
array ranks(*) rank1-rank10;
same=0;
do _N_=dim(ranks) to 2 by -1 until(same);
if not missing(ranks(_N_)) then
same=whichn(ranks(_N_),of ranks(*))<_N_;
end;
run;
data have;
infile cards expandtabs;
input id Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 Rank7 Rank8 Rank9 Rank10 ;
cards;
1 3 . . . . . 4 . . .
2 3 . 3 . . . . . . .
3 1 . . . . 1 4 . . .
4 3 . . . . . 2 . . .
5 3 . 3 . . . . . . 2
6 . . . . . . 4 . 1 .
;
run;
data want;
if _n_=1 then do;
declare hash h();
h.definekey('k');
h.definedone();
end;
set have;
array x{*} rank:;
do i=1 to dim(x);
if not missing(x{i}) then do;k=x{i};h.ref();end;
end;
same=n(of x{*}) ne h.num_items;
h.clear();
drop i k;
run;
data have;
infile cards expandtabs;
input id Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 Rank7 Rank8 Rank9 Rank10 ;
cards;
1 3 . . . . . 4 . . .
2 3 . 3 . . . . . . .
3 1 . . . . 1 4 . . .
4 3 . . . . . 2 . . .
5 3 . 3 . . . . . . 2
6 . . . . . . 4 . 1 .
;
data want ;
set have;
array r [*] rank:;
array t [ 10] _temporary_ ;
call pokelong ( (peekclong (addrlong(r[1]), 80)), addrlong(t[1]), 80) ;
call sortn(of t(*));
do _n_=whichn(coalesce(of t(*)),of t(*))+1 to dim(t);
same=t(_n_)=t(_n_-1);
if same then leave;
end;
run;
I like the idea of sorting values, which reduces the task to looking for identical neighboring values. I presume you put the variables in a temporary array, to avoid sorting the original variables. But in this case, it's a bit simpler (no poke and peek) to sort the rank variables in place, process them, and then re-read in original order:
data have;
input id Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 Rank7 Rank8 Rank9 Rank10 ;
datalines;
1 3 . . . . . 4 . . .
2 3 . 3 . . . . . . .
3 1 . . . . 1 4 . . .
4 3 . . . . . 2 . . .
5 3 . 3 . . . . . . 2
6 . . . . . . 4 . 1 .
run;
data want (drop=_i);
set have;
same=0;
array rnk {*} rank: ;
call sortn(of rnk{*}); /* Sort the rank variables */
/* Now look for identical neighbors */
do _i=dim(rnk) to 2 by -1 while (same=0 and rnk{_i-1}^=.);
if rnk{_i}=rnk{_i-1} then same=1;
end;
/* Reread the rank variables, in pre-sorted sequence */
set have (keep=rank:) point=_n_;
run;
The do loop starts at the upper bound of the rnk array because all the missing values will be sorted towards the lower bound.
Thank you Mark. Actually I wanted to reach out to you for a small understanding help on the conditional set statement on the other thread and more important one on a conditional lag. I will probably do that on Friday assuming it's only fair not to bother you weekdays. Just wanted to let you know I have a request coming through.
Not the 1st or the last that you have helped me speed over the past couple of years , so another one. Have a nice afternoon/evening
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.