- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All, I have a dataset with variables Rank1-Rank10. and each Rank variables may varies from 1-5, and there is lots of missing values here.
id | Rank1 | Rank2 | Rank3 | Rank4 | Rank5 | Rank6 | Rank7 | Rank8 | Rank9 | Rank10 |
1 | 3 | . | . | . | . | . | 4 | . | . | . |
2 | 3 | . | 3 | . | . | . | . | . | . | . |
3 | 1 | . | . | . | . | 1 | 4 | . | . | . |
4 | 3 | . | . | . | . | . | 2 | . | . | . |
5 | 3 | . | 3 | . | . | . | . | . | . | 2 |
6 | . | . | . | . | . | . | 4 | . | 1 | . |
what I want is to figure out if there is any same values from Rank1-Rank10 within same row. For example, there is same values for ID 2 ,3 and 5. For those IDs have same values, I want to create a variable called SAME, let same =1. otherwise same= 0
can anyone help with this?? Thanks so much!!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Try next code:
data want (drop=count1-count5);
set have;
array n {*} count1-count5;
array r {*} rank1-rank10;
same = 0;
do i=1 to dim(r);
if r(i) ne . then n(r(i)) +1;
if n(r(i)) > 1 then do;
same = 1;
leave;
end;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data want;
set have;
array rank rank1-rank10;
array match match1-match10;
do i=1 to dim(rank)-1;
i_=i+1;
do j=i_ to dim(rank)-1;
if rank[i] ne . and rank[j] ne . then do;
if rank[i] eq rank[j] then match[j]=1;
end;
end;
end;
if sum(of match1-match10) ge 1 then same =1;
else same=0;
drop i j i_ match1-match10;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would do it like this:
data want;
set have;
array ranks(*) rank1-rank10;
same=0;
do _N_=dim(ranks) to 2 by -1 until(same);
if not missing(ranks(_N_)) then
same=whichn(ranks(_N_),of ranks(*))<_N_;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
infile cards expandtabs;
input id Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 Rank7 Rank8 Rank9 Rank10 ;
cards;
1 3 . . . . . 4 . . .
2 3 . 3 . . . . . . .
3 1 . . . . 1 4 . . .
4 3 . . . . . 2 . . .
5 3 . 3 . . . . . . 2
6 . . . . . . 4 . 1 .
;
run;
data want;
if _n_=1 then do;
declare hash h();
h.definekey('k');
h.definedone();
end;
set have;
array x{*} rank:;
do i=1 to dim(x);
if not missing(x{i}) then do;k=x{i};h.ref();end;
end;
same=n(of x{*}) ne h.num_items;
h.clear();
drop i k;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
infile cards expandtabs;
input id Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 Rank7 Rank8 Rank9 Rank10 ;
cards;
1 3 . . . . . 4 . . .
2 3 . 3 . . . . . . .
3 1 . . . . 1 4 . . .
4 3 . . . . . 2 . . .
5 3 . 3 . . . . . . 2
6 . . . . . . 4 . 1 .
;
data want ;
set have;
array r [*] rank:;
array t [ 10] _temporary_ ;
call pokelong ( (peekclong (addrlong(r[1]), 80)), addrlong(t[1]), 80) ;
call sortn(of t(*));
do _n_=whichn(coalesce(of t(*)),of t(*))+1 to dim(t);
same=t(_n_)=t(_n_-1);
if same then leave;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I like the idea of sorting values, which reduces the task to looking for identical neighboring values. I presume you put the variables in a temporary array, to avoid sorting the original variables. But in this case, it's a bit simpler (no poke and peek) to sort the rank variables in place, process them, and then re-read in original order:
data have;
input id Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 Rank7 Rank8 Rank9 Rank10 ;
datalines;
1 3 . . . . . 4 . . .
2 3 . 3 . . . . . . .
3 1 . . . . 1 4 . . .
4 3 . . . . . 2 . . .
5 3 . 3 . . . . . . 2
6 . . . . . . 4 . 1 .
run;
data want (drop=_i);
set have;
same=0;
array rnk {*} rank: ;
call sortn(of rnk{*}); /* Sort the rank variables */
/* Now look for identical neighbors */
do _i=dim(rnk) to 2 by -1 while (same=0 and rnk{_i-1}^=.);
if rnk{_i}=rnk{_i-1} then same=1;
end;
/* Reread the rank variables, in pre-sorted sequence */
set have (keep=rank:) point=_n_;
run;
The do loop starts at the upper bound of the rnk array because all the missing values will be sorted towards the lower bound.
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets
--------------------------
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you Mark. Actually I wanted to reach out to you for a small understanding help on the conditional set statement on the other thread and more important one on a conditional lag. I will probably do that on Friday assuming it's only fair not to bother you weekdays. Just wanted to let you know I have a request coming through.
Not the 1st or the last that you have helped me speed over the past couple of years , so another one. Have a nice afternoon/evening