SAS Programming

Linmuxi · Posted 11-25-2018 07:06 PM

Hi All, I have a dataset with variables Rank1-Rank10. and each Rank variables may varies from 1-5, and there is lots of missing values here.

id	Rank1	Rank2	Rank3	Rank4	Rank5	Rank6	Rank7	Rank8	Rank9	Rank10
1	3	.	.	.	.	.	4	.	.	.
2	3	.	3	.	.	.	.	.	.	.
3	1	.	.	.	.	1	4	.	.	.
4	3	.	.	.	.	.	2	.	.	.
5	3	.	3	.	.	.	.	.	.	2
6	.	.	.	.	.	.	4	.	1	.

what I want is to figure out if there is any same values from Rank1-Rank10 within same row. For example, there is same values for ID 2 ,3 and 5. For those IDs have same values, I want to create a variable called SAME, let same =1. otherwise same= 0

can anyone help with this?? Thanks so much!!

Shmuel · Posted 11-25-2018 07:32 PM

Try next code:

data want (drop=count1-count5);
  set have;
        array n {*}  count1-count5;
        array r {*}  rank1-rank10;

        same = 0;
        do i=1 to dim(r);
            if r(i) ne . then n(r(i)) +1;
            if n(r(i)) > 1 then do;
               same = 1; 
              
               leave;
           end;
       end;
run;

r_behata · Posted 11-25-2018 11:55 PM

data want;
set have;
array rank rank1-rank10;
array match match1-match10;

do i=1 to dim(rank)-1;
i_=i+1;
	do j=i_ to dim(rank)-1;
	if rank[i] ne . and rank[j] ne . then do;
		if rank[i] eq rank[j] then match[j]=1;		
	end;
	end;
end;

if sum(of match1-match10) ge 1 then same =1;
	else same=0;
	
	drop i j i_ match1-match10;
run;

s_lassen · Posted 11-26-2018 04:38 AM

I would do it like this:

data want;
  set have;
  array ranks(*) rank1-rank10;
  same=0;
  do _N_=dim(ranks) to 2 by -1 until(same);
    if not missing(ranks(_N_)) then
      same=whichn(ranks(_N_),of ranks(*))<_N_;
    end;
run;

Ksharp · Posted 11-26-2018 07:02 AM

data have;
infile cards expandtabs;
input id	Rank1	Rank2	Rank3	Rank4	Rank5	Rank6	Rank7	Rank8	Rank9	Rank10 ;
cards;
1	3	.	.	.	.	.	4	.	.	.
2	3	.	3	.	.	.	.	.	.	.
3	1	.	.	.	.	1	4	.	.	.
4	3	.	.	.	.	.	2	.	.	.
5	3	.	3	.	.	.	.	.	.	2
6	.	.	.	.	.	.	4	.	1	.
;
run;
data want;
 if _n_=1 then do;
  declare hash h();
  h.definekey('k');
  h.definedone();
 end;
set have;
array x{*} rank:;
do i=1 to dim(x);
  if not missing(x{i}) then do;k=x{i};h.ref();end;
end;
same=n(of x{*}) ne h.num_items;
h.clear();
drop i k;
run;

novinosrin · Posted 11-26-2018 01:36 PM

data have;
infile cards expandtabs;
input id	Rank1	Rank2	Rank3	Rank4	Rank5	Rank6	Rank7	Rank8	Rank9	Rank10 ;
cards;
1	3	.	.	.	.	.	4	.	.	.
2	3	.	3	.	.	.	.	.	.	.
3	1	.	.	.	.	1	4	.	.	.
4	3	.	.	.	.	.	2	.	.	.
5	3	.	3	.	.	.	.	.	.	2
6	.	.	.	.	.	.	4	.	1	.
;


data want ;
set have;
 array r [*] rank:;
 array t [ 10] _temporary_ ;
 call pokelong ( (peekclong (addrlong(r[1]), 80)), addrlong(t[1]), 80) ; 
 call sortn(of t(*)); 
 do _n_=whichn(coalesce(of t(*)),of t(*))+1 to dim(t);
 same=t(_n_)=t(_n_-1);
 if same then leave;
 end;
 run;

mkeintz · Posted 11-26-2018 04:56 PM

@novinosrin

I like the idea of sorting values, which reduces the task to looking for identical neighboring values. I presume you put the variables in a temporary array, to avoid sorting the original variables. But in this case, it's a bit simpler (no poke and peek) to sort the rank variables in place, process them, and then re-read in original order:

data have;
  input id Rank1 Rank2 Rank3 Rank4 Rank5 Rank6 Rank7 Rank8 Rank9 Rank10 ;
datalines;
1 3 . . . . . 4 . . . 
2 3 . 3 . . . . . . . 
3 1 . . . . 1 4 . . . 
4 3 . . . . . 2 . . . 
5 3 . 3 . . . . . . 2 
6 . . . . . . 4 . 1 . 
run;

data want (drop=_i);
  set have;
  same=0;
  array rnk {*} rank: ;
  call sortn(of rnk{*});   /* Sort the rank variables */

  /* Now look for identical neighbors */
  do _i=dim(rnk) to 2 by -1 while (same=0 and rnk{_i-1}^=.);
    if rnk{_i}=rnk{_i-1} then same=1;
  end;

  /* Reread the rank variables, in pre-sorted sequence */
  set have (keep=rank:) point=_n_;
run;

The do loop starts at the upper bound of the rnk array because all the missing values will be sorted towards the lower bound.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

novinosrin · Posted 11-26-2018 05:05 PM

Thank you Mark. Actually I wanted to reach out to you for a small understanding help on the conditional set statement on the other thread and more important one on a conditional lag. I will probably do that on Friday assuming it's only fair not to bother you weekdays. Just wanted to let you know I have a request coming through.

Not the 1st or the last that you have helped me speed over the past couple of years , so another one. Have a nice afternoon/evening

SAS Programming

Duplicate check across columns within same row

Re: Duplicate check across columns within same row

Re: Duplicate check across columns within same row

Re: Duplicate check across columns within same row

Re: Duplicate check across columns within same row

Re: Duplicate check across columns within same row

Re: Duplicate check across columns within same row

Re: Duplicate check across columns within same row

Follow Us

What is...

SAS Programming

Special offer for SAS Communities members

SAS Training: Just a Click Away

Follow Us

What is...