Hi!
I'm working on SAS Studio and I have a big data of almost 700,000 observations, and it looks something like that:
_____________________________
ID a1 a2 a3 a4 ... a20
1 11 12 14 15 ...
2 11 13 . . ...
3 12 13 14 . ...
4 12 14 15 . ...
...
700000 ....
_____________________________
(The numbers in the a's can be everything, but always a1<a2<a3<a4 etc.)
as you can see, ID 1, ID 3 and ID 4 have the numbers 12 and 14 (not in the same variables),and ID 1 and 4 have 12 and 15 as well as 14 and 15.
So the data I want is:
___________
A B freq
12 14 3
12 15 2
14 15 2
___________
(I want to know how many pairs of numbers repeats in the data and their frequency)
How can I do that? Thank you very much!
data have;
input ID a1 a2 a3 a4 ;
cards;
1 11 12 14 15 ...
2 11 13 . . ...
3 12 13 14 . ...
4 12 14 15 . ...
;
data temp;
set have;
array t(*) a1-a4;
do i=1 to dim(t)-1;
do j=i+1 to dim(t);
v1=t(i);
v2=t(j);
if n(v1,v2)=2 then output;
end;
end;
keep v1 v2;
run;
proc sql;
create table want as
select v1,v2,count(*) as freq
from temp
group by v1,v2
having freq>1;
quit;
So the fact that 12 and 13 occurs in rows 3 and 4 is irrelevant? Could you explain further the logic as to why 12 and 13 don't appear in your output?
Is it only pairs of numbers you are interested in and not triples or quadruples or higher?
data have;
input ID a1 a2 a3 a4 ;
cards;
1 11 12 14 15 ...
2 11 13 . . ...
3 12 13 14 . ...
4 12 14 15 . ...
;
data temp;
set have;
array t(*) a1-a4;
do i=1 to dim(t)-1;
do j=i+1 to dim(t);
v1=t(i);
v2=t(j);
if n(v1,v2)=2 then output;
end;
end;
keep v1 v2;
run;
proc sql;
create table want as
select v1,v2,count(*) as freq
from temp
group by v1,v2
having freq>1;
quit;
@Carmel wrote:
Sorry 🙂
I edited the post
You may have edited the post, but I don't see where you have answered the question about 12 and 13.
Like this?
data HAVE;
input ID a1 a2 a3 a4;
cards;
1 11 12 14 15
2 11 13 . .
3 12 13 14 .
4 12 14 15 .
run;
data PAIRS;
keep A B FRQ;
* 1- Define the hash table.
* FRQ is incremented when keys are accessed;
if _N_=1 then do;
dcl hash H(suminc:'FRQ',ordered:'a');
H.definekey('A','B');
H.definedone();
dcl hiter HI('H');
call missing(A,B);
FRQ=1;
end;
* 2- Fill the hash table. ;
* Methods check() or add() increment FRQ;
array VALUES(*) A1-A4 _temporary_;
do while(^LASTOBS);
set HAVE end=LASTOBS;
do I=1 to dim(VALUES)-1;
do J=I+1 to dim(VALUES);
A=VALUES[I];
B=VALUES[J];
RC=H.check() ;
if A & B & A ne B & RC then RC=H.add();
end;
end;
end;
* 3- Read the hash table and fetch FRQ for each row;
* Output if frequency > 1;
RC=HI.first();
do while(^RC);
RC=H.sum(sum:FRQ);
if FRQ > 1 then output;
RC=HI.next();
end;
stop;
run;
A | B | FRQ |
12 | 14 | 3 |
12 | 15 | 2 |
14 | 15 | 2 |
another hash:
data _null_;
if _N_=1 then do;
declare hash H(ordered:'a');
H.definekey('v1','v2');
h.definedata('v1','v2','freq');
H.definedone();
call missing(v1,v2);
end;
set have end=lr;
array t(*) a1-a4;
do i=1 to dim(t)-1;
do j=i+1 to dim(t);
v1=t(i);
v2=t(j);
if n(v1,v2)=2 then do;
rc=h.find();
if rc=0 then do;freq=sum(freq,1);h.replace();end;
else do;freq=1;h.replace();end;
end;
end;
end;
if lr then h.output(dataset:'want(where=(freq>1))');
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.