BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Carmel
Calcite | Level 5

Hi!

I'm working on SAS Studio and I have a big data of almost 700,000 observations, and it looks something like that:

_____________________________

ID    a1    a2    a3    a4    ...   a20

1      11    12    14    15     ...

2      11    13      .       .      ...

3      12    13    14      .      ...

4      12    14    15      .      ...

...

700000  ....

_____________________________

 

(The numbers in the a's can be everything, but always a1<a2<a3<a4 etc.)

 

as you can see, ID 1, ID 3 and ID 4 have the numbers 12 and 14 (not in the same variables),and ID 1 and 4 have 12 and 15 as well as 14 and 15.

 

So the data I want is:

___________

A      B    freq

12    14     3

12    15     2

14    15     2

___________

 

(I want to know how many pairs of numbers repeats in the data and their frequency)

How can I do that? Thank you very much!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20
data have;
input ID    a1    a2    a3    a4  ;
cards;
1      11    12    14    15     ...
2      11    13      .       .      ...
3      12    13    14      .      ...
4      12    14    15      .      ...
;

data temp;

set have;

array t(*) a1-a4;

do i=1 to dim(t)-1;

do j=i+1 to dim(t);

v1=t(i);

v2=t(j);

if n(v1,v2)=2 then output;

end;

end;

keep v1 v2;

run;



proc sql;

create table want as

select v1,v2,count(*) as freq

from temp

group by v1,v2

having freq>1;

quit;

View solution in original post

8 REPLIES 8
PaigeMiller
Diamond | Level 26

So the fact that 12 and 13 occurs in rows 3 and 4 is irrelevant? Could you explain further the logic as to why 12 and 13 don't appear in your output?

 

Is it only pairs of numbers you are interested in and not triples or quadruples or higher?

--
Paige Miller
Carmel
Calcite | Level 5
Sorry 🙂
I edited the post
novinosrin
Tourmaline | Level 20
data have;
input ID    a1    a2    a3    a4  ;
cards;
1      11    12    14    15     ...
2      11    13      .       .      ...
3      12    13    14      .      ...
4      12    14    15      .      ...
;

data temp;

set have;

array t(*) a1-a4;

do i=1 to dim(t)-1;

do j=i+1 to dim(t);

v1=t(i);

v2=t(j);

if n(v1,v2)=2 then output;

end;

end;

keep v1 v2;

run;



proc sql;

create table want as

select v1,v2,count(*) as freq

from temp

group by v1,v2

having freq>1;

quit;
PaigeMiller
Diamond | Level 26

@Carmel wrote:
Sorry 🙂
I edited the post

You may have edited the post, but I don't see where you have answered the question about 12 and 13.

--
Paige Miller
Carmel
Calcite | Level 5
13 is no longer exists in row 4.. check the data and output again..
ChrisNZ
Tourmaline | Level 20

Like this?

data HAVE;
input ID    a1    a2    a3    a4;
cards;
1      11    12    14    15     
2      11    13      .       .  
3      12    13    14      .    
4      12    14    15      .    
run;

data PAIRS; 
  keep A B FRQ;
  * 1- Define the hash table. 
  *    FRQ is incremented when keys are accessed;
  if _N_=1 then do;
    dcl hash H(suminc:'FRQ',ordered:'a');
    H.definekey('A','B');
    H.definedone();
    dcl hiter HI('H');
    call missing(A,B); 
    FRQ=1;
  end;
  * 2- Fill the hash table. ;
  *    Methods check() or add() increment FRQ;
  array VALUES(*) A1-A4 _temporary_;
  do while(^LASTOBS);
    set HAVE end=LASTOBS;
    do I=1 to dim(VALUES)-1;
      do J=I+1 to dim(VALUES);
        A=VALUES[I];
        B=VALUES[J];
        RC=H.check() ;
        if A & B & A ne B & RC then RC=H.add();  
      end;
    end;
  end;      
  * 3- Read the hash table and fetch FRQ for each row;
  *    Output if frequency > 1;
  RC=HI.first();  
  do while(^RC);
    RC=H.sum(sum:FRQ);
    if FRQ > 1 then output;
    RC=HI.next();     
  end;     
  stop;
run;  
             

A B FRQ
12 14 3
12 15 2
14 15 2

 

Reeza
Super User
You want all two way frequencies? Is there a fixed range of values that can be present in the variables? If so you want a distance type matrix, that shows all two way combinations. One way is to make all combinations and count them, shown by others. Another is to create a dummy variable framework and use a PROC CORR.
novinosrin
Tourmaline | Level 20

another hash:

 

data _null_;
if _N_=1 then do;
    declare hash H(ordered:'a');
    H.definekey('v1','v2');
	h.definedata('v1','v2','freq');
    H.definedone();
    call missing(v1,v2); 
 end;
 set have end=lr;
array t(*) a1-a4;
do i=1 to dim(t)-1;
do j=i+1 to dim(t);
v1=t(i);
v2=t(j);
if n(v1,v2)=2 then do;
rc=h.find();
if rc=0 then do;freq=sum(freq,1);h.replace();end;
else do;freq=1;h.replace();end;
end;
end;
end;
if lr then h.output(dataset:'want(where=(freq>1))');
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1347 views
  • 1 like
  • 5 in conversation