Solved: Match Categories for each ID to other ID's

LoveSAS · Posted 03-03-2012 05:28 PM

My data structure is like this

ID TYPE

A K1

A K2

A K3

B K3

C K2

C K3

D K2

I want to match, for each unique ID, how many other unique ID's that have the same type. For example, the output for this data would be

ID NUMBEROFID

A 3 (A contains K1,K2,K3. Nobody else is K1 so we don't need to consider K1.

Because there are C and D,except A, that are K2. There are B and C,except A,

that are K3 => the unique members, hence, are=> B C D, return 3 )

B 2 (Because B has only one type K3, so the only member that has K3 except B are => A and C. So return 2 )

C 2 (C contains K2 and K3. The member that has K2, except C, are A,D. The member that has K3,

except C, are A,B => Member are A,B,D, return 3)

D 2 (Same reasoning => A,C, return 2)

Please help! Thank you in Advance

Tom · Posted 03-03-2012 06:08 PM

You can do this with SQL by joining the table with itself.

data have ;
  input (id type) ($) @@;
cards;
A K1 A K2 A K3 B K3 C K2 C K3 D K2
;;;;
proc sql noprint ;
 create table want as
  select a.id,count(distinct b.id) as numberofid
  from have a left join have b
  on a.type=b.type and a.id ^= b.id
  group by a.id
 ;
quit;

id=A numberofid=3
id=B numberofid=2
id=C numberofid=3
id=D numberofid=2

View solution in original post

Tom · Posted 03-03-2012 06:08 PM

You can do this with SQL by joining the table with itself.

data have ;
  input (id type) ($) @@;
cards;
A K1 A K2 A K3 B K3 C K2 C K3 D K2
;;;;
proc sql noprint ;
 create table want as
  select a.id,count(distinct b.id) as numberofid
  from have a left join have b
  on a.type=b.type and a.id ^= b.id
  group by a.id
 ;
quit;

id=A numberofid=3
id=B numberofid=2
id=C numberofid=3
id=D numberofid=2

art297 · Posted 03-03-2012 06:42 PM

Tom,

That's not what the OP was asking for, but close. I think the OP was asking for a similar join, but based on the transposed table:

data have ;

input (id type) ($) @@;

cards;

A K1 A K2 A K3 B K3 C K2 C K3 D K2

;;;;

proc transpose data=have

out=need (drop=_:);

var type;

id type;

by id;

run;

I'd be interested to see a sql solution that, given the above data, the correct count is achieved.

Tom · Posted 03-03-2012 07:05 PM

Art - Not sure what you mean. I got the numbers that match the english description. For id=C the number the OP listed 2, but if you read the text it should be 3.

- Tom

art297 · Posted 03-03-2012 09:11 PM

Tom, after reviewing your code and result I agree with you.

PGStats · Posted 03-03-2012 07:33 PM

I had read the question as the calculation of a sort of group membership: how many poeple share how many characteristics with others in the group, as in the rephrasing:

data have ;
input (id type) ($) @@;
cards;
Paul Smokes
Paul OwnsACar
Paul PlaysGolf
Henry PlaysGolf
Susan OwnsACar
Susan PlaysGolf
George OwnsACar
;;;;

/* An interpretation which fits Tom's proposed solution */

proc sql noprint ;
create table want as
select a.id,count(distinct b.id) as fitsTheClub
from have a left join have b
on a.type=b.type and a.id ^= b.id
group by a.id
;
quit;

PG

LoveSAS · Posted 03-03-2012 11:46 PM

Thank you very much guys! Sorry for my typo for C case. It should be 3 as Tom suggested.

Would it be possible for you to explain the code?

proc sql noprint ;

create table want as select a.id,count(distinct b.id) as numberofid from have a left join have b on a.type=b.type and a.id ^= b.id group by a.id ; quit;

I would like to learn how you come up with these steps.

art297 · Posted 03-04-2012 01:47 AM

Tom's code is fairly easy to understand if you add in some of the extra variables that he didn't include in the file. e.g., if you run:

proc sql noprint ;

create table want as

select a.id,count(distinct b.id) as numberofid

,a.type,b.id as b_id

from have a left join have b

on a.type=b.type and a.id ^= b.id

group by a.id

;

quit;

you would get the following file:

id numberofid type b_id

A 3 K3 B

A 3 K2 C

A 3 K2 D

A 3 K3 C

A 3 K1

B 2 K3 C

B 2 K3 A

C 3 K2 A

C 3 K3 B

C 3 K2 D

C 3 K3 A

D 2 K2 A

D 2 K2 C

Tom used a left join so there MUST be at least one record for each ID/type combination in the original file. However, since he excluded matches where a.id = b.id, the one non-match record for A/type K1 doesn't get an assignment for b.id.

All of the other a.id/type combination had matching b.ids, thus they appear in the b.id column.

Then, since he is using the count function on distinct b.id/type values, the missing value and duplicates dont get counted.

Match Categories for each ID to other ID's

Re: Match Categories for each ID to other ID's

Re: Match Categories for each ID to other ID's

Re: Match Categories for each ID to other ID's

Re: Match Categories for each ID to other ID's

Re: Match Categories for each ID to other ID's

Re: Match Categories for each ID to other ID's

Match Categories for each ID to other ID's

Match Categories for each ID to other ID's

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away