Help using Base SAS procedures

Match Categories for each ID to other ID's

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

Match Categories for each ID to other ID's

My data structure is like this

ID         TYPE

A          K1

A                    K2

A                    K3

B                    K3

C                    K2

C                    K3

D                    K2

I want to match, for each unique ID, how many other unique ID's that have the same type. For example, the output for this data would be

ID                    NUMBEROFID

A            3                                     (A contains K1,K2,K3. Nobody else is K1 so we don't need to consider K1.

                                                      Because there are C and D,except A, that are K2. There are B and C,except A,

                                                      that are K3 => the unique members, hence, are=> B C D, return 3 )

B            2                                     (Because B has only one type K3, so the only member that has K3 except B are => A and C. So return 2  )

C            2                                    (C contains K2 and K3. The member that has K2, except C, are A,D. The member that has K3,

                                                      except C, are A,B => Member are A,B,D, return 3)

D            2                                    (Same reasoning => A,C, return 2)

Please help! Thank you in Advance Smiley Happy


Accepted Solutions
Solution
‎03-03-2012 06:08 PM
Super User
Super User
Posts: 6,500

Re: Match Categories for each ID to other ID's

You can do this with SQL by joining the table with itself.

data have ;

  input (id type) ($) @@;

cards;

A K1 A K2 A K3 B K3 C K2 C K3 D K2

;;;;

proc sql noprint ;

create table want as

  select a.id,count(distinct b.id) as numberofid

  from have a left join have b

  on a.type=b.type and a.id ^= b.id

  group by a.id

;

quit;

id=A numberofid=3

id=B numberofid=2

id=C numberofid=3

id=D numberofid=2

View solution in original post


All Replies
Solution
‎03-03-2012 06:08 PM
Super User
Super User
Posts: 6,500

Re: Match Categories for each ID to other ID's

You can do this with SQL by joining the table with itself.

data have ;

  input (id type) ($) @@;

cards;

A K1 A K2 A K3 B K3 C K2 C K3 D K2

;;;;

proc sql noprint ;

create table want as

  select a.id,count(distinct b.id) as numberofid

  from have a left join have b

  on a.type=b.type and a.id ^= b.id

  group by a.id

;

quit;

id=A numberofid=3

id=B numberofid=2

id=C numberofid=3

id=D numberofid=2

PROC Star
Posts: 7,363

Re: Match Categories for each ID to other ID's

Tom,

That's not what the OP was asking for, but close.  I think the OP was asking for a similar join, but based on the transposed table:

data have ;

  input (id type) ($) @@;

cards;

A K1 A K2 A K3 B K3 C K2 C K3 D K2

;;;;

proc transpose data=have

  out=need (drop=_Smiley Happy;

  var type;

  id type;

  by id;

run;

I'd be interested to see a sql solution that, given the above data, the correct count is achieved.

Super User
Super User
Posts: 6,500

Re: Match Categories for each ID to other ID's

Art - Not sure what you mean.  I got the numbers that match the english description. For id=C the number the OP listed 2, but if you read the text it should be 3.

- Tom

PROC Star
Posts: 7,363

Re: Match Categories for each ID to other ID's

Tom, after reviewing your code and result I agree with you.

Respected Advisor
Posts: 4,651

Re: Match Categories for each ID to other ID's

I had read the question as the calculation of a sort of group membership: how many poeple share how many characteristics with others in the group, as in the rephrasing:

data have ;
  input (id type) ($) @@;
cards;
Paul Smokes
Paul OwnsACar
Paul PlaysGolf
Henry PlaysGolf
Susan OwnsACar
Susan PlaysGolf
George OwnsACar
;;;;

/* An interpretation which fits Tom's proposed solution */

proc sql noprint ;
create table want as
  select a.id,count(distinct b.id) as fitsTheClub
  from have a left join have b
  on a.type=b.type and a.id ^= b.id
  group by a.id
;
quit;

PG

PG
Occasional Contributor
Posts: 12

Match Categories for each ID to other ID's

Thank you very much guys! Sorry for my typo for C case. It should be 3 as Tom suggested. 

Would it be possible for you to explain the code?

proc sql noprint ;

create table want as
  select a.id,count(distinct b.id) as numberofid
  from have a left join have b
  on a.type=b.type and a.id ^= b.id
  group by a.id
;
quit;

I would like to learn how you come up with these steps.

PROC Star
Posts: 7,363

Match Categories for each ID to other ID's

Tom's code is fairly easy to understand if you add in some of the extra variables that he didn't include in the file.  e.g., if you run:

proc sql noprint ;

create table want as

  select a.id,count(distinct b.id) as numberofid

   ,a.type,b.id as b_id

  from have a left join have b

  on a.type=b.type and a.id ^= b.id

  group by a.id

;

quit;

you would get the following file:

id    numberofid    type    b_id

A          3         K3      B

A          3         K2      C

A          3         K2      D

A          3         K3      C

A          3         K1

B          2         K3      C

B          2         K3      A

C          3         K2      A

C          3         K3      B

C          3         K2      D

C          3         K3      A

D          2         K2      A

D          2         K2      C

Tom used a left join so there MUST be at least one record for each ID/type combination in the original file.  However, since he excluded matches where a.id = b.id, the one non-match record for A/type K1 doesn't get an assignment for b.id.

All of the other a.id/type combination had matching b.ids, thus they appear in the b.id column.

Then, since he is using the count function on distinct b.id/type values, the missing value and duplicates dont get counted.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 213 views
  • 7 likes
  • 4 in conversation