BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LoveSAS
Calcite | Level 5

My data structure is like this

ID         TYPE

A          K1

A                    K2

A                    K3

B                    K3

C                    K2

C                    K3

D                    K2

I want to match, for each unique ID, how many other unique ID's that have the same type. For example, the output for this data would be

ID                    NUMBEROFID

A            3                                     (A contains K1,K2,K3. Nobody else is K1 so we don't need to consider K1.

                                                      Because there are C and D,except A, that are K2. There are B and C,except A,

                                                      that are K3 => the unique members, hence, are=> B C D, return 3 )

B            2                                     (Because B has only one type K3, so the only member that has K3 except B are => A and C. So return 2  )

C            2                                    (C contains K2 and K3. The member that has K2, except C, are A,D. The member that has K3,

                                                      except C, are A,B => Member are A,B,D, return 3)

D            2                                    (Same reasoning => A,C, return 2)

Please help! Thank you in Advance Smiley Happy

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You can do this with SQL by joining the table with itself.

data have ;

  input (id type) ($) @@;

cards;

A K1 A K2 A K3 B K3 C K2 C K3 D K2

;;;;

proc sql noprint ;

create table want as

  select a.id,count(distinct b.id) as numberofid

  from have a left join have b

  on a.type=b.type and a.id ^= b.id

  group by a.id

;

quit;

id=A numberofid=3

id=B numberofid=2

id=C numberofid=3

id=D numberofid=2

View solution in original post

7 REPLIES 7
Tom
Super User Tom
Super User

You can do this with SQL by joining the table with itself.

data have ;

  input (id type) ($) @@;

cards;

A K1 A K2 A K3 B K3 C K2 C K3 D K2

;;;;

proc sql noprint ;

create table want as

  select a.id,count(distinct b.id) as numberofid

  from have a left join have b

  on a.type=b.type and a.id ^= b.id

  group by a.id

;

quit;

id=A numberofid=3

id=B numberofid=2

id=C numberofid=3

id=D numberofid=2

art297
Opal | Level 21

Tom,

That's not what the OP was asking for, but close.  I think the OP was asking for a similar join, but based on the transposed table:

data have ;

  input (id type) ($) @@;

cards;

A K1 A K2 A K3 B K3 C K2 C K3 D K2

;;;;

proc transpose data=have

  out=need (drop=_:);

  var type;

  id type;

  by id;

run;

I'd be interested to see a sql solution that, given the above data, the correct count is achieved.

Tom
Super User Tom
Super User

Art - Not sure what you mean.  I got the numbers that match the english description. For id=C the number the OP listed 2, but if you read the text it should be 3.

- Tom

art297
Opal | Level 21

Tom, after reviewing your code and result I agree with you.

PGStats
Opal | Level 21

I had read the question as the calculation of a sort of group membership: how many poeple share how many characteristics with others in the group, as in the rephrasing:

data have ;
  input (id type) ($) @@;
cards;
Paul Smokes
Paul OwnsACar
Paul PlaysGolf
Henry PlaysGolf
Susan OwnsACar
Susan PlaysGolf
George OwnsACar
;;;;

/* An interpretation which fits Tom's proposed solution */

proc sql noprint ;
create table want as
  select a.id,count(distinct b.id) as fitsTheClub
  from have a left join have b
  on a.type=b.type and a.id ^= b.id
  group by a.id
;
quit;

PG

PG
LoveSAS
Calcite | Level 5

Thank you very much guys! Sorry for my typo for C case. It should be 3 as Tom suggested. 

Would it be possible for you to explain the code?

proc sql noprint ;

create table want as
  select a.id,count(distinct b.id) as numberofid
  from have a left join have b
  on a.type=b.type and a.id ^= b.id
  group by a.id
;
quit;

I would like to learn how you come up with these steps.

art297
Opal | Level 21

Tom's code is fairly easy to understand if you add in some of the extra variables that he didn't include in the file.  e.g., if you run:

proc sql noprint ;

create table want as

  select a.id,count(distinct b.id) as numberofid

   ,a.type,b.id as b_id

  from have a left join have b

  on a.type=b.type and a.id ^= b.id

  group by a.id

;

quit;

you would get the following file:

id    numberofid    type    b_id

A          3         K3      B

A          3         K2      C

A          3         K2      D

A          3         K3      C

A          3         K1

B          2         K3      C

B          2         K3      A

C          3         K2      A

C          3         K3      B

C          3         K2      D

C          3         K3      A

D          2         K2      A

D          2         K2      C

Tom used a left join so there MUST be at least one record for each ID/type combination in the original file.  However, since he excluded matches where a.id = b.id, the one non-match record for A/type K1 doesn't get an assignment for b.id.

All of the other a.id/type combination had matching b.ids, thus they appear in the b.id column.

Then, since he is using the count function on distinct b.id/type values, the missing value and duplicates dont get counted.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1270 views
  • 7 likes
  • 4 in conversation