My data structure is like this
ID TYPE
A K1
A K2
A K3
B K3
C K2
C K3
D K2
I want to match, for each unique ID, how many other unique ID's that have the same type. For example, the output for this data would be
ID NUMBEROFID
A 3 (A contains K1,K2,K3. Nobody else is K1 so we don't need to consider K1.
Because there are C and D,except A, that are K2. There are B and C,except A,
that are K3 => the unique members, hence, are=> B C D, return 3 )
B 2 (Because B has only one type K3, so the only member that has K3 except B are => A and C. So return 2 )
C 2 (C contains K2 and K3. The member that has K2, except C, are A,D. The member that has K3,
except C, are A,B => Member are A,B,D, return 3)
D 2 (Same reasoning => A,C, return 2)
Please help! Thank you in Advance
You can do this with SQL by joining the table with itself.
data have ;
input (id type) ($) @@;
cards;
A K1 A K2 A K3 B K3 C K2 C K3 D K2
;;;;
proc sql noprint ;
create table want as
select a.id,count(distinct b.id) as numberofid
from have a left join have b
on a.type=b.type and a.id ^= b.id
group by a.id
;
quit;
id=A numberofid=3
id=B numberofid=2
id=C numberofid=3
id=D numberofid=2
You can do this with SQL by joining the table with itself.
data have ;
input (id type) ($) @@;
cards;
A K1 A K2 A K3 B K3 C K2 C K3 D K2
;;;;
proc sql noprint ;
create table want as
select a.id,count(distinct b.id) as numberofid
from have a left join have b
on a.type=b.type and a.id ^= b.id
group by a.id
;
quit;
id=A numberofid=3
id=B numberofid=2
id=C numberofid=3
id=D numberofid=2
Tom,
That's not what the OP was asking for, but close. I think the OP was asking for a similar join, but based on the transposed table:
data have ;
input (id type) ($) @@;
cards;
A K1 A K2 A K3 B K3 C K2 C K3 D K2
;;;;
proc transpose data=have
out=need (drop=_:);
var type;
id type;
by id;
run;
I'd be interested to see a sql solution that, given the above data, the correct count is achieved.
Art - Not sure what you mean. I got the numbers that match the english description. For id=C the number the OP listed 2, but if you read the text it should be 3.
- Tom
Tom, after reviewing your code and result I agree with you.
I had read the question as the calculation of a sort of group membership: how many poeple share how many characteristics with others in the group, as in the rephrasing:
data have ;
input (id type) ($) @@;
cards;
Paul Smokes
Paul OwnsACar
Paul PlaysGolf
Henry PlaysGolf
Susan OwnsACar
Susan PlaysGolf
George OwnsACar
;;;;
/* An interpretation which fits Tom's proposed solution */
proc sql noprint ;
create table want as
select a.id,count(distinct b.id) as fitsTheClub
from have a left join have b
on a.type=b.type and a.id ^= b.id
group by a.id
;
quit;
PG
Thank you very much guys! Sorry for my typo for C case. It should be 3 as Tom suggested.
Would it be possible for you to explain the code?
proc sql noprint ;
create table want as
select a.id,count(distinct b.id) as numberofid
from have a left join have b
on a.type=b.type and a.id ^= b.id
group by a.id
;
quit;
I would like to learn how you come up with these steps.
Tom's code is fairly easy to understand if you add in some of the extra variables that he didn't include in the file. e.g., if you run:
proc sql noprint ;
create table want as
select a.id,count(distinct b.id) as numberofid
,a.type,b.id as b_id
from have a left join have b
on a.type=b.type and a.id ^= b.id
group by a.id
;
quit;
you would get the following file:
id numberofid type b_id
A 3 K3 B
A 3 K2 C
A 3 K2 D
A 3 K3 C
A 3 K1
B 2 K3 C
B 2 K3 A
C 3 K2 A
C 3 K3 B
C 3 K2 D
C 3 K3 A
D 2 K2 A
D 2 K2 C
Tom used a left join so there MUST be at least one record for each ID/type combination in the original file. However, since he excluded matches where a.id = b.id, the one non-match record for A/type K1 doesn't get an assignment for b.id.
All of the other a.id/type combination had matching b.ids, thus they appear in the b.id column.
Then, since he is using the count function on distinct b.id/type values, the missing value and duplicates dont get counted.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.